CN116385554A - Coastal sea area water depth mapping method based on double unmanned aerial vehicle video stitching

Info

Publication number: CN116385554A
Application number: CN202310167538.0A
Authority: CN (China)
Other languages: Chinese (zh)
Prior art keywords: video, camera, unmanned aerial vehicle, coordinate system, water depth
Inventors: 裴海龙 (Pei Hailong), 范锦昌 (Fan Jinchang)
Assignee (original and current): South China University of Technology (SCUT)
Application filed by South China University of Technology; priority to CN202310167538.0A
Legal status: Pending

Classifications

    • G06T7/80 - Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T3/4038 - Scaling the whole image or part thereof for image mosaicing, i.e. plane images composed of plane sub-images
    • G06T5/80
    • G06T2207/10016 - Video; Image sequence
    • G06T2207/30244 - Camera pose
    • Y02A90/30 - Assessment of water resources

Abstract

The invention discloses a coastal sea area water depth mapping method based on double unmanned aerial vehicle video stitching, which comprises the following steps: the camera intrinsic parameters of the two unmanned aerial vehicles are calibrated; the cameras carried by the two unmanned aerial vehicles face the sea and the aircraft are distributed at an interval along the coastline while shooting video, with the fields of view of the two cameras overlapping, the optical cameras recording the motion of the sea waves; the videos shot by the two unmanned aerial vehicles are then stitched with a video stitching technique; an orthorectified image of each frame is obtained from the position and attitude information of the unmanned aerial vehicle cameras; a straight line perpendicular to the coastline is fixed as the study transect, and a time stack image of the orthorectified key frames of the shot video is obtained at the position of this fixed line; finally, the nearshore water depth is obtained with the cBathy water depth estimation method.

Description

Coastal sea area water depth mapping method based on double unmanned aerial vehicle video stitching
Technical Field
The invention relates to the technical field of coastal zone mapping, in particular to a coastal sea area water depth mapping method based on double unmanned aerial vehicle video stitching.
Background
Traditional water depth mapping in nearshore areas is mainly carried out with physical measuring instruments such as sonar systems, laser radars, synthetic aperture radars and satellite imagery, but these instruments consume considerable manpower and financial resources, so the cost of mapping is very high. Lower-cost and more efficient mapping methods are therefore needed. With the development of photogrammetry, cameras or laser radar sensors have been fixed on the shore to record the motion of the sea waves; however, this approach still requires measuring equipment such as cameras to be deployed on site, suitable sites to be selected, and installation and removal work, so it is constrained by the installation location and also carries a considerable cost.
Nowadays, as research on unmanned aerial vehicles with autonomous systems matures, the advantages of such airborne systems carrying cameras have become increasingly prominent, and a low-cost, high-efficiency mapping method that uses unmanned aerial vehicles together with video processing technology needs to be proposed.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and provides a measuring method for estimating the water depth of a nearshore area from the airborne camera images of two unmanned aerial vehicles.
The aim of the invention can be achieved by adopting the following technical scheme:
A coastal sea area water depth mapping method based on double unmanned aerial vehicle video stitching, the mapping method comprising the following steps:
S1, calibrating the intrinsic camera parameters of the two unmanned aerial vehicles, each of which carries a GPS-RTK measuring module and an IMU inertial measurement unit that respectively record camera position information and camera attitude information;
S2, controlling the two unmanned aerial vehicles to fly along the coastline of the sea area to be measured, keeping them at an interval from each other while they hover or move at a constant speed in the same direction as they shoot video, and ensuring that the fields of view of the two cameras overlap;
S3, stitching the videos shot by the cameras of the two unmanned aerial vehicles into a panoramic video, and then performing image preprocessing on the panoramic video, the preprocessing comprising image graying and image filtering;
S4, selecting the sea area to be mapped from the preprocessed panoramic video, and generating an orthorectified image of that sea area using the recorded camera position and attitude information;
S5, selecting a straight line perpendicular to the coastline in the generated orthorectified images, and processing the images along this line to obtain a time stack image over all frames;
S6, estimating the water depth at the corresponding pixel coordinates from the time stack image with the cBathy water depth estimation method.
Further, in step S1 the calibration of the camera intrinsic parameters is completed with a mathematical tool to obtain the intrinsic matrix

K = [f_x, 0, x_o; 0, f_y, y_o; 0, 0, 1]

where f_x and f_y describe the pixel density of the camera sensor along the horizontal x axis and the vertical y axis of the image coordinate system, and x_o and y_o are the pixel offsets of the camera optical axis in the image coordinate system. The intrinsic parameters of the camera (intrinsics for short) describe the inherent properties of the camera itself and determine the shape and size of the two-dimensional image that the camera acquires from the three-dimensional scene; they are therefore an important input to image processing and computational geometry transformations.
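For reference, the sketch below shows how such an intrinsic matrix can be obtained with OpenCV using Zhang's chessboard calibration; the patent only states that a mathematical tool such as OpenCV or Matlab is used, so the board size, square size and image paths here are illustrative assumptions.

```python
import glob
import cv2
import numpy as np

pattern = (9, 6)              # inner chessboard corners (assumed board layout)
square = 0.025                # square edge length in metres (assumed)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

obj_pts, img_pts, img_size = [], [], None
for path in glob.glob("calib/*.jpg"):      # calibration photos taken by the UAV camera
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    img_size = gray.shape[::-1]
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_pts.append(objp)
        img_pts.append(corners)

# K is the 3x3 intrinsic matrix [f_x 0 x_o; 0 f_y y_o; 0 0 1]; dist holds lens distortion.
rms, K, dist, _, _ = cv2.calibrateCamera(obj_pts, img_pts, img_size, None, None)
print("reprojection RMS:", rms)
print("K =\n", K)
```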
Further, the step S3 is as follows: defining the video shot by the first unmanned aerial vehicle as A and the video shot by the second unmanned aerial vehicle as B. For the video shot by each unmanned aerial vehicle, dividing each frame of the video into m parts according to the width and the height of the resolution, and dividing each frame of the video into m parts 2 The ith grid is denoted by i. Set F i (t) representing a homography transformation matrix between a t frame and a t+1 frame of an i-th grid of the video, and a path of the video is defined as a running multiplication of the i-th grid from the 1 st homography transformation matrix to the t-th homography transformation matrix, and using C i And (t) represents:
C i (t)=F i (t)·F i (t-1)···F i (1),1≤t≤T-1,3≤T,1≤i≤m 2
where T is the total number of frames of a single video, using C i The video path represented by (t) reflects the motion trail of the camera through the transformation between video frames, records the jitter generated in the motion process of the camera, and can be effectivelyDescribing the motion state of the video, and correspondingly, the video path of each grid i in the video A is that
Figure BDA0004096482400000031
The video path of each grid i in video B is +.>
Figure BDA0004096482400000032
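The sketch below illustrates how a camera path of this kind can be accumulated from frame-to-frame homographies estimated by feature matching; for brevity it uses a single global homography per frame, whereas the method described here estimates one F_i(t) per grid in the same way, and the ORB/RANSAC choices are illustrative assumptions.

```python
import cv2
import numpy as np

def frame_homography(prev_gray, cur_gray):
    """Estimate the homography F(t) mapping frame t to frame t+1 from ORB matches."""
    orb = cv2.ORB_create(2000)
    k1, d1 = orb.detectAndCompute(prev_gray, None)
    k2, d2 = orb.detectAndCompute(cur_gray, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    F, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    return F

def camera_path(frames):
    """Return the running products C(1..T-1) of the per-frame homographies."""
    C, path = np.eye(3), []
    for prev, cur in zip(frames[:-1], frames[1:]):
        C = frame_homography(prev, cur) @ C   # C(t) = F(t)·C(t-1)
        path.append(C.copy())
    return path
```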
A video path optimization formula Θ(P_i) is set as follows:

Θ(P_i) = Σ_t ( ||P_i(t) - C_i(t)||^2 + λ·Σ_{r∈Ω_t} w_{t,r}·||P_i(t) - P_i(r)||^2 )

where Ω_t is the range of frames r adjacent to frame t; P_i(t) is the smooth video path of the i-th grid of the video, i.e. the running product of the homography matrices of the i-th grid from the 1st to the t-th obtained by iteratively optimizing C_i(t), and P_i(r) is likewise the smooth video path of the i-th grid, the running product from the 1st to the r-th homography matrix obtained by iteratively optimizing C_i(r). The smooth video path of each grid i of video A is defined as P_i^A(t) and the smooth video path of each grid i of video B as P_i^B(t). λ is the overall weight that balances ||P_i(t) - C_i(t)|| and ||P_i(t) - P_i(r)||, and w_{t,r} preserves motion discontinuities at rapid displacements or scene changes; it is calculated with a Gaussian function G:

w_{t,r} = G(||r - t||)·G(||C_i(r) - C_i(t)||)

where the Gaussian function G is defined as

G(x0) = exp( -(x0 - μ)^2 / (2σ^2) )
x0 is the input variable of the Gaussian function, μ is the mean of the Gaussian function and σ is its standard deviation. The weight w_{t,r} computed with the Gaussian function reflects the relation between frame t and frame r: the larger w_{t,r} is, the more closely the motion of frame t and frame r is related, and the smaller w_{t,r} is, the less closely they are related. If the scene changes too fast, the difference between C_i(r) and C_i(t) becomes large and w_{t,r} becomes small, so the video path optimization formula Θ(P_i) concentrates on keeping the smooth video path consistent with the original video path rather than on the fast scene cut; introducing w_{t,r} therefore helps to handle rapid displacement, rotation and scene switching. The video path optimization formula is iterated by updating P_i(t) from C_i(t): at each iteration the new estimate of P_i(t) is formed from C_i(t), from the temporally adjacent smooth paths P_i(r) weighted by λ·w_{t,r}, and from the smooth paths P_j(t) of the neighbouring grids,
where ζ is the iteration index, N(i) is the set of grids neighbouring the i-th grid and j is the j-th grid in the set N(i). E_stable(P) is defined as the video path stabilization formula accumulated over all grids, where P = {P_i(t)} is the set of the smooth video paths P_i(t) of all grids i; E_stable(P) iteratively optimizes the smooth video path P so that it stays close to the original video path, cropping and distortion are reduced, and the smooth video path of each grid becomes smoother. For the smooth video paths P^A and P^B of video A and video B, the video stitching formula is:
E(P^A, P^B, H) = E_stable(P^A) + E_stable(P^B) + β·E_stitch(P^A, P^B, H)

where β is the stitching coefficient and E_stitch represents the stitching term of the videos, which adopts a feature matching method: x^A_{t,L} denotes the L-th feature point of frame t in video A and x^B_{t,L} the L-th feature point of frame t in video B, each feature point being written as a homogeneous vector (u_fp, v_fp, 1)^T, where u_fp and v_fp are the horizontal and vertical coordinates of the feature point in the pixel coordinate system; the grids in which the feature points x^A_{t,L} and x^B_{t,L} lie are also recorded, and H is the homography matrix between those grids, computed from the matched feature points x^A_{t,L} and x^B_{t,L};
Assume that the video path P^A of video A shot by the first unmanned aerial vehicle is not subjected to video path stability optimization; the video stitching formula then simplifies to:

E(P^A, P^B, H) = E_stable(P^B) + β·E_stitch(P^A, P^B, H)

The simplified video stitching formula completes the smoothing of the video path during the iterative process while at the same time stitching video B to video A; because only the smooth video path P_i^B(t) needs to be iterated, the overall amount of computation is reduced. After the smooth video path P_i^B(t) of video B has been iterated a predetermined number of times, every grid of every frame of video B is warped by P_i^B(t)·C_i^B(t)^{-1}, and the result is fused with video A frame by frame to obtain the panoramic video. Each frame of the panoramic video is then grayed and filtered for denoising.
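As an illustration of the smoothing-and-warping step just described, the sketch below iterates a single grid's path with a Jacobi-style averaging update; the exact update rule, the omission of the spatial neighbour term, and the parameter values (λ = 5, σ = 10, a ±30 frame window, 20 iterations) are assumptions for the sketch rather than the patent's exact formula.

```python
import numpy as np

def smooth_path(C, lam=5.0, sigma=10.0, radius=30, iters=20):
    """C: list of 3x3 cumulative homographies C(t). Returns a smoothed path P(t)."""
    T = len(C)
    P = [c.copy() for c in C]                       # P is initialised with C
    for _ in range(iters):
        P_new = []
        for t in range(T):
            acc, wsum = C[t].copy(), 1.0            # data term keeps P(t) near C(t)
            for r in range(max(0, t - radius), min(T, t + radius + 1)):
                if r == t:
                    continue
                # Gaussian weights on temporal distance and on path difference
                # (same sigma for both factors is an assumption of this sketch).
                w = np.exp(-(r - t) ** 2 / (2 * sigma ** 2)) * \
                    np.exp(-np.linalg.norm(C[r] - C[t]) ** 2 / (2 * sigma ** 2))
                acc += lam * w * P[r]
                wsum += lam * w
            P_new.append(acc / wsum)
        P = P_new
    return P

# After convergence each grid of video B is warped by P_B(t)·C_B(t)^{-1}, e.g. with
# cv2.warpPerspective, and fused with video A frame by frame.
```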
Further, the step S4 is as follows:
let coordinate points of world coordinate system be (x) w ,y w ,z w ) The navigation coordinate system adopted is a north east coordinate system, the unit is meter, and the coordinate (x b ,y b ,z b ) The carrier coordinate system adopted is a front right lower coordinate system, and the coordinate points of the unmanned aerial vehicle camera coordinate system are (x) c ,y c ,z c ) The adopted camera coordinate system is a lower right front coordinate system, the unit is meter, and the process of projecting points in the world space into the camera coordinate is as follows:
Figure BDA0004096482400000051
where [R_3×3 | t_3×1] is the extrinsic parameter matrix of the camera; it describes the position, orientation and viewing angle of the camera in the three-dimensional scene and determines the angle from which the camera observes the scene, and is therefore an important input for tasks such as three-dimensional reconstruction and attitude estimation. R_3×3 is the rotation matrix and is determined by the camera attitude information, whose roll angle is roll, pitch angle is pitch and yaw angle is yaw; the rotation matrix is then calculated as the product of the elementary rotation matrices determined by these attitude angles. R^c_b is the three-dimensional rotation matrix formed by the attitude of the camera coordinate system relative to the body coordinate system; the origin of the camera coordinate system coincides with the origin of the body coordinate system, and the body coordinate system can be taken to rotate by an angle θ about the z axis, then by an angle φ about the y axis, and finally by an angle ψ about the x axis so that it coincides with the three axes of the camera coordinate system; R^c_b is then the product of the elementary rotation matrices corresponding to these three rotations.
t_3×1 is the translation vector; it represents the coordinates of the camera in the world coordinate system and is computed from the camera position and the origin of the world coordinate system. A point (x_c, y_c, z_c) in the camera coordinate system is converted to pixel plane coordinates (u, v, 1) by

z_c·[u, v, 1]^T = K·[x_c, y_c, z_c]^T

where K is the camera intrinsic matrix obtained in step S1 and determines the projection properties of the camera.

The overall transformation of the camera imaging model is therefore

z_c·[u, v, 1]^T = K·[R_3×3 | t_3×1]·[x_w, y_w, z_w, 1]^T

Through this transformation, points of the world coordinate system are projected onto the two-dimensional pixel plane, after which orthographic correction of the image is performed.
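The following sketch applies this imaging model to project a world point to pixel coordinates; the roll-pitch-yaw rotation order used in rotation_from_rpy is one common aerospace convention and is an assumption, since the patent gives the rotation formulas only as images.

```python
import numpy as np

def rotation_from_rpy(roll, pitch, yaw):
    """Attitude matrix from roll/pitch/yaw (z-y-x convention, assumed)."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def world_to_pixel(p_world, K, R_world_to_cam, cam_pos):
    """Project a world point (metres) to pixel coordinates (u, v)."""
    p_cam = R_world_to_cam @ (np.asarray(p_world, dtype=float) - cam_pos)  # world -> camera
    uvw = K @ p_cam                  # equals z_c * [u, v, 1]^T
    return uvw[:2] / uvw[2]
```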
Further, the process of generating the orthorectified image for the selected sea area using the camera position information and the camera attitude information in step S4 is as follows:
the north-east-down coordinates are converted to a local coordinate system aligned with the alongshore and cross-shore directions: the local alongshore direction is taken as the X axis and the cross-shore direction as the Y axis, and the original north-east-down frame is rotated within the plane of the XY axes;
a rectangular area is defined along the coast and perpendicular to it, and three-dimensional points of the world coordinate system are sampled at equal intervals along the X and Y axes within this rectangle, i.e. along the coastline and perpendicular to the coastline;
the pixel coordinates corresponding to each point are obtained through the camera imaging model; after a series of points have been uniformly sampled in three-dimensional space and their pixel coordinates obtained, the pixel values are sampled and interpolated at these coordinates and rearranged to form a new image, which is the orthorectified picture.
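A minimal sketch of this orthorectification step is given below: a regular grid of sea-surface points is projected into the image with the camera model and the frame is resampled at those pixel locations with cv2.remap. The 200 m extents, 0.5 m spacing and the assumption that the sea surface lies at z = 0 are illustrative.

```python
import cv2
import numpy as np

def orthorectify(frame, K, R_world_to_cam, cam_pos,
                 x_range=(0, 200), y_range=(0, 200), dx=0.5):
    xs = np.arange(*x_range, dx)                 # alongshore coordinates (m)
    ys = np.arange(*y_range, dx)                 # cross-shore coordinates (m)
    X, Y = np.meshgrid(xs, ys)
    Z = np.zeros_like(X)                         # sea surface assumed at z = 0
    pts = np.stack([X, Y, Z], axis=-1).reshape(-1, 3)

    p_cam = R_world_to_cam @ (pts - cam_pos).T   # world -> camera, shape (3, N)
    uvw = K @ p_cam
    u = (uvw[0] / uvw[2]).reshape(X.shape).astype(np.float32)
    v = (uvw[1] / uvw[2]).reshape(X.shape).astype(np.float32)

    # Resample the source frame at the projected pixel locations (bilinear here;
    # bicubic also works) to obtain the orthorectified picture.
    return cv2.remap(frame, u, v, interpolation=cv2.INTER_LINEAR)
```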
Further, in step S5, let the orthorectified pixel pictures be m0 in total, each with cols columns and rows rows, the column index col ranging from 1 to cols. For a given column col, the col-th column of pixel values is taken from the first to the m0-th orthorectified picture in turn and the columns are arranged side by side to form a time stack picture; the width of a time stack picture is the record length in time (m0 columns) and its height is the cross-shore length (rows rows), and there are cols time stack pictures in total. Common water depth mapping methods work in the frequency domain, and the time stack pictures obtained by this processing are convenient for analysing physical quantities such as the propagation direction, propagation speed and frequency of the sea waves.
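A minimal sketch of the time-stack construction, assuming grayscale orthorectified frames of identical size:

```python
import numpy as np

def time_stack(ortho_frames, col):
    """Rows = cross-shore pixels, columns = frame index (time)."""
    # Each frame contributes its col-th column; stacking them over time gives a
    # (rows x m0) image whose width is the record length and whose height spans
    # the cross-shore transect.
    return np.stack([f[:, col] for f in ortho_frames], axis=1)
```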
Further, the step S6 is as follows: according to the dispersion relation, the following relation exists among the wave angular frequency omega, the wave number k, the water depth h and the gravity acceleration g in the linear wave:
ω 2 =gktanh(kh)
The calculation formula for deriving the water depth h is as follows:
Figure BDA0004096482400000071
in addition, there is an identity between the wave speed c, wave number k and wave angular frequency ω, and wave frequency f: />
Figure BDA0004096482400000072
ω=2pi f, then the water depth formula can also be written as:
Figure BDA0004096482400000073
According to the above formulas, only two of the four physical quantities wave speed c, wave number k, wave angular frequency ω and wave frequency f need to be known to derive an estimate of the water depth. Based on the time stack picture obtained in step S5, the estimation of the wave frequency f and the wave number k is completed with the cBathy water depth estimation method, from which the water depth h is calculated. The cBathy water depth estimation method is a two-dimensional frequency-domain estimation method consisting mainly of signal processing and fault-tolerance processing; because it contains a large number of nonlinear fitting and optimization steps and feedback links, sets weights for several factors when estimating the parameters, and applies fault-tolerance checks to the finally obtained parameters, the algorithm is highly robust.
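The depth inversion itself reduces to solving the dispersion relation for h once f and k are known; a sketch with an illustrative numerical example follows. It is not the cBathy algorithm itself, only the final inversion step.

```python
import numpy as np

def depth_from_dispersion(f, k, g=9.81):
    """Water depth h from wave frequency f (Hz) and wave number k (rad/m)."""
    omega = 2.0 * np.pi * f
    ratio = omega ** 2 / (g * k)      # equals tanh(k*h); must lie in (0, 1)
    return np.arctanh(ratio) / k

# Illustrative example: f = 0.1 Hz, k = 0.06 rad/m  ->  h is about 13.5 m.
print(depth_from_dispersion(0.1, 0.06))
```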
Compared with the prior art, the invention has the following advantages and effects:
1. The invention performs water depth mapping of the coastal sea area by moving two unmanned aerial vehicles and recording video data; the mapped sea area can be continuously extended by moving the unmanned aerial vehicles along the coastline while the already-mapped area is recorded, which enlarges the mapped region. Existing methods hover a single unmanned aerial vehicle in the air or map with fixed cameras; in contrast, the invention has a large mapping range and high mapping efficiency.
2. When a single unmanned aerial vehicle camera records video, the far left and right ends of the coastal sea area are distant from the camera and the shooting angle is poor, so information is lost and the mapping result is affected. Introducing the cameras of two unmanned aerial vehicles lets each camera also cover the coastal sea area that lies far from the other aircraft, so that the video information acquired as a whole is as complete as possible.
3. When the videos recorded by the unmanned aerial vehicles are stitched, a method that bundles the video paths and jointly optimizes them is chosen instead of a traditional image stitching algorithm that simply stitches the frames of the two videos with the same time stamp; the stitching is optimized together with the video paths, which removes the jitter and ghosting produced after image stitching.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiments of the invention and together with the description serve to explain the invention and do not constitute a limitation on the invention. In the drawings:
FIG. 1 is a flow chart of a method for mapping water depth in an offshore area based on double unmanned aerial vehicle video stitching disclosed in an embodiment of the invention;
FIG. 2 is a frame effect diagram taken after video splicing in an offshore area water depth mapping method based on double unmanned aerial vehicle video splicing, disclosed in the embodiment of the invention;
FIG. 3 is a chart of a selected sea area range (200 meters by 200 meters) in a method for mapping water depth in a coastal sea area based on double unmanned aerial vehicle video stitching, disclosed in an embodiment of the present invention;
FIG. 4 is an orthographic correction chart in an offshore water depth mapping method based on double unmanned aerial vehicle video stitching disclosed in an embodiment of the invention;
FIG. 5 is a time stack diagram of a method for mapping water depth in an offshore area based on video stitching of double unmanned aerial vehicles, disclosed in an embodiment of the invention;
fig. 6 is a water depth estimation result diagram of a coastal sea area water depth mapping method based on double unmanned aerial vehicle video stitching according to an embodiment of the invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
As shown in fig. 1, the coastal sea area water depth mapping method based on double unmanned aerial vehicle video stitching disclosed in this embodiment comprises the following steps: calibrating the camera intrinsic parameters of the two unmanned aerial vehicles, operating the two unmanned aerial vehicles at an interval from each other to shoot video, stitching the videos shot by the two unmanned aerial vehicles into a panoramic video, obtaining orthorectified pictures from the camera information of every frame, delimiting the study sea area, obtaining the time stack pictures of the study sea area, and estimating the water depth of the coastal sea area with the cBathy water depth estimation method. The implementation is described in detail below with reference to fig. 1:
Step S1, the intrinsic matrices of the two unmanned aerial vehicle onboard cameras are computed with mathematical tools such as the OpenCV library,

K = [f_x, 0, x_o; 0, f_y, y_o; 0, 0, 1]

where f_x and f_y describe the pixel density of the camera sensor along the horizontal x axis and the vertical y axis of the image coordinate system, and x_o and y_o are the pixel offsets of the camera optical axis in the image coordinate system. The usual method is the Zhang Zhengyou calibration method; for convenience, the intrinsic calibration can also be completed with the toolbox that comes with Matlab.
Step S2, because a single unmanned aerial vehicle is far from the distant part of the cross-shore sea area, the sea waves there appear densely packed in the image, and if there is fog over the sea or the influence of short wind waves, the signal-to-noise ratio of the footage from that camera becomes too low. A further purpose of operating the second unmanned aerial vehicle is therefore to let it shoot the sea area that the first one captures poorly and to improve the signal-to-noise ratio over that area. In the beach area whose water depth is to be mapped, the two unmanned aerial vehicles are operated so that their distribution meets certain requirements: the two aircraft fly along the coastline, their cameras face the sea, their camera fields of view must share a common view, and during video shooting the two aircraft either keep hovering at the same time or move at a constant speed in the same direction. The purpose of the common view is to facilitate the subsequent video stitching that enlarges the mapping viewing angle; the common view of the two videos should account for at least 30% of a single camera's field of view, and the speed of the unmanned aerial vehicles should not exceed 1 metre per second.
Step S3, the videos shot by the two unmanned aerial vehicles are stitched into a panoramic video with the method of bundling the video paths and jointly optimizing them. The specific operation is as follows:
the video shot by the first unmanned aerial vehicle is denoted A and the video shot by the second unmanned aerial vehicle is denoted B. For the video shot by each unmanned aerial vehicle, every frame is divided into m parts along the width and m parts along the height of the resolution, i.e. into m^2 grids, the i-th grid being denoted by i. Let F_i(t) be the homography transformation matrix between frame t and frame t+1 of the i-th grid of a video; the path of the video is defined as the running product of the homography matrices of the i-th grid from the 1st to the t-th, denoted C_i(t):

C_i(t) = F_i(t)·F_i(t-1)···F_i(1),  1 ≤ t ≤ T-1,  3 ≤ T,  1 ≤ i ≤ m^2

where T is the total number of frames of a single video. Correspondingly, the video path of each grid i in video A is C_i^A(t) and the video path of each grid i in video B is C_i^B(t).
A video path optimization formula Θ(P_i) is set as follows:

Θ(P_i) = Σ_t ( ||P_i(t) - C_i(t)||^2 + λ·Σ_{r∈Ω_t} w_{t,r}·||P_i(t) - P_i(r)||^2 )

where Ω_t is the range of frames r adjacent to frame t, here taken as ±30 frames; P_i(t) is the smooth video path of the i-th grid of the video, the running product of the homography matrices of the i-th grid from the 1st to the t-th obtained by iteratively optimizing C_i(t), and P_i(r) is likewise the smooth video path of the i-th grid, obtained by iteratively optimizing C_i(r). The smooth video path of each grid i of video A is defined as P_i^A(t) and that of each grid i of video B as P_i^B(t). λ is the overall weight balancing ||P_i(t) - C_i(t)|| and ||P_i(t) - P_i(r)||; its initial value is set to 5. If the optimized video is expected to differ little from the original video, λ can be reduced; if less distortion is desired in the optimized video, λ can be increased. w_{t,r} preserves motion discontinuities at rapid displacements or scene changes and is calculated with a Gaussian function G:

w_{t,r} = G(||r - t||)·G(||C_i(r) - C_i(t)||)

where the Gaussian function G is defined as

G(x0) = exp( -(x0 - μ)^2 / (2σ^2) )
x0 is the variable of the Gaussian function, μ is the mean of the Gaussian function, usually 0, and σ is its standard deviation, whose initial value can be set to 10. The video path optimization formula is iterated by updating P_i(t) from C_i(t): at each iteration the new estimate of P_i(t) is formed from C_i(t), from the temporally adjacent smooth paths P_i(r) weighted by λ·w_{t,r}, and from the smooth paths P_j(t) of the neighbouring grids,
where ζ is the iteration index, N(i) is the set of grids neighbouring the i-th grid and j is the j-th grid in the set N(i). E_stable(P) is defined as the video path stabilization formula accumulated over all grids, where P = {P_i(t)} is the set of the smooth video paths P_i(t) of all grids i. By iterating P_i(t) the jitter of the video is effectively removed, which is the precondition for stable video stitching. For the smooth video paths P^A and P^B of video A and video B, the video stitching formula is:
E(P^A, P^B, H) = E_stable(P^A) + E_stable(P^B) + β·E_stitch(P^A, P^B, H)

where β is the stitching coefficient and E_stitch represents the stitching term of the videos, which uses a feature matching method: x^A_{t,L} denotes the L-th feature point of frame t in video A and x^B_{t,L} the L-th feature point of frame t in video B, each feature point being written as a homogeneous vector (u_fp, v_fp, 1)^T, where u_fp and v_fp are the horizontal and vertical coordinates of the feature point in the pixel coordinate system; the grids in which the feature points x^A_{t,L} and x^B_{t,L} lie are also recorded, and H is the homography matrix between those grids, computed from the matched feature points x^A_{t,L} and x^B_{t,L};
The video shot by the first unmanned aerial vehicle is chosen not to undergo video path optimization, while the video shot by the second unmanned aerial vehicle does undergo path optimization, so the video stitching formula simplifies to:

E(P^A, P^B, H) = E_stable(P^B) + β·E_stitch(P^A, P^B, H)

With the simplified video stitching formula, the smooth video path P_i^B(t) of video B is iterated a predetermined number of times, every grid of every frame of video B is then warped by P_i^B(t)·C_i^B(t)^{-1}, the result is fused with video A image by image to obtain the panoramic video, and the panoramic video is then grayed and denoised with Gaussian filtering.
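A minimal sketch of this preprocessing with OpenCV; the 5x5 kernel and sigma are illustrative choices:

```python
import cv2

def preprocess(frame_bgr):
    """Grayscale conversion followed by Gaussian denoising of one panorama frame."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    return cv2.GaussianBlur(gray, (5, 5), sigmaX=1.0)
```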
Note that the path optimization of a video is, to some extent, a homography transformation of its images, which would make the camera intrinsics calibrated in step S1 unusable, so that the mapping from three-dimensional world coordinates to two-dimensional pixel coordinates could not be carried out, or would contain large errors. The video shot by one of the unmanned aerial vehicles is therefore not path-optimized, so that it is unaffected by the homography transformation and the camera intrinsics of that aircraft can be used in the subsequent calculations.

After the video stitching has been completed with the joint path optimization method, the whole panoramic video shares one set of intrinsics, namely the intrinsics of video A shot by the first unmanned aerial vehicle, which was not path-optimized; these intrinsics satisfy the requirement when the study range defined in three-dimensional world coordinates is mapped to the two-dimensional plane. The orthographic correction of the panoramic video is then performed according to the camera position information and camera attitude information of the first unmanned aerial vehicle. If instead the video B shot by the second unmanned aerial vehicle is chosen not to undergo video path optimization while the video shot by the first unmanned aerial vehicle is path-optimized, the video stitching formula correspondingly becomes:

E(P^A, P^B, H) = E_stable(P^A) + β·E_stitch(P^A, P^B, H)

and the subsequent processing of the panoramic video is carried out with the data of the second unmanned aerial vehicle.
Step S4, world coordinates are converted into the image coordinate system through the projection relation from three-dimensional space to the two-dimensional plane:

z_c·[u, v, 1]^T = K·[R_3×3 | t_3×1]·[x_w, y_w, z_w, 1]^T

where K is the camera intrinsic matrix obtained in step S1 and [R_3×3 | t_3×1] is the extrinsic parameter matrix of the camera; R_3×3 is the rotation matrix and is determined by the camera attitude information, whose roll angle is roll, pitch angle is pitch and yaw angle is yaw.
The rotation matrix R_3×3 is calculated as the product of the elementary rotation matrices determined by these attitude angles. R^c_b is the three-dimensional rotation matrix formed by the attitude of the camera coordinate system relative to the body coordinate system; the origin of the camera coordinate system coincides with the origin of the body coordinate system, and the body coordinate system can be taken to rotate by an angle θ about the z axis, then by an angle φ about the y axis, and finally by an angle ψ about the x axis so that it coincides with the three axes of the camera coordinate system; R^c_b is then the product of the elementary rotation matrices corresponding to these three rotations.
t_3×1 is the translation vector; it represents the coordinates of the camera in the world coordinate system and can be obtained from the camera position and the origin of the world coordinate system. Through the above conversions, points in three-dimensional space can finally be projected onto the two-dimensional image plane, after which the orthographic correction of the image is performed. From the camera position information and camera attitude information of the current frame, the area that needs orthographic correction can be delimited.
Since the sea area range information has to be determined from information in the image space, the orthographic correction of the image must be completed. For convenience of representation, the north-east-down coordinates are converted to a local coordinate system aligned with the alongshore and cross-shore directions: the local alongshore direction is taken as the X axis and the cross-shore direction as the Y axis. The original north-east-down frame is rotated by a certain angle in the plane of the XY axes; this angle can be calculated from the azimuth of the local compass, and the specific value is chosen according to the local situation.

After the coordinate system has been converted, it only remains to define a rectangular area along the coastline and the cross-shore direction.

After the rectangular area has been selected, the final orthorectified picture has to be generated, and a simple approach is used: three-dimensional space points are sampled at equal intervals along the X and Y axes, i.e. along the coastline and the cross-shore direction, within the previously defined rectangle.

In this way the three-dimensional point coordinates of the study area are obtained completely, and the pixel coordinates corresponding to each point are obtained through the camera imaging model. After a series of points have been uniformly sampled in three-dimensional space and their pixel coordinates obtained, the pixel values are interpolated at these coordinates (bicubic or bilinear interpolation can be used) and rearranged to form a new image, which is the orthorectified picture. The spatial sampling interval is generally about 0.5 metres and should generally not exceed 1 metre, otherwise much information in the orthorectified picture is lost and the accuracy of the water depth estimation suffers. The sampling interval can be determined from the focal length and resolution of the unmanned aerial vehicle camera and its height above the sea surface. The resampled orthorectified image therefore carries distance information from pixel to pixel.
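As a rough aid for choosing the sampling interval, the sketch below computes the ground sample distance from flying height, focal length and pixel pitch; the numerical values are illustrative assumptions, not parameters from the patent.

```python
def ground_sample_distance(height_m, focal_mm, pixel_pitch_um):
    """Approximate size of one camera pixel on the sea surface, in metres."""
    return height_m * (pixel_pitch_um * 1e-6) / (focal_mm * 1e-3)

gsd = ground_sample_distance(height_m=100, focal_mm=8.8, pixel_pitch_um=2.4)
print(f"GSD = {gsd:.3f} m per pixel")  # sampling much finer than this gains nothing
```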
Step S5, after the panoramic video has been orthorectified, a straight line is selected in the direction perpendicular to the coastline, i.e. the cross-shore direction. This line has a fixed coordinate in one dimension of the three-dimensional space, so a unique solution exists when the inverse transformation from image space back to three-dimensional space is required, and position information in three-dimensional space can be measured in image space. Corrected pictures of the panoramic video are then generated for the selected area and processed to obtain the time stack image of all frames in that area.

Since all the orthorectified pictures have already been obtained, a single column of pixel values (representing the cross-shore direction perpendicular to the coastline in three-dimensional space) is selected in every orthorectified picture to form a time stack image, which is convenient to convert into the data structure required for time series analysis. The specific operation is as follows: let the orthorectified pixel pictures be m0 in total, each with cols columns and rows rows, the column index col ranging from 1 to cols; for a given column col, the col-th column of pixel values is taken from the first to the m0-th orthorectified picture in turn and the columns are arranged side by side to form a time stack picture. The width of a time stack image is the record length in time and its height is proportional to the cross-shore length.
Step S6, from the obtained time stack picture the water depth can be estimated with the cBathy water depth estimation method:

according to the dispersion relation, the wave angular frequency ω, the wave number k, the water depth h and the gravitational acceleration g of a linear wave satisfy

ω^2 = g·k·tanh(k·h)

from which the water depth h is

h = (1/k)·artanh( ω^2 / (g·k) )

In addition, the wave speed c, the wave number k, the wave angular frequency ω and the wave frequency f satisfy the identities

c = ω / k,  ω = 2π·f

so the water depth formula is further written as

h = (1/k)·artanh( (2π·f)^2 / (g·k) )
Based on the time stack picture obtained in the previous step, the cBathy water depth estimation method is used to complete the estimation of the wave frequency f and the wave number k and thereby the water depth h; a schematic diagram of the water depth estimation result is shown in fig. 6. The cBathy method was published in 2013 in Journal of Geophysical Research: Oceans, volume 118, pages 2595-2609, under the title "cBathy: A robust algorithm for estimating nearshore bathymetry".
Example 2
The embodiment further discloses a coastal sea area water depth mapping method based on double unmanned aerial vehicle video stitching, which comprises the following steps:
S1, two unmanned aerial vehicles are prepared; in order to obtain the position information and attitude information of the cameras, both aircraft must carry a high-precision GPS-RTK device and an IMU inertial measurement unit. The intrinsic matrix of each unmanned aerial vehicle onboard camera is computed with the Matlab mathematical tools or the OpenCV library:

K = [f_x, 0, x_o; 0, f_y, y_o; 0, 0, 1]

where f_x and f_y describe the pixel density of the camera sensor along the horizontal x axis and the vertical y axis of the image coordinate system, and x_o and y_o are the pixel offsets of the camera optical axis in the image coordinate system.
S2, in the beach area whose water depth is to be mapped, the unmanned aerial vehicles are operated at a flying height of 50-150 metres so that their distribution meets certain requirements: the two aircraft are distributed along the coastline, their cameras face the sea, their camera fields of view share a common view, and the two aircraft either keep hovering at the same time or move at a constant speed in the same direction. The subject of the video is the periodic waves propagating towards the coast or beach.
S3, the video data shot by the unmanned aerial vehicles are stitched with the method of bundling the video paths and jointly optimizing them:

the video shot by the first unmanned aerial vehicle is denoted A and the video shot by the second unmanned aerial vehicle is denoted B. For the video shot by each unmanned aerial vehicle, every frame is divided into m parts along the width and m parts along the height of the resolution, i.e. into m^2 grids, the i-th grid being denoted by i. In the calculation, m can be set to 10 if the video resolution is very large and to 4 if the resolution is small. Let F_i(t) be the homography transformation matrix between frame t and frame t+1 of the i-th grid of a video; the path of the video is defined as the running product of the homography matrices of the i-th grid from the 1st to the t-th, denoted C_i(t):

C_i(t) = F_i(t)·F_i(t-1)···F_i(1),  1 ≤ t ≤ T-1,  3 ≤ T,  1 ≤ i ≤ m^2

where T is the total number of frames of a single video, the video path of each grid i in video A is C_i^A(t) and the video path of each grid i in video B is C_i^B(t).
A video path optimization formula Θ(P_i) is defined as:

Θ(P_i) = Σ_t ( ||P_i(t) - C_i(t)||^2 + λ·Σ_{r∈Ω_t} w_{t,r}·||P_i(t) - P_i(r)||^2 )

where Ω_t is the range of frames r adjacent to frame t; P_i(t) is the smooth video path of the i-th grid, the running product of the homography matrices of the i-th grid from the 1st to the t-th obtained by iteratively optimizing C_i(t), and its initial value may be taken equal to C_i(t); P_i(r) is likewise the smooth video path of the i-th grid, obtained by iteratively optimizing C_i(r). Correspondingly, the smooth video path of each grid i of video A is defined as P_i^A(t) and that of each grid i of video B as P_i^B(t). λ is the overall weight, whose initial value may be set to 5, balancing ||P_i(t) - C_i(t)|| and ||P_i(t) - P_i(r)||, and w_{t,r} preserves motion discontinuities at rapid displacements or scene changes; it is calculated with a Gaussian function G:

w_{t,r} = G(||r - t||)·G(||C_i(r) - C_i(t)||)

where the Gaussian function G is defined as

G(x0) = exp( -(x0 - μ)^2 / (2σ^2) )
x0 is the variable of the Gaussian function, μ is the mean of the Gaussian function, which may be taken as 0, and σ is its standard deviation. The video path optimization formula is iterated by updating P_i(t) from C_i(t): at each iteration the new estimate of P_i(t) is formed from C_i(t), from the temporally adjacent smooth paths P_i(r) weighted by λ·w_{t,r}, and from the smooth paths P_j(t) of the neighbouring grids,
where ζ is the iteration index, N(i) is the set of grids neighbouring the i-th grid and j is the j-th grid in the set N(i). E_stable(P) is defined as the video path stabilization formula accumulated over all grids, where P = {P_i(t)} is the set of the smooth video paths P_i(t) of all grids i. For the smooth video paths P^A and P^B of video A and video B, the video stitching formula is:
E(P^A, P^B, H) = E_stable(P^A) + E_stable(P^B) + β·E_stitch(P^A, P^B, H)

where β is the stitching coefficient used to balance E_stable and E_stitch: if the stitched video should above all be stable and free of jitter, β can be set smaller so that E_stable carries a larger weight; if a good stitching result with little ghosting is more important, β can be set larger so that E_stitch carries a larger weight; for a balanced overall effect β = 0.01 is preferred. E_stitch represents the stitching term of the videos, stitched with a feature matching method: x^A_{t,L} denotes the L-th feature point of frame t in video A and x^B_{t,L} the L-th feature point of frame t in video B, each feature point being written as a homogeneous vector (u_fp, v_fp, 1)^T, where u_fp and v_fp are the horizontal and vertical coordinates of the feature point in the pixel coordinate system; the grids in which the feature points x^A_{t,L} and x^B_{t,L} lie are also recorded, and H is the homography matrix between those grids, computed from the matched feature points x^A_{t,L} and x^B_{t,L}.
Assume that the video path P^A of video A shot by the first unmanned aerial vehicle is not subjected to video path stability optimization; the video stitching formula then simplifies to:

E(P^A, P^B, H) = E_stable(P^B) + β·E_stitch(P^A, P^B, H)

With the simplified video stitching formula, the smooth video path P_i^B(t) of video B is iterated a predetermined number of times, every grid of video B is then warped by P_i^B(t)·C_i^B(t)^{-1}, and the result is fused with video A image by image to obtain the panoramic video. The panoramic video is then preprocessed, mainly by image graying and Gaussian filtering for denoising; this can be implemented with Matlab image processing functions, by calling the OpenCV open-source vision library, or with hand-written source code.
The video path optimization is essentially a warping or homography transformation: it changes the distortion and perspective of the video images and makes the camera intrinsic matrix K needed to compute the orthorectified area unusable. The video path P^A of video A shot by the first unmanned aerial vehicle is left without path stability optimization precisely so that the subsequent orthorectification can use the flight data of the first aircraft, such as the position information and attitude information of its camera.

In addition, if a traditional image stitching method were simply used and the two videos were stitched frame by frame with feature point matching, the stitched result would show severe jitter and ghosting and could not be used to generate the orthorectified images; the introduced method of bundling the video paths and jointly optimizing them solves this problem well.
S4, world coordinates are converted into the image coordinate system through the projection relation from three-dimensional space to the two-dimensional plane:

z_c·[u, v, 1]^T = K·[R_3×3 | t_3×1]·[x_w, y_w, z_w, 1]^T

where K is the camera intrinsic matrix and [R_3×3 | t_3×1] is the extrinsic parameter matrix of the camera; R_3×3 is the rotation matrix and is determined by the camera attitude information, whose roll angle is roll, pitch angle is pitch and yaw angle is yaw.
The rotation matrix R_3×3 is calculated as the product of the elementary rotation matrices determined by these attitude angles. R^c_b is the three-dimensional rotation matrix formed by the attitude of the camera coordinate system relative to the body coordinate system; the origin of the camera coordinate system coincides with the origin of the body coordinate system, and the body coordinate system can be taken to rotate by an angle θ about the z axis, then by an angle φ about the y axis, and finally by an angle ψ about the x axis so that it coincides with the three axes of the camera coordinate system; R^c_b is then the product of the elementary rotation matrices corresponding to these three rotations.
t_3×1 is the translation vector; it represents the coordinates of the camera in the world coordinate system and can be obtained from the camera position and the origin of the world coordinate system. Through the series of transformation matrix operations above, points in the three-dimensional world coordinate system can be projected onto the two-dimensional pixel coordinate system; the water depth estimation area to be studied is selected according to the camera position and attitude information of the current frame, and an orthorectified image is generated for that area as follows:

Since the sea area range information has to be determined from information in the image space, the orthographic correction of the image must be completed. As shown in the panoramic image of fig. 3, an area of 200 metres along the coastline and 200 metres in the cross-shore direction, i.e. 40000 square metres, is delimited. For convenience of representation the north-east-down coordinates are converted to a local coordinate system aligned with the alongshore and cross-shore directions: the local alongshore direction is taken as the X axis and the cross-shore direction as the Y axis, and the original north-east-down frame is rotated by a certain angle in the plane of the XY axes; this angle can be calculated from the azimuth of the local compass, and the specific value is chosen according to the local situation.

After the coordinate system has been converted, it only remains to define a rectangular area along the coastline and the cross-shore direction.

After the rectangular area has been selected, the final orthorectified picture has to be generated, and a simple approach is used: three-dimensional space points are sampled at equal intervals along the X and Y axes, i.e. along the coastline and the cross-shore direction, within the previously defined rectangle.

In this way the three-dimensional point coordinates of the study area are obtained completely, and the pixel coordinates corresponding to each point are obtained through the camera imaging model. After a series of points have been uniformly sampled in three-dimensional space and their pixel coordinates obtained, the pixel values are interpolated and resampled at these coordinates and rearranged to reconstruct a new image, which is the orthorectified picture. The spatial sampling interval is generally about 0.5 metres and is usually determined from the resolution of the unmanned aerial vehicle onboard camera and the height of the aircraft. The resampled orthorectified image therefore carries distance information from pixel to pixel.
S5, after the panoramic video has been orthorectified, a straight line is selected in the direction perpendicular to the coastline (i.e. the cross-shore direction). This line has a fixed coordinate in one dimension of the three-dimensional space, so a unique solution exists when the inverse transformation from image space back to three-dimensional space is required, and position information in three-dimensional space can be measured in image space. Corrected pictures of the panoramic video are then generated for the selected area and processed to obtain the time stack image of all frames in that area.

Since all the orthorectified pictures have been obtained, and to make them easy to convert into the data structure required for time series analysis, let the orthorectified pixel pictures be m0 in total, each with cols columns and rows rows, the column index col ranging from 1 to cols; for a given column col, the col-th column of pixel values is taken from the first to the m0-th orthorectified picture in turn and the columns are arranged side by side, i.e. one column of pixel values (representing the cross-shore direction perpendicular to the coastline in three-dimensional space) is selected in every orthorectified picture to form a time stack image. The width of the time stack image is in seconds and its height is proportional to the cross-shore length.
S6, the cBathy water depth estimation method is a two-dimensional frequency-domain estimation method consisting mainly of signal processing and fault-tolerance processing; because it contains a large number of nonlinear fitting and optimization steps and feedback links, sets weights for several factors when estimating the parameters, and applies fault-tolerance checks to the finally obtained parameters, the algorithm is highly robust. From the time stack image obtained in the previous step, the water depth h can be estimated with the cBathy water depth estimation method. The cBathy method was published in 2013 in Journal of Geophysical Research: Oceans, volume 118, pages 2595-2609, under the title "cBathy: A robust algorithm for estimating nearshore bathymetry".
The above examples are preferred embodiments of the present invention, but the embodiments of the present invention are not limited to them; any other change, modification, substitution, combination or simplification that does not depart from the spirit and principle of the present invention is an equivalent replacement and falls within the protection scope of the present invention.

Claims (7)

1. A coastal sea area water depth mapping method based on double unmanned aerial vehicle video stitching, characterized by comprising the following steps:
S1, complete the intrinsic parameter calibration of the cameras of the two unmanned aerial vehicles, each unmanned aerial vehicle being equipped with a GPS-RTK measurement module and an IMU inertial measurement unit that record the camera position information and camera attitude information respectively;
S2, control the two unmanned aerial vehicles to fly along the coastline of the sea area to be surveyed, spaced at an interval, and shoot video while hovering or while moving in the same direction at the same speed, ensuring that the fields of view of the two cameras have an overlapping part;
S3, perform video stitching on the videos shot by the cameras of the two unmanned aerial vehicles to obtain a panoramic video, and then perform image preprocessing on the panoramic video, the image preprocessing comprising image graying and image filtering;
S4, select the sea area to be mapped from the preprocessed panoramic video, and generate orthorectified images of the sea area using the recorded camera position information and camera attitude information;
S5, select a straight line in the direction perpendicular to the coastline on the generated orthorectified images, and then perform image processing to obtain time stack images of all frames;
S6, estimate the water depth information of the corresponding pixel coordinate points from the time stack images using the cBathy water depth estimation method.
2. The coastal sea area water depth mapping method based on double unmanned aerial vehicle video stitching according to claim 1, wherein in step S1 an image processing tool is used to obtain the intrinsic parameter matrix of the unmanned aerial vehicle camera, the intrinsic parameter matrix being

K = [ f_x  0    x_o
      0    f_y  y_o
      0    0    1  ]

wherein f_x and f_y describe the pixel densities of the camera sensor along the horizontal x-axis and the vertical y-axis of the image coordinate system, and x_o and y_o represent the pixel offsets of the camera optical axis in the image coordinate system.
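Purely as an illustration of the matrix structure described in claim 2, the intrinsic parameters obtained from a standard checkerboard calibration could be assembled as follows; the numerical values are hypothetical and not taken from the patent.

```python
import numpy as np

# Hypothetical intrinsic parameters from a checkerboard calibration (illustrative only).
fx, fy = 2850.0, 2852.0      # pixel densities / focal lengths in pixels
xo, yo = 1920.0, 1080.0      # principal point offset in pixels

K = np.array([[fx, 0.0, xo],
              [0.0, fy, yo],
              [0.0, 0.0, 1.0]])
```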
3. The coastal sea area water depth mapping method based on double unmanned aerial vehicle video stitching according to claim 1, wherein the process of step S3 is as follows:
define the video shot by the first unmanned aerial vehicle as A and the video shot by the second unmanned aerial vehicle as B; for the video shot by each unmanned aerial vehicle, divide each frame into m parts along the width and m parts along the height, giving m² grids, the i-th grid being denoted by i; let F_i(t) denote the homography transformation matrix between the t-th frame and the (t+1)-th frame for the i-th grid of the video, and define the video path of the i-th grid as the product of its homography transformation matrices from the 1st to the t-th, denoted C_i(t):

C_i(t) = F_i(t)·F_i(t−1)···F_i(1),  1 ≤ t ≤ T−1, 3 ≤ T, 1 ≤ i ≤ m²
wherein T is the total frame number of a single video; let the video path of each grid i in video A be C_i^A(t) and the video path of each grid i in video B be C_i^B(t); the video path optimization equation Θ(P_i) is as follows:

Θ(P_i) = Σ_t ( ||P_i(t) − C_i(t)||² + λ · Σ_{r∈Ω_t} w_{t,r} · ||P_i(t) − P_i(r)||² )

wherein Ω_t represents the range of frames r adjacent to the t-th frame; P_i(t) represents the smooth video path of the i-th grid of the video, i.e. the product of that grid's homography transformation matrices from the 1st to the t-th, obtained by iterative optimization from C_i(t), and P_i(r) is likewise the smooth video path obtained by iterative optimization from C_i(r); the smooth video path of each grid i of video A is defined as P_i^A(t) and the smooth video path of each grid i of video B as P_i^B(t); λ represents the overall weight balancing the term ||P_i(t) − C_i(t)|| against the term ||P_i(t) − P_i(r)||; and w_{t,r} serves to preserve motion discontinuities under rapid displacement or scene changes and is calculated by a Gaussian function G:
w_{t,r} = G(||r − t||) · G(||C_i(r) − C_i(t)||)

wherein the Gaussian function G is defined as

G(x0) = (1/(σ·√(2π))) · exp(−(x0 − μ)² / (2σ²))

wherein x0 is the argument of the Gaussian function, μ is the mean of the Gaussian function and σ is the standard deviation of the Gaussian function; the iteration of the video path optimization equation is performed by computing C_i(t) and updating P_i(t):
P_i^(ξ+1)(t) = (1/γ) · ( C_i(t) + λ · Σ_{r∈Ω_t} w_{t,r} · P_i^(ξ)(r) + Σ_{j∈N(i)} P_j^(ξ)(t) )

wherein ξ is the iteration index, N(i) represents the set of neighbour grids of the i-th grid, j is the j-th grid in the set N(i), and the normalisation factor γ is

γ = 1 + λ · Σ_{r∈Ω_t} w_{t,r} + |N(i)|
define E_stable(P) as the video path stabilization formula:

E_stable(P) = Σ_{i=1}^{m²} Θ(P_i)

wherein P = {P_i(t)} is the set of the smooth video paths P_i(t) of all grids i; for the sets of smooth video paths P^A and P^B of video A and video B, the video stitching formula is as follows:

E(P^A, P^B, H) = E_stable(P^A) + E_stable(P^B) + β · E_stitch(P^A, P^B, H)
wherein β represents the stitching coefficient and E_stitch represents the video stitching term, which adopts a feature matching method: over all frames t and all matched feature pairs L it penalises the distance between the warped positions of the corresponding feature points of video A and video B; x_{t,L}^A is the L-th feature point of the t-th frame in video A and x_{t,L}^B is the L-th feature point of the t-th frame in video B, each feature point being expressed in homogeneous form as (u_fp, v_fp, 1)^T, wherein u_fp and v_fp are the horizontal and vertical coordinates of the feature point in the pixel coordinate system; i(x_{t,L}^A) and i(x_{t,L}^B) respectively denote the grids in which the feature points x_{t,L}^A and x_{t,L}^B lie, and H represents the homography matrix calculated from the matched feature points x_{t,L}^A and x_{t,L}^B;
assuming that the video path P^A of video A shot by the first unmanned aerial vehicle does not undergo video path stabilization optimization, the video stitching formula simplifies to:

E(P^A, P^B, H) = E_stable(P^B) + β · E_stitch(P^A, P^B, H)

using the simplified video stitching formula, the smooth video path P_i^B(t) of video B is iterated a predetermined number of times; each grid of each frame of video B is then warped by P_i^B(t)·(C_i^B(t))^(−1), the result is fused with video A frame by frame to obtain the panoramic video, and each frame of the panoramic video is subjected to image graying and image filtering for denoising.
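As an illustration of the per-grid path smoothing iteration in claim 3, the sketch below treats each path as a 3×3 homography and applies a Jacobi-style update using only the temporal neighbour terms; the spatial neighbour terms, the exact weights and the Gaussian parameters are assumptions made for this example and may differ from the patent's formulation.

```python
import numpy as np

def gaussian(x, mu=0.0, sigma=10.0):
    return np.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2)) / (sigma * np.sqrt(2.0 * np.pi))

def smooth_path(C, lam=5.0, radius=30, iters=20):
    """Jacobi-style smoothing of one grid's camera path.

    C: array of shape (T, 3, 3), the accumulated homographies C_i(t).
    Returns P of the same shape, the smoothed path (spatial neighbour terms omitted).
    """
    T = C.shape[0]
    P = C.copy()
    for _ in range(iters):
        P_new = np.empty_like(P)
        for t in range(T):
            lo, hi = max(0, t - radius), min(T, t + radius + 1)
            acc, norm = C[t].copy(), 1.0
            for r in range(lo, hi):
                if r == t:
                    continue
                # Temporal and path-difference Gaussian weights, as in w_{t,r}.
                w = gaussian(abs(r - t)) * gaussian(np.linalg.norm(C[r] - C[t]))
                acc += lam * w * P[r]
                norm += lam * w
            P_new[t] = acc / norm
        P = P_new
    return P
```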
4. The coastal sea area water depth mapping method based on double unmanned aerial vehicle video stitching according to claim 1, wherein the process of step S4 is as follows:
let the coordinate points of the world coordinate system be (x_w, y_w, z_w), the navigation coordinate system adopted being a north-east-down coordinate system; let the coordinate points of the carrier (body) coordinate system be (x_b, y_b, z_b), the carrier coordinate system adopted being a front-right-down coordinate system; and let the coordinate points of the unmanned aerial vehicle camera coordinate system be (x_c, y_c, z_c), the camera coordinate system adopted being a down-right-front coordinate system; the process of projecting a point in world space into camera coordinates is:

[x_c, y_c, z_c, 1]^T = [ R_3×3  t_3×1 ; 0^T  1 ] · [x_w, y_w, z_w, 1]^T

wherein [ R_3×3  t_3×1 ; 0^T  1 ] is the extrinsic parameter matrix of the camera;
R_3×3 is the rotation matrix, determined by the camera attitude information; denoting the roll angle of the camera attitude as roll, the pitch angle as pitch and the yaw angle as yaw, the rotation matrix is calculated as

R_3×3 = R_b^c · R_n^b(roll, pitch, yaw)

wherein R_n^b(roll, pitch, yaw) is the rotation matrix from the navigation coordinate system to the body coordinate system composed of the roll, pitch and yaw rotations, and R_b^c is the three-dimensional rotation matrix formed by the attitude of the camera coordinate system relative to the body coordinate system; the origin of the camera coordinate system coincides with the origin of the body coordinate system, and the body coordinate system can be regarded as rotating about the z-axis by an angle θ, then about the y-axis by an angle φ, and finally about the x-axis by an angle ψ so that its three axes coincide with those of the camera coordinate system, giving

R_b^c = R_x(ψ) · R_y(φ) · R_z(θ)

wherein R_x, R_y and R_z denote the elemental rotation matrices about the x-, y- and z-axes respectively;
t_3×1 is the translation vector, obtained from the camera position and the origin of the world coordinate system and representing the coordinates of the camera in the world coordinate system; the coordinate points (x_c, y_c, z_c) of the camera coordinate system are converted to pixel plane coordinates (u, v, 1) by:

z_c · [u, v, 1]^T = K · [x_c, y_c, z_c]^T

wherein K is the intrinsic parameter matrix of the camera and determines the projection properties of the camera; the overall transformation formula of the camera imaging model is:

z_c · [u, v, 1]^T = K · [ R_3×3  t_3×1 ] · [x_w, y_w, z_w, 1]^T

through the above conversion, points of the world coordinate system are projected onto the two-dimensional pixel plane, after which orthographic correction of the image is performed.
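A minimal sketch of the world-to-pixel projection in claim 4 follows. It assumes one particular order of the roll/pitch/yaw rotations, a fixed body-to-camera rotation R_bc, and that the translation is taken as minus the rotated camera position; these conventions are assumptions for the example and the names do not come from the patent.

```python
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, s], [0, -s, c]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, -s], [0, 1, 0], [s, 0, c]])

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, s, 0], [-s, c, 0], [0, 0, 1]])

def world_to_pixel(p_w, K, roll, pitch, yaw, R_bc, cam_pos):
    """Project one world point into the image, assuming NED world axes,
    a front-right-down body frame and a fixed body-to-camera rotation R_bc."""
    R_nb = rot_x(roll) @ rot_y(pitch) @ rot_z(yaw)     # navigation -> body (assumed order)
    R = R_bc @ R_nb                                    # navigation -> camera
    # Equivalent to using t = -R @ cam_pos in the extrinsic matrix.
    p_c = R @ (np.asarray(p_w, float) - np.asarray(cam_pos, float))
    uvw = K @ p_c
    return uvw[:2] / uvw[2]                            # perspective division
```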
5. The coastal sea area water depth mapping method based on double unmanned aerial vehicle video stitching according to claim 4, wherein the process of generating the orthorectified image of the selected sea area using the camera position information and camera attitude information in step S4 is as follows:
converting the north-east coordinates into a local coordinate system aligned with the alongshore and cross-shore directions, setting the local alongshore direction as the X-axis and the cross-shore direction as the Y-axis, and rotating the original north-east coordinates in the plane of the XY axes;
defining a rectangular area along the coast and perpendicular to the coast, and acquiring three-dimensional space points of the world coordinate system at equal intervals along the X-axis and Y-axis within the rectangular area, i.e. acquiring three-dimensional space points of the world coordinate system along the coastline and perpendicular to the coastline;
obtaining the pixel coordinates corresponding to each point through the camera imaging model; after the series of points uniformly acquired in three-dimensional space have been mapped to their pixel coordinates, interpolating and sampling the image at those pixel coordinates and rearranging the results to form a new image, which is the orthorectified picture.
6. The coastal sea area water depth mapping method based on double unmanned aerial vehicle video stitching according to claim 1, wherein in step S5, for the orthorectified pixel pictures, the total number of pictures is m0, each with cols columns and rows rows; for each column index from 1 to cols, the pixel values of that column are taken in order from the first picture to the m0-th picture and arranged to form a time stack picture, whose width corresponds to the time length with m0 columns and whose height corresponds to the cross-shore length with rows rows; the total number of time stack pictures is cols.
7. The coastal sea area water depth mapping method based on double unmanned aerial vehicle video stitching according to claim 1, wherein the process of step S6 is as follows:
according to the dispersion relation, the following relation holds among the wave angular frequency ω, the wave number k, the water depth h and the gravitational acceleration g in linear waves:

ω² = g·k·tanh(k·h)

from which the calculation formula for the water depth h is derived as:

h = (1/k) · artanh( ω² / (g·k) )

in addition, the identity c = ω/k holds among the wave celerity c, the wave number k and the wave angular frequency ω, and the wave angular frequency and the wave frequency f satisfy ω = 2πf, so the water depth formula can further be written as:

h = (1/k) · artanh( 2πf·c / g )

according to the above formulas, as long as any two of the four physical quantities wave celerity c, wave number k, wave angular frequency ω and wave frequency f are known, the estimated value of the water depth can be derived; the cBathy water depth estimation algorithm is then used, on the basis of the time stack obtained in step S5, to estimate the wave frequency f and the wave number k, from which the water depth h is calculated.
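The closed-form inversion in claim 7 can be written out directly once f and k are known; the short example below is illustrative only, with hypothetical input values.

```python
import numpy as np

G = 9.81  # gravitational acceleration, m/s^2

def depth_from_f_and_k(f, k):
    """Closed-form inversion of the linear dispersion relation: h = artanh(w^2/(g*k))/k.

    Valid only while w^2/(g*k) < 1 (intermediate/shallow water); in deep water the
    argument approaches 1 and the depth is no longer observable from f and k.
    """
    omega = 2.0 * np.pi * f
    ratio = omega ** 2 / (G * k)
    if ratio >= 1.0:
        return np.nan        # deep-water limit: depth cannot be resolved
    return np.arctanh(ratio) / k

# Hypothetical example: a 0.1 Hz swell with wavenumber 0.08 rad/m gives roughly 7 m depth.
print(depth_from_f_and_k(0.1, 0.08))
```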
CN202310167538.0A 2023-02-27 2023-02-27 Coastal sea area water depth mapping method based on double unmanned aerial vehicle video stitching Pending CN116385554A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310167538.0A CN116385554A (en) 2023-02-27 2023-02-27 Coastal sea area water depth mapping method based on double unmanned aerial vehicle video stitching

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310167538.0A CN116385554A (en) 2023-02-27 2023-02-27 Coastal sea area water depth mapping method based on double unmanned aerial vehicle video stitching

Publications (1)

Publication Number Publication Date
CN116385554A true CN116385554A (en) 2023-07-04

Family

ID=86968337

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310167538.0A Pending CN116385554A (en) 2023-02-27 2023-02-27 Coastal sea area water depth mapping method based on double unmanned aerial vehicle video stitching

Country Status (1)

Country Link
CN (1) CN116385554A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117095048A (en) * 2023-08-18 2023-11-21 浙江大学海南研究院 Image-based offshore water depth inversion method


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination