CN108648264B - Underwater scene reconstruction method based on motion recovery and storage medium - Google Patents


Info

Publication number
CN108648264B
Authority
CN
China
Prior art keywords
image
patch
point
images
correlation coefficient
Prior art date
Legal status
Active
Application number
CN201810377322.6A
Other languages
Chinese (zh)
Other versions
CN108648264A (en)
Inventor
王欣 (Wang Xin)
杨熙 (Yang Xi)
Current Assignee
Jilin University
Original Assignee
Jilin University
Priority date
Filing date
Publication date
Application filed by Jilin University
Priority claimed from CN201810377322.6A
Publication of CN108648264A
Application granted
Publication of CN108648264B
Legal status: Active

Classifications

    • G06T17/00 — Three-dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T15/00, G06T15/50 — 3D image rendering; lighting effects
    • G06T7/20, G06T7/246, G06T7/248 — Analysis of motion using feature-based methods (e.g. tracking of corners or segments), involving reference images or patches
    • G06T7/30, G06T7/33, G06T7/337 — Image registration using feature-based methods involving reference images or patches
    • G06T7/80, G06T7/85 — Camera calibration; stereo camera calibration
    • G06T7/90 — Determination of colour characteristics
    • G06T2207/10016, G06T2207/10021 — Video and image sequences; stereoscopic video and image sequences
    • G06T2207/10028 — Range image; depth image; 3D point clouds
    • G06T2207/20021 — Dividing image into blocks, subimages or windows

Abstract

The reconstruction method introduces an improved motion recovery (structure-from-motion) algorithm that extracts the motion matrices and establishes the relationships between the video images. After the redundant images have been eliminated, feature point matching and point cloud generation are carried out in two steps: feature points are first matched on the binocular image pair and, to obtain denser point cloud data, a patch is generated from each matched feature point pair; the patches are then diffused to all viewing angles to complete the reconstruction of the scene model, and finally the point cloud model is color-corrected according to the imaging characteristics of the underwater scene. The method can still produce a good reconstruction when only a few input images are available, offers good efficiency and precision, and improves the accuracy and robustness of the reconstructed scene to a certain extent.

Description

Underwater scene reconstruction method based on motion recovery and storage medium
Technical Field
The invention relates to a three-dimensional reconstruction method, and in particular to an underwater scene reconstruction method based on motion recovery (structure from motion) and a storage medium, which balances efficiency and precision.
Background
The real world is three-dimensional, and to facilitate its observation, analysis, and extension, three-dimensional models need to be reconstructed in a computer environment. In recent years, with the rapid advance of computer hardware and the steady evolution of software, methods for constructing three-dimensional models have multiplied, and related software is widely applied in fields such as medical image processing, 3D printing, computer games, virtual reality, mapping, simulated military training, and film and television entertainment. According to how the reconstruction data are obtained, three-dimensional model construction techniques mainly comprise: direct modeling with a three-dimensional modeling tool, modeling with instrument equipment, and vision-based three-dimensional reconstruction.
Modeling with a modeling tool does not require acquiring any reconstruction-related data in advance: using basic geometric shapes such as cubes and spheres provided in a dedicated modeling tool, or models and textures imported beforehand, the user transforms the surface of an initial model into a complex structural shape through a series of geometric operations. When this method is used to model a large, complex scene, the scene contains a great deal of information and very complex textures, so obtaining a high-precision scene model is very difficult and consumes a large amount of manpower and material resources, and the result is only a rough simulation and restoration of the scene to be reconstructed.
For modeling with instrument equipment, the three-dimensional scanner (3D Scanner) is currently one of the important tools for three-dimensional modeling of real objects. It can quickly convert real physical information into digital signals that a computer can process directly, thereby yielding a high-precision three-dimensional model. However, this method relies heavily on instruments to collect information, which is very difficult in large scenes such as mountains and rivers. Moreover, the instrument places high demands on the acquisition environment: there must not be too many interference sources, or a large amount of time is needed to correct for the noise, and the results are unsatisfactory when reconstructing complex scenes, particularly underwater scenes.
Vision-based three-dimensional reconstruction adopts computer vision methods to reconstruct a model and restore a scene from collected two-dimensional images or video. It places low demands on the equipment and the object to be reconstructed, reconstructs quickly, can complete reconstruction fully automatically from the input images, and is an extremely active research field in current computer graphics. Unlike the two preceding methods, its input is images of the object to be reconstructed, which are far easier to acquire than data from instrument equipment. The method is not limited by scene size or model shape, can run automatically or semi-automatically, can be conveniently integrated into everyday hardware, and is widely applicable to robot intelligence, aerial mapping, and industrial automation.
As one of the important branches of computer vision, three-dimensional reconstruction based on computer vision rests on Marr's framework of visual theory, and in recent years various scene reconstruction methods have been developed. According to the number of cameras used simultaneously, these methods can be classified into monocular, binocular, and multi-view stereo vision methods.
Monocular reconstruction algorithms reconstruct from images acquired by a single camera, either one image at a single viewing angle or multiple images at multiple viewing angles. Many mature monocular algorithms have been developed over the years. However, because less information is acquired, a monocular algorithm usually needs additional auxiliary information to complete the reconstruction, such as lighting, focus, auxiliary contours, or images from a large number of viewing angles. This places more demands on the shooting environment and method, which limits practical application.
As one of the current mainstream methods, binocular stereo vision achieves scene reconstruction from actually acquired binocular images; it resembles the process of human visual perception and observation, enjoys well-developed mathematical theory, and offers high reconstruction precision. However, existing algorithms still have many defects: binocular stereo vision based on feature point matching has high accuracy and low time complexity, but can obtain only a sparse point cloud of the scene, so the reconstruction effect is not ideal; binocular stereo vision based on pixel-level matching can obtain dense point cloud data, but its precision drops and its time complexity is far higher than that of feature point matching, taking too long for the three-dimensional reconstruction of large scenes. Moreover, the traditional binocular stereo algorithm considers only corresponding frames, so the reconstructed point clouds from different viewing angles lack interconnection and do not join naturally, which degrades the reconstruction.
In binocular stereo reconstruction, similar images or repeated regions within images still cause mismatches and other defects. Multi-view stereo reconstruction addresses these problems by adding further cameras on top of the binocular pair to provide more constraint information and improve the final reconstruction accuracy. Although multi-view stereo can reduce mismatching and edge blurring to a certain extent, each additional camera increases the number of images to process at every viewing angle, further complicates the equipment structure and physical relationships, and greatly raises the operational difficulty and cost, so the effect is not ideal.
Therefore, how to improve reconstruction efficiency and accuracy for the existing computer vision reconstruction method becomes a technical problem to be solved urgently in the prior art.
Disclosure of Invention
The invention aims to provide a scene reconstruction method based on binocular stereo vision, namely a complete underwater scene reconstruction method based on motion recovery. An improved motion recovery algorithm is introduced to extract the motion matrices and establish the relationships between the video images. After the redundant images have been eliminated, to enhance the robustness of the algorithm, feature point matching and point cloud generation are carried out in two steps: feature points are first matched on the binocular image pair and, to obtain denser point cloud data, a patch is generated from each matched feature point pair; the patches are then diffused to all viewing angles to complete the reconstruction of the scene model. Finally, the point cloud model is color-corrected according to the imaging characteristics of the underwater scene.
In order to achieve the purpose, the invention adopts the following technical scheme:
an underwater scene reconstruction method based on motion recovery, characterized by comprising the following steps: a motion matrix extraction step S110: for each newly added group of binocular images, only the first-eye image is selected and matched for feature points against the first-eye image of the previous frame, after which the motion matrix is calculated; once the motion matrix of the first-eye image has been obtained, the motion matrix of the second-eye image can be obtained from the calibration between the binocular cameras, and new tracking points are selected from it and added to the tracking point set;
redundant information removal step S120: the first frame image p1 in the first-eye video is compared with the next frame image p2; a projection matrix K is obtained from the motion matrices at the two viewing angles, and the points of p2 are mapped onto p1 through the projection matrix K by formula (1), where r2 is a point coordinate in p2 and r21 is its projection onto p1:
r21 = K · r2    (1)
by comparing the image p1 with the pixels mapped from p2, an image correlation coefficient δ between the two images is obtained and compared with a threshold value; if δ is smaller than the set threshold, the next frame p2 is not a redundant image: p1 and p2 are retained, and p2 is then compared with its adjacent next first-eye frame; otherwise p2 is determined to be a redundant image and is removed from the video image set, and p1 is compared with the first-eye frame adjacent to and following p2; this loop is repeated until the last frame has been compared, after which the redundant information removal step is repeated on the images of the second-eye video, yielding a reduced image set P that retains the scene characteristics;
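As an illustrative sketch (not the patent's own code), the redundancy test above can be written as follows; it assumes the pixels of p2 have already been mapped onto p1 by formula (1), and the binarization used for the Jaccard-style comparison is an assumption, since the patent does not spell out the exact pixel measure.

```python
import numpy as np

def jaccard_correlation(p1: np.ndarray, p2_mapped: np.ndarray, level: int = 128) -> float:
    """Image correlation coefficient (delta) computed as a Jaccard score
    over pixels brighter than `level` (the binarization level is an assumption)."""
    a = p1 >= level
    b = p2_mapped >= level
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # two empty masks are trivially identical
    return float(np.logical_and(a, b).sum() / union)

def is_redundant(p1: np.ndarray, p2_mapped: np.ndarray, threshold: float = 0.9) -> bool:
    """A frame counts as redundant when delta reaches the 0.9 threshold
    named later in the patent."""
    return bool(jaccard_correlation(p1, p2_mapped) >= threshold)
```

Frames for which `is_redundant` returns False are kept and become the reference for the next comparison.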
a scene reconstruction step S130, comprising:
an initial matching sub-step: dividing each frame image in the image set P into grid cells of β × β pixels, and computing α local maxima in each cell with the DOG operator and the Harris operator respectively as feature points; matching the first-eye image against the second-eye image with the obtained feature points to obtain a set of matched feature point pairs; for each matched pair (ml, mr), sorting the pairs from far to near by their distance to the camera lens, and generating point clouds from near to far; taking m as center, generating a patch pm of θ × θ pixels, the center of the patch being m, the normal vector of pm being the line connecting m with the optical center of the reference image's camera, and screening the generated patch pm; the screening of the generated patch pm is carried out as follows: obtaining the corresponding affine transformation parameters through the image projection matrices of patch pm, then mapping pm onto pl and pr respectively to find the coordinates of pm in pl and pr; computing the two projection images of pm on pl and pr by bilinear interpolation, and calculating an initial matching correlation coefficient ε between the two projection images with a normalized cross-correlation algorithm; if ε is larger than a threshold value, the patch is considered successfully reconstructed, is stored, and the next feature point pair is reconstructed; otherwise the patch pm is deleted and the next feature point pair is reconstructed;
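The screening criterion — comparing the two projections of a patch by normalized correlation against the threshold ε — can be sketched as follows; the function names and the 0.7 threshold are illustrative assumptions (the patent does not state the threshold's value).

```python
import numpy as np

def ncc(proj_a: np.ndarray, proj_b: np.ndarray) -> float:
    """Normalized cross-correlation between two projected patch images."""
    a = proj_a.astype(float).ravel()
    b = proj_b.astype(float).ravel()
    a -= a.mean()
    b -= b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def keep_patch(proj_left: np.ndarray, proj_right: np.ndarray, epsilon: float = 0.7) -> bool:
    """Keep the patch when the initial matching correlation exceeds epsilon."""
    return ncc(proj_left, proj_right) > epsilon
```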
a diffusion patch reconstruction sub-step: for each patch generated in the initial reconstruction, if no patch exists in an adjacent grid cell, or the initial matching correlation coefficient ε of the patches in the adjacent cell is smaller than that of the current patch, generating in that cell a new patch pn with the initial patch as reference; the new patch pn takes as its center point the intersection of the optical-center direction of the cell with the plane of the reference patch, and its normal vector is the same as that of the reference patch; traversing all other images in the image set P, and putting every image whose normal vector makes an angle of less than 60 degrees with that of pn into a set U(t) as a contrast image; obtaining the corresponding affine transformation parameters through the image projection matrix of the patch and the motion matrix of each image, mapping pn onto pt and onto each image in U(t), and obtaining the mapped images of pn on pt and on U(t) by bilinear interpolation; calculating their correlation coefficients ζ1 and putting all images in U(t) whose ζ1 is greater than the threshold into a set V(t); if V(t) is empty, pn is considered unobservable by the other images and unable to meet the reconstruction requirement, pn is deleted, and the next diffusible point is sought;
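Building the contrast set U(t) reduces to an angle test between unit vectors: a view is admitted when its direction makes an angle of less than 60 degrees with the patch normal pn. A minimal sketch with illustrative names:

```python
import numpy as np

def visible_views(patch_normal, view_directions, max_angle_deg: float = 60.0):
    """Return the indices of views whose direction lies within
    max_angle_deg of the patch normal, i.e. the candidate set U(t)."""
    n = np.asarray(patch_normal, dtype=float)
    n = n / np.linalg.norm(n)
    cos_limit = np.cos(np.radians(max_angle_deg))
    kept = []
    for idx, d in enumerate(view_directions):
        d = np.asarray(d, dtype=float)
        d = d / np.linalg.norm(d)
        if n @ d > cos_limit:  # dot product of unit vectors = cosine of angle
            kept.append(idx)
    return kept
```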
a color correction step S140, which includes:
compensation light removal sub-step: converting the color of the three-dimensional model from the RGB color space into the HSV color space, which better matches human color perception; then determining the information of the compensating light from background points, and removing the compensating light in HSV space;
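Per point, the conversion and the removal of the compensating light might look like the sketch below; modelling the compensating light as a constant offset subtracted from the V (brightness) channel is an assumption made for illustration.

```python
import colorsys

def rgb_point_to_hsv(r: int, g: int, b: int):
    """Convert one point-cloud color from 8-bit RGB to HSV (all components in [0, 1])."""
    return colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)

def remove_compensating_light(hsv, comp_value: float):
    """Subtract the brightness attributed to the artificial compensating
    light (derived from a background point) from the V channel."""
    h, s, v = hsv
    return (h, s, max(v - comp_value, 0.0))
```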
a correction sub-step according to the underwater illumination imaging model: converting the model color with the compensating light removed back into an RGB representation; the color Lλ presented by a point x on the model is then calculated as shown in formula (2):
[formula (2) appears only as an image in the original publication]
where the leading term represents the RGB model of natural light, Nλ represents the absorption rate of seawater, D the depth of the seawater, and px the refractive index at the point;
according to the sea depth D of the scene, the color Cλ of the model is obtained by formula (3):
Cλ = Lλ / (Nλ)^D,   λ ∈ {red, green, blue}    (3).
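Formula (3) undoes the per-channel exponential attenuation of seawater. A direct transcription, where the absorption-rate values in the example are hypothetical:

```python
def correct_color(observed, absorption, depth):
    """Recover the model color C from the observed color L via
    C = L / N**D, applied per channel (red, green, blue)."""
    return tuple(l / (n ** depth) for l, n in zip(observed, absorption))
```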
Optionally, in the motion matrix extraction step S110, the feature points are extracted with a Harris corner feature detection algorithm and a SIFT feature point extraction algorithm, the motion matrix is calculated with a direct linear transformation method, and the parameters are further optimized with a sparse bundle adjustment method, recovering a more accurate motion matrix by minimizing the projection error between the observed image and the predicted image.
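The linear solve for the motion (projection) matrix corresponds to the classical direct linear transform over known 3D tracking points and their image projections. The sketch below sets up the homogeneous system and solves it by SVD, leaving out the bundle-adjustment refinement; the point values in the test are synthetic.

```python
import numpy as np

def dlt_projection(points3d, points2d) -> np.ndarray:
    """Recover a 3x4 projection matrix (up to scale) from >= 6
    3D-2D correspondences via the direct linear transform."""
    rows = []
    for (X, Y, Z), (u, v) in zip(points3d, points2d):
        P = [X, Y, Z, 1.0]
        # two rows per correspondence, from the cross product x × (M·X) = 0
        rows.append([0.0, 0.0, 0.0, 0.0] + [-p for p in P] + [v * p for p in P])
        rows.append(P + [0.0, 0.0, 0.0, 0.0] + [-u * p for p in P])
    # the solution is the right singular vector of the smallest singular value
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 4)
```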
Optionally, in the redundant information removal step S120, the image correlation coefficient δ is obtained by calculating the Jaccard distance of the projected pixel points, and the threshold value compared against δ is 0.9.
Optionally, in the initial matching sub-step, the first-eye image and the second-eye image are matched with the obtained feature points as follows: each feature point obtained by the DOG operator on pl is matched against the feature points obtained by the DOG operator on pr, and each feature point obtained by the Harris operator on pl is matched against the feature points obtained by the Harris operator on pr.
Optionally, in the diffusion patch reconstruction sub-step, the patch pn generated by diffusion is also optimized and adjusted so that its correlation coefficients on the other images become as large as possible: the z coordinate of the patch center point and the inclination angle of the normal vector of pn are adjusted, and the center point and normal vector of pn are recalculated; the image sets U(t) and V(t) are updated with correlation coefficient ζ2; if the number of elements in V(t) is greater than k, where k is the required number of images in which the diffusion point is observed, the patch diffusion is considered successful and the newly generated patch pn is stored; otherwise the newly generated patch is deleted and the next possible diffusion point is considered, until no further diffusion is possible.
Optionally, in the diffusion patch reconstruction sub-step, during the optimization adjustment the method varies the normal vector within a conical space of 15 degrees, computes V(t) for several boundary values, and compares them with that of the original normal vector to select the maximum; k is preferably 3.
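Varying the normal inside the 15-degree cone amounts to tilting the unit normal by at most 15 degrees about its own axis. The sampling below (one tilt angle, one azimuth) is an illustrative assumption, since the patent fixes only the cone angle:

```python
import numpy as np

def perturb_normal(normal, tilt_deg: float, azimuth_deg: float) -> np.ndarray:
    """Tilt a unit normal by tilt_deg (<= 15 in the method) at the given
    azimuth around the original direction."""
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)
    # build an orthonormal tangent basis around n
    helper = np.array([1.0, 0.0, 0.0]) if abs(n[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    t1 = np.cross(n, helper)
    t1 = t1 / np.linalg.norm(t1)
    t2 = np.cross(n, t1)
    tilt, az = np.radians(tilt_deg), np.radians(azimuth_deg)
    return np.cos(tilt) * n + np.sin(tilt) * (np.cos(az) * t1 + np.sin(az) * t2)
```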
Optionally, in the scene reconstruction step S130, after all diffusion is completed, erroneous patches are removed: in each grid cell, patches whose initial matching correlation coefficient ε is smaller than the average correlation coefficient are deleted, and patch clusters that contain only a few patches and are far from all other patches are deleted.
Optionally, in the compensating light removal sub-step, a background point that is as deep as the seabed and sufficiently far away is searched for in the video, and the brightness of this background point is used as the uniform brightness of the whole model to remove the compensating light.
Further, the present invention also discloses a storage medium for storing computer executable instructions, which is characterized in that: the computer executable instructions, when executed by a processor, perform the above-described underwater scene reconstruction method.
The method can still complete a better reconstruction result when only a few input images exist, has better efficiency and precision, and improves the accuracy and robustness of the reconstructed scene to a certain extent.
Drawings
Fig. 1 is a flow chart of a method for motion recovery based underwater scene reconstruction in accordance with a specific embodiment of the present invention;
FIG. 2 is a schematic diagram comparing a motion recovery algorithm according to an embodiment of the present invention with a conventional algorithm;
FIG. 3 is a schematic diagram of patch screening according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of patch optimization adjustment according to an embodiment of the present invention;
FIG. 5 is a schematic illustration of the principles of underwater imaging according to a specific embodiment of the present invention;
FIGS. 6(a) - (d) are four frames of an image with redundancy removed from the image in a reconstructed underwater video according to a specific embodiment of the present invention;
FIG. 7 is a model of a seabed three-dimensional point cloud obtained through scene reconstruction according to an embodiment of the present invention;
FIG. 8 is the final result of the sea bed after color correction according to a specific embodiment of the present invention;
FIG. 9 is a comparison of a reconstructed dinosaur model and a laser scanning model by using the reconstruction method of the present invention, wherein FIG. 9(a) is a laser scanning model and FIG. 9(b) is an algorithm reconstructed model of the present invention;
fig. 10 is a comparison between a temple model reconstructed by the reconstruction method of the present invention and a laser scanning model, wherein fig. 10(a) is a laser scanning model and fig. 10(b) is an algorithm reconstruction model of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
The invention discloses an underwater scene reconstruction method based on motion recovery, which comprises the following steps: in order to establish the interconnection between video images, improved motion recovery is introduced to realize the extraction of a motion matrix; after the redundant image elimination is completed, in order to enhance the robustness of the algorithm, the characteristic point matching and the point cloud generation are carried out in two steps: firstly, matching feature points on a binocular image, and generating a patch according to the matched feature points in order to obtain denser point cloud data; diffusing the surface patches to all the visual angles to complete the reconstruction of the scene model; and finally, carrying out color correction on the point cloud model according to the imaging characteristics of the underwater scene.
Referring to fig. 1, a flowchart of an underwater scene reconstruction method based on motion recovery according to the present invention is shown, in which underwater scene videos shot by a binocular camera are respectively marked as a left-eye video and a right-eye video, and are disassembled into an image set, and images are self-calibrated to eliminate lens distortion, and further includes the following steps:
motion matrix extraction step S110:
after the initialization is completed, the traditional method for extracting the motion matrix matches the added image with all the calculated images before for each frame to optimize all the motion matrices before, so that the time taken by the whole algorithm increases exponentially as the images increase.
Because the input of the invention is the ordered video frame, the upper and lower frames in the video contain a large amount of repeated information, for each newly added image, the corresponding characteristic points which can be matched by the current frame and all the images which have finished iteration are mostly concentrated on the image of the upper frame, and the matching characteristic points between the adjacent frames are enough to finish the calculation of the motion matrix, so the traditional method is optimized.
Therefore, the steps are specifically: for each group of newly added binocular images, only the first image (such as the left image) is selected to be matched with the feature points of the last frame image (the last frame left image), then the motion matrix calculation is carried out, and meanwhile, because the binocular cameras are strictly calibrated in advance, after the motion matrix of the first camera (namely, the left camera) is obtained through calculation, the motion matrix of the second camera (namely, the right camera) can be obtained according to the calibration between the binocular cameras, and new tracking points are selected from the motion matrix and added into the tracking point set.
In the embodiment of the present invention, the motion matrix of the left-eye camera is first obtained, and then the motion matrix of the right-eye camera is obtained according to the calibration between the binocular cameras, but obviously, this is merely an example, and the motion matrix of the right-eye camera may be obtained first, and then the motion matrix of the left-eye camera is obtained according to the calibration between the binocular cameras.
Further, in the invention, a Harris corner feature detection algorithm and a SIFT feature point extraction algorithm are adopted for extracting the feature points, and the Direct Linear Transformation (DLT) method is adopted for calculating the motion matrix. Since the left-eye and right-eye cameras are identical and have been calibrated, the intrinsic parameters are already determined; calculating the motion matrix therefore amounts to obtaining the extrinsic parameters of the camera. The Sparse Bundle Adjustment (SBA) method is further adopted to optimize the parameters, recovering a more accurate motion matrix by minimizing the projection error between the observed image and the predicted image; the minimization iteratively searches for the minimum of the total projection error.
Referring to fig. 2, a schematic diagram comparing a motion recovery algorithm according to an embodiment of the present invention with a conventional algorithm is shown.
Redundant information removal step S120:
due to the continuity of the video, a large number of redundant frames exist in the video, and the images are highly repeated with adjacent images, often contain no additional key information, and the whole video is directly processed, so that the processing is cumbersome and inefficient. The redundant frames are removed, so that the operation efficiency of the algorithm is greatly improved.
When redundant images are removed, the motion matrix under each visual angle is acquired in the motion matrix extracting step, so that the video images can be screened according to the similarity degree of adjacent images.
Therefore, this step is specifically: the first frame image p1 in the first-eye video is compared with the next frame image p2. A projection matrix K is obtained from the motion matrices at the two viewing angles, and the points of p2 are mapped onto p1 through the projection matrix K by formula (1), where r2 is a point coordinate in p2 and r21 is its projection onto p1:
r21 = K · r2    (1)
By comparing the image p1 with the pixels mapped from p2, an image correlation coefficient δ between the two images is obtained and compared with a threshold value. If δ is smaller than the set threshold, the next frame p2 is not a redundant image: p1 and p2 are retained, and p2 is then compared with its adjacent next left-eye frame; otherwise p2 is determined to be a redundant image and is removed from the video image set, and p1 is compared with the left-eye frame adjacent to and following p2. This cycle repeats until the last frame has been compared; the redundant information removal is then repeated for the images of the second-eye video, yielding a reduced image set P that retains the scene characteristics.
Further, the image correlation coefficient δ is obtained by calculating the Jaccard distance of the projected pixel points; in the present invention, the threshold compared with δ is 0.9, which balances efficiency and accuracy.
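As an illustration of this screening loop, the following sketch keeps a frame only when its Jaccard-based correlation with the last kept frame falls below the 0.9 threshold; the `project` callback standing in for the mapping through K, the binarization of the pixels, and the frame representation are assumptions made for the example, not details stated in the patent.

```python
import numpy as np

THRESHOLD = 0.9  # image correlation threshold used by the method


def jaccard_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Jaccard similarity of two binary masks (intersection over union)."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter / union) if union else 1.0


def remove_redundant_frames(frames, project):
    """Keep a frame only if its correlation with the last kept frame
    is below THRESHOLD.  `project(src, dst)` maps frame `src` into the
    view of frame `dst` via the projection matrix K (assumed given)."""
    kept = [frames[0]]
    for frame in frames[1:]:
        warped = project(frame, kept[-1])
        delta = jaccard_similarity(warped > 0, kept[-1] > 0)
        if delta < THRESHOLD:      # not redundant: keep it
            kept.append(frame)
        # otherwise discard `frame` and move on to the next one
    return kept
```

With an identity `project`, a frame identical to the last kept one scores δ = 1.0 and is dropped, while a clearly different frame is retained.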
Scene reconstruction step S130:
after redundancy removal, this step carries out feature point matching and patch construction. It comprises two substeps, initial matching (feature point matching and initial patch generation) and diffusion patch reconstruction, which together complete the reproduction of the scene.
an initial matching substep:
in the initial matching stage, to facilitate feature point extraction and matching, each frame image in the image set P is divided into β × β pixel grids, and α local maxima within each grid are calculated with a DoG operator and a Harris operator respectively to serve as feature points. The obtained feature points are used to match the left-eye and right-eye images. After feature point matching is completed, a set of feature point pairs is obtained. For each matched feature point pair (ml, mr), the point pairs are sorted from far to near according to their distance from the camera lens, point clouds are generated from near to far, and a patch pm of θ × θ pixels is generated with m as its center; the normal vector of pm is the line connecting m and the optical center of the reference image camera. The generated patches pm are then screened.
Because the binocular camera used in the invention is strictly calibrated and the binocular images are stereo-rectified, the spatial information of any point on the left-eye image can be computed quickly from the relationship between the two views once its corresponding point is found on the right-eye image. Moreover, this matching is highly precise, so initial matching and point cloud generation at a single viewpoint can be completed using a binocular image pair.
In the specific matching, for a pair of left and right eye images (pl, pr), the left-eye image pl is selected as the reference image and its feature points are matched against the right-eye image pr. Grid division and feature point extraction have already been performed on pl and pr, and the two images are stereo-rectified, i.e. corresponding epipolar lines lie on the same straight line. Therefore, each feature point obtained by the DoG operator on pl is matched against the feature points obtained by the DoG operator on pr, and each feature point obtained by the Harris operator on pl is matched against the feature points obtained by the Harris operator on pr.
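The per-grid selection of operator responses described above can be sketched as follows; the concrete values of β and α, and the idea of passing in a precomputed Harris or DoG response map (one call per operator), are illustrative assumptions rather than the patent's exact implementation.

```python
import numpy as np


def top_features_per_cell(response: np.ndarray, beta: int = 32, alpha: int = 4):
    """Return up to `alpha` strongest responses in every beta x beta cell.

    `response` is a per-pixel operator response map (e.g. from a Harris
    or DoG detector); the method calls this once per operator so that
    DoG points are matched against DoG points and Harris against Harris."""
    h, w = response.shape
    points = []
    for y0 in range(0, h, beta):
        for x0 in range(0, w, beta):
            cell = response[y0:y0 + beta, x0:x0 + beta]
            # indices of the alpha largest responses in this cell
            flat = np.argsort(cell, axis=None)[::-1][:alpha]
            for idx in flat:
                cy, cx = np.unravel_index(idx, cell.shape)
                points.append((y0 + cy, x0 + cx))
    return points
```

Spreading the α strongest responses over every grid cell, instead of taking global maxima, keeps the feature points evenly distributed across the image.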
The screening of a generated patch pm may specifically be as follows. Since pm, pl and pr are all known at this point, the corresponding affine transformation parameters are obtained from the image projection matrix of pm, and pm is mapped onto pl and pr respectively to find its corresponding coordinates on pl and pr. The mapped images of pm on pl and pr are computed by bilinear interpolation, and the initial matching correlation coefficient ε of the two projected images is calculated with a normalized cross-correlation algorithm. If ε is larger than the threshold, the patch is considered successfully reconstructed; it is stored and the next feature point pair is reconstructed. Otherwise the patch pm is deleted and the next feature point pair is reconstructed.
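The correlation test at the heart of this screening might look like the following sketch, where `ncc` is a plain normalized cross-correlation over the two interpolated projections; the threshold value 0.7 is an assumed placeholder, since the patent does not state the number.

```python
import numpy as np


def ncc(a: np.ndarray, b: np.ndarray) -> float:
    """Normalized cross-correlation of two equally sized patch projections."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom else 0.0


def screen_patch(proj_left: np.ndarray, proj_right: np.ndarray,
                 eps_threshold: float = 0.7) -> bool:
    """Keep the patch only if the NCC of its two projections exceeds
    the threshold (the concrete threshold value is an assumption)."""
    return ncc(proj_left, proj_right) > eps_threshold
```

NCC is invariant to affine brightness changes, which is why two projections of the same surface patch score near 1 even under different exposure.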
Referring to fig. 3, a schematic diagram of the screening of the patches is shown.
And a diffusion surface patch reconstruction substep:
to obtain a dense multi-view reconstruction result, the initial patches are used as seed points and diffused outward so that, as far as possible, at least one patch exists in every grid, thereby completing the reconstruction of the model.
The specific steps are as follows. For each patch generated in the initial reconstruction, if no patch exists in an adjacent grid, or the initial matching correlation coefficient ε of the patches in the adjacent grid is smaller than that of this patch, a new patch pn referenced to the initial patch is generated in that grid. The new patch pn takes as its center point the intersection of the grid's optical-center ray with the plane of the reference patch, its normal vector is the same as that of the reference patch, and its image pt is the image plane in which the grid lies. All other images in the image set P are traversed, and every image whose normal vector makes an angle of less than 60 degrees with the normal vector of pn is put into a set U(t) as a comparison image. Corresponding affine transformation parameters are obtained from the image projection matrix of the patch and the motion matrix of each image; pn is then mapped onto pt and onto each image in U(t), the mapped images of pn on pt and U(t) are obtained by bilinear interpolation, their correlation coefficients ζ1 are calculated, and all images in U(t) with ζ1 greater than the threshold are put into a set V(t). If V(t) is empty, pn is considered unobservable by the other images and unable to meet the reconstruction requirement; pn is deleted and the next diffusible point is examined.
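The construction of the sets U(t) and V(t) for one diffused patch can be sketched as below; the normal vectors, the per-image correlation values ζ1, and the ζ1 threshold are assumed inputs supplied for illustration.

```python
import numpy as np


def visible_set(patch_normal, image_normals, correlations,
                zeta_threshold: float = 0.6):
    """Sketch of the U(t)/V(t) construction for one diffused patch.

    U(t): indices of images whose viewing normal is within 60 degrees
    of the patch normal.  V(t): the subset of U(t) whose mapped-image
    correlation zeta_1 exceeds the threshold (threshold value assumed).
    If V(t) comes back empty, the diffused patch is rejected."""
    n = np.asarray(patch_normal, float)
    n = n / np.linalg.norm(n)
    u, v = [], []
    for i, (m, zeta) in enumerate(zip(image_normals, correlations)):
        m = np.asarray(m, float)
        m = m / np.linalg.norm(m)
        angle = np.degrees(np.arccos(np.clip(np.dot(n, m), -1.0, 1.0)))
        if angle < 60.0:
            u.append(i)
            if zeta > zeta_threshold:
                v.append(i)
    return u, v
```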
Further, in this substep the patch pn generated by diffusion is also optimized and adjusted so that its correlation coefficients on the other images become as large as possible: the z coordinate of the patch center point and the inclination angle of the normal vector of pn are adjusted, and the center point and normal vector of pn are recalculated. The image sets U(t) and V(t) are then updated; because pn has been optimized, the correlation coefficient threshold ζ2 is further increased when updating the set V(t). When the number of elements in V(t) is greater than k, where k is the number of images in which the diffusion point must be observable, that is, when the patch pn is observed by enough images at other viewpoints, the patch diffusion is considered successful and the newly generated patch pn is stored; otherwise the newly generated patch is deleted, and the next possible diffusion point is considered, until no further diffusion is possible.
In the optimization adjustment process, the normal vector is optimized by varying it within a 15-degree cone, computing V(t) for several candidate values, and comparing them with the original normal vector to select the maximum; k is preferably 3. Fig. 4 is a schematic diagram of patch optimization adjustment.
After the diffusion is completed, erroneous patches need to be removed, because redundant patches and wrongly diffused points may exist after the diffusion process. Patches whose initial matching correlation coefficient ε is smaller than the average correlation coefficient within their grid are deleted, as are patch clusters that contain only a few patches and lie far away from all other patches.
Color correction step S140:
compensation light removal substep:
the color of the three-dimensional model is converted from the RGB color space into the HSV color space, which better matches the visual characteristics of color. The information of the compensation light is then determined from background points, and the compensation light is removed in HSV space.
Specifically, since the brightness change caused by ambient light passing through seawater due to local depth variations of the seabed is slight, the brightness of the whole reconstructed seabed model should be uniform. A background point that is as deep as the seabed and sufficiently far away is sought in the video; the distance between this point and the camera's compensation light source is assumed to be large enough that the influence of the compensation light on it can be ignored, so the brightness of this point is used as the uniform brightness of the whole model to remove the compensation light.
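A minimal sketch of this brightness equalization, assuming the model colors are held as an (N, 3) HSV array with V in the last channel and that a suitable background vertex index has already been found:

```python
import numpy as np


def remove_compensation_light(hsv_colors: np.ndarray,
                              background_idx: int) -> np.ndarray:
    """Set every vertex's V (brightness) channel to the brightness of
    a distant background point, leaving hue and saturation intact.

    `hsv_colors` is an (N, 3) array of per-vertex HSV colors; the
    background point is assumed far enough from the lamp that its
    brightness is unaffected by the compensation light."""
    out = hsv_colors.copy()
    out[:, 2] = hsv_colors[background_idx, 2]
    return out
```

Working in HSV means only the V channel is touched, so the hue recovered later by the underwater imaging model is not disturbed.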
Color correction completion substep according to the underwater illumination imaging model:
after the influence of the compensation light is eliminated, the color can be corrected in RGB space through an underwater illumination imaging model.
The model color with the compensation light removed is converted back into an RGB-space representation. The color Lλ presented by a point x on the model is then calculated by formula (2):

Lλ = Aλ·px·(Nλ)^D, λ ∈ {red, green, blue} (2)

where Aλ represents the RGB model of natural light, Nλ represents the absorption rate of seawater, D is the depth of the seawater, and px represents the refractive index of the point.

According to the sea depth D of the scene, the color Cλ of the model can be obtained by formula (3):

Cλ = Lλ/(Nλ)^D, λ ∈ {red, green, blue} (3)
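Formula (3) amounts to a per-channel division by the depth-attenuation factor (Nλ)^D; a minimal sketch, assuming RGB triples as plain tuples and illustrative values for Nλ and D:

```python
def corrected_color(L, N, D):
    """C_lambda = L_lambda / N_lambda**D for each of the three channels.

    L: observed (compensation-light-free) RGB value, N: per-channel
    seawater absorption rate, D: scene depth; all values illustrative."""
    return tuple(l_val / n ** D for l_val, n in zip(L, N))
```

Channels with stronger absorption (typically red) are divided by a smaller factor raised to the depth, so the correction restores proportionally more of the colors that seawater attenuates most.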
Referring to fig. 5, a schematic diagram of the underwater imaging principle is shown.
Further, the present invention also discloses a storage medium for storing computer executable instructions, which is characterized in that: the computer executable instructions, when executed by a processor, perform the above-described underwater scene reconstruction method.
Example 1:
the underwater scene reconstruction method based on motion recovery of the present invention can be programmed in a computer language and run on a development platform to realize the above functions.
In embodiment 1, the underwater scene reconstruction method based on motion recovery was developed in C++ on the Visual Studio 2010 platform and verified on a set of seabed videos captured by a strictly calibrated binocular GoPro 2 camera.
Two avi-format videos of 7 s duration were shot by the left-eye and right-eye cameras respectively. Each video contains 210 frames of images; after the redundant images are removed, an image set to be reconstructed containing 34 frames is obtained. Figs. 6(a)-(d) show four frames of the underwater video after redundancy removal, fig. 7 shows the seabed three-dimensional point cloud model obtained by scene reconstruction from these images, and fig. 8 shows the final result of the three-dimensional point cloud model after color correction.
Example 2:
to further evaluate the effect of the underwater scene reconstruction method, two groups of multi-view model images with laser scanning data were reconstructed by the method of the invention and quantitatively compared.
Fig. 9 is a comparison between a dinosaur model reconstructed by the reconstruction method of the present invention and a laser scanning model, wherein fig. 9(a) is a laser scanning model, and fig. 9(b) is an algorithm reconstructed model of the present invention.
Fig. 10 is a comparison between a temple model reconstructed by the reconstruction method of the present invention and a laser scanning model, wherein fig. 10(a) is a laser scanning model and fig. 10(b) is an algorithm reconstruction model of the present invention.
It can be seen that the present invention achieves good reconstruction results even when only a few input images are available, i.e. the invention offers good efficiency and precision.
The invention evaluates the quality of the reconstruction algorithm through two parameters:

1. Accuracy A, i.e. the maximum distance between the point set DW of the generated model W and the laser scanning data:

A = max{|DW|} (4)

2. Matching degree C, i.e. the proportion of the point set Rd of the generated model, whose distance from the points of the laser-scan model is less than a given distance d, in the total point set R:

C = Rd/R (5)

In consideration of the characteristics of the underwater scene, the matching degree is determined at 80% and the distance d at 0.25 mm.
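Given precomputed point-to-scan distances, the two evaluation parameters reduce to a maximum and a ratio; a minimal sketch (the distance computation between model points and the laser scan is assumed to be done elsewhere):

```python
def accuracy(distances):
    """A = max distance between model points and the laser scan (formula 4)."""
    return max(distances)


def matching_degree(distances, d):
    """C = fraction of model points closer than d to the scan (formula 5)."""
    return sum(1 for x in distances if x < d) / len(distances)
```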
Table 1. Reconstruction accuracy of the quantitative calculations (the table is reproduced as an image in the original publication and is not available in this text).
In summary, the invention establishes the interrelation among video frames by extracting the motion matrices of the cameras at different viewpoints with an improved motion recovery algorithm, providing a basis for subsequent redundancy removal and scene reconstruction; an algorithm is then designed to remove redundant images according to the characteristics of underwater video, improving the efficiency of the method. Finally, color correction makes the final color of the model as close to reality as possible. Compared with other reconstruction methods in the prior art and with laser scanning results, the method can still complete a good reconstruction when only a few input images are available, offers good efficiency and precision, and improves the accuracy and robustness of the reconstructed scene to a certain extent.
It will be apparent to those skilled in the art that the various elements or steps of the invention described above may be implemented using a general purpose computing device, they may be centralized on a single computing device, or alternatively, they may be implemented using program code that is executable by a computing device, such that they may be stored in a memory device and executed by a computing device, or they may be separately fabricated into various integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
While the invention has been described in further detail with reference to specific preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (9)

1. An underwater scene reconstruction method based on motion recovery is characterized by comprising the following steps:
motion matrix extraction step S110: for each newly added group of binocular images, only the first-eye image frame is selected for feature point matching with the previous frame image, and motion matrix calculation is then performed; after the motion matrix of the first-eye image is calculated, the motion matrix of the second-eye image is obtained according to the calibration between the binocular cameras, and new tracking points are selected and added into the tracking point set;
redundant information removal step S120: the first frame image p1 in the first-view video is compared with the next frame image p2; a projection matrix K is obtained according to the motion matrices at the two viewpoints, and the points on p2 are mapped onto p1 through the projection matrix K by formula (1), where r2 is the coordinate of a point on p2 and r21 is its coordinate after projection onto p1,

r21 = K·r2 (1)

an image correlation coefficient δ between the two images is obtained by comparing the mapped pixels of images p1 and p2 and compared with a threshold; if δ is smaller than the set threshold, the next frame image p2 is not a redundant image, p1 and p2 are retained, and p2 is compared with the adjacent next first-eye frame; otherwise p2 is judged to be a redundant image, p2 is removed from the video image set, and p1 is compared with the adjacent next first-eye frame; this cycle is repeated until the last frame has been compared, the redundant information removal step is then repeated for the images in the second-view video, and a reduced image set P retaining the scene features is obtained;
a scene reconstruction step S130, comprising:
an initial matching substep: dividing each frame image in the image set P into β × β pixel grids, calculating α local maxima in each grid with a DoG operator and a Harris operator respectively as feature points, and matching the first-eye and second-eye images with the obtained feature points; after feature point matching is completed, a set of feature point pairs is obtained; for each matched feature point pair (ml, mr), the point pairs are sorted from far to near according to their distance from the camera lens, point clouds are generated from near to far, and a patch pm of θ × θ pixels is generated with m as its center; the normal vector of pm is the line connecting m and the optical center of the reference image camera; the generated patches pm are screened; the screening of a generated patch pm is performed as follows: the corresponding affine transformation parameters are obtained from the image projection matrix of pm, and pm is mapped onto pl and pr respectively to find its corresponding coordinates on pl and pr; the mapped images of pm on pl and pr are computed by bilinear interpolation, and the initial matching correlation coefficient ε of the two projected images is calculated with a normalized cross-correlation algorithm; if ε is larger than the threshold, the patch is considered successfully reconstructed, the patch is stored and the next feature point pair is reconstructed; otherwise the patch pm is deleted and the next feature point pair is reconstructed;
a diffusion patch reconstruction substep: for each patch generated in the initial reconstruction, if no patch exists in an adjacent grid, or the initial matching correlation coefficient of the patches in the adjacent grid is smaller than that of this patch, generating in that grid a new patch pn referenced to the initial patch; the new patch pn takes as its center point the intersection of the grid's optical-center ray with the plane of the reference patch, its normal vector is the same as that of the reference patch, and pt is the image plane in which the grid lies; all other images in the image set P are traversed, and every image whose normal vector makes an angle of less than 60 degrees with the normal vector of pn is put into a set U(t) as a comparison image; corresponding affine transformation parameters are obtained from the image projection matrix of the patch and the motion matrix of each image, pn is mapped onto pt and each image in U(t), the mapped images of pn on pt and U(t) are obtained by bilinear interpolation, their correlation coefficients ζ1 are calculated, and all images in U(t) with ζ1 greater than the threshold are put into a set V(t); if V(t) is empty, pn is considered unobservable by other images and unable to meet the reconstruction requirement, pn is deleted, and the next diffusible point is sought;
a color correction step S140, which includes:
compensation light removal substep: converting the color of the three-dimensional model from an RGB color space into an HSV color space which is more in line with the color visual characteristic, then determining the information of compensating light according to background points, and removing the compensating light in the HSV color space;
a color correction completion substep according to the underwater illumination imaging model: converting the model color without the compensation light into an RGB-space representation, wherein the color Lλ presented by a point x on the model is calculated by formula (2):

Lλ = Aλ·px·(Nλ)^D, λ ∈ {red, green, blue} (2)

where Aλ represents the RGB model of natural light, Nλ represents the absorption rate of seawater, D the depth of the seawater, and px the refractive index of the point;

obtaining the color Cλ of the model by formula (3) according to the sea depth D of the scene:

Cλ = Lλ/(Nλ)^D, λ ∈ {red, green, blue} (3).
2. The underwater scene reconstruction method according to claim 1, characterized in that:
in the motion matrix extraction step S110, the feature points are extracted with a Harris corner detection algorithm and a SIFT feature point extraction algorithm; the motion matrix is calculated with a linear transformation method, and the parameters are further optimized with a sparse bundle adjustment method, recovering a more accurate motion matrix by minimizing the projection error between the observed and predicted images.
3. The underwater scene reconstruction method according to claim 1, characterized in that:
in the redundant information removal step S120, the image correlation coefficient δ is obtained by calculating the Jaccard distance of the projected pixel points, and the threshold compared with the image correlation coefficient δ is 0.9.
4. The underwater scene reconstruction method according to claim 1, characterized in that:
in the initial matching substep, the matching of the first-eye and second-eye images with the obtained feature points is performed as follows: each feature point obtained by the DoG operator on pl is matched against the feature points obtained by the DoG operator on pr, and each feature point obtained by the Harris operator on pl is matched against the feature points obtained by the Harris operator on pr.
5. The underwater scene reconstruction method according to claim 1, characterized in that:
in the diffusion patch reconstruction substep, the patch pn generated by diffusion is also optimized and adjusted so that its correlation coefficients on the other images become as large as possible: the z coordinate of the patch center point and the inclination angle of the normal vector of pn are adjusted, and the center point and normal vector of pn are recalculated; the image sets U(t) and V(t) are updated and the correlation coefficient threshold is further increased; if the number of elements in V(t) is greater than k, where k is the number of images in which the diffusion point must be observable, the patch diffusion is considered successful and the newly generated patch pn is stored; otherwise the newly generated patch is deleted, and the next possible diffusion point is considered, until no further diffusion is possible.
6. The underwater scene reconstruction method according to claim 2, characterized in that:
in the diffusion patch reconstruction substep, during the optimization adjustment, the normal vector is optimized by varying it within a 15-degree cone, computing V(t) for several candidate values, and comparing them with the original normal vector to select the maximum; wherein k is 3.
7. The underwater scene reconstruction method according to claim 1, characterized in that:
in the scene reconstruction step S130, after all diffusion is completed, erroneous patches are also removed: patches whose initial matching correlation coefficient ε is smaller than the average correlation coefficient within their grid are deleted, and patch clusters that contain only a few patches and lie far away from all other patches are deleted.
8. The underwater scene reconstruction method according to claim 1, characterized in that:
in the compensation light removal substep, a background point that is as deep as the seabed and sufficiently far away is sought in the video, and the brightness of this background point is used as the uniform brightness of the whole model to remove the compensation light.
9. A storage medium for storing computer-executable instructions, characterized in that:
the computer executable instructions, when executed by a processor, perform the underwater scene reconstruction method of any one of claims 1-8.
CN201810377322.6A 2018-04-25 2018-04-25 Underwater scene reconstruction method based on motion recovery and storage medium Active CN108648264B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810377322.6A CN108648264B (en) 2018-04-25 2018-04-25 Underwater scene reconstruction method based on motion recovery and storage medium


Publications (2)

Publication Number Publication Date
CN108648264A CN108648264A (en) 2018-10-12
CN108648264B true CN108648264B (en) 2020-06-23

Family

ID=63747582


Country Status (1)

Country Link
CN (1) CN108648264B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110290373B (en) * 2019-03-11 2020-12-08 长春理工大学 Integrated imaging calculation reconstruction method for increasing visual angle
CN110111422B (en) * 2019-03-28 2023-03-28 浙江碧晟环境科技有限公司 Method for constructing triangular surface net at bottom of water body
CN110111413A (en) * 2019-04-08 2019-08-09 西安电子科技大学 A kind of sparse cloud three-dimension modeling method based on land and water coexistence scenario
CN110322572B (en) * 2019-06-11 2022-09-09 长江勘测规划设计研究有限责任公司 Binocular vision-based underwater culvert and tunnel inner wall three-dimensional information recovery method
CN110415332A (en) * 2019-06-21 2019-11-05 上海工程技术大学 Complex textile surface three dimensional reconstruction system and method under a kind of non-single visual angle
CN111563921B (en) * 2020-04-17 2022-03-15 西北工业大学 Underwater point cloud acquisition method based on binocular camera
CN112822478B (en) * 2020-12-31 2022-10-18 杭州电子科技大学 High-quality photo sequence acquisition method for three-dimensional reconstruction
CN113971691A (en) * 2021-09-16 2022-01-25 中国海洋大学 Underwater three-dimensional reconstruction method based on multi-view binocular structured light

Citations (3)

Publication number Priority date Publication date Assignee Title
CN102032878A (en) * 2009-09-24 2011-04-27 甄海涛 Accurate on-line measurement method based on binocular stereo vision measurement system
CN106097436A (en) * 2016-06-12 2016-11-09 广西大学 A kind of three-dimensional rebuilding method of large scene object
CN107767442A (en) * 2017-10-16 2018-03-06 浙江工业大学 A kind of foot type three-dimensional reconstruction and measuring method based on Kinect and binocular vision

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
JP2005351916A (en) * 2004-06-08 2005-12-22 Olympus Corp Binocular microscope device


Non-Patent Citations (1)

Title
Wang Xin et al., "Design of a binocular vision three-dimensional reconstruction system based on motion recovery," Optics and Precision Engineering, vol. 22, no. 5, May 2014 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant