CN112991419A - Parallax data generation method and device, computer equipment and storage medium - Google Patents


Info

Publication number
CN112991419A
CN112991419A (application No. CN202110258449.8A)
Authority
CN
China
Prior art keywords
image
target
optical flow
original
value
Prior art date
Legal status
Granted
Application number
CN202110258449.8A
Other languages
Chinese (zh)
Other versions
CN112991419B (en)
Inventor
尹康 (Yin Kang)
Current Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202110258449.8A
Publication of CN112991419A
Application granted; publication of CN112991419B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/50: Depth or shape recovery
    • G06T 7/55: Depth or shape recovery from multiple images
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/10: Image acquisition modality
    • G06T 2207/10016: Video; Image sequence
    • G06T 2207/20: Special algorithmic details
    • G06T 2207/20081: Training; Learning
    • G06T 2207/20228: Disparity calculation for image-based rendering

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)

Abstract

The application discloses a parallax data generation method and apparatus, a computer device, and a storage medium, belonging to the technical field of computer vision. The method comprises the following steps: acquiring at least one group of original image pairs, each comprising an original left-eye image and an original right-eye image; performing optical flow extraction on the original image pairs to obtain the optical flow values corresponding to each group; performing optical flow screening on the original image pairs based on the optical flow values to obtain a target monocular image (an original left-eye or right-eye image) and the target optical flow value corresponding to each pixel point in it; and generating target parallax data corresponding to the target monocular image based on the target optical flow values. The method improves the efficiency of obtaining parallax data, allows a large number of pre-training samples for a monocular depth estimation model to be constructed quickly from the parallax data, and thereby improves the training efficiency of the depth estimation model.

Description

Parallax data generation method and device, computer equipment and storage medium
Technical Field
The embodiment of the application relates to the technical field of computer vision, in particular to a parallax data generation method and device, computer equipment and a storage medium.
Background
Image-based depth information estimation is one of the basic tasks in the field of computer vision and is widely applied in fields such as 3D reconstruction and Augmented Reality (AR) interaction. Depending on the number of images used for depth estimation, the related algorithms fall into two categories: multi-view image depth estimation and monocular image depth estimation. Monocular depth estimation has the wider application prospect, and in the field of monocular image depth estimation, depth estimation is mainly achieved by training a network model.
In the related art, training a network model for estimating depth information usually requires a sufficient number of training samples, each consisting of a monocular image and a matched single-channel depth map, where the value at each position on the depth map represents the distance between the corresponding position in the original image and the camera. A common acquisition method is to capture images with a dedicated depth sensor. However, this method has high acquisition cost and low efficiency, and cannot quickly produce images with depth information.
Disclosure of Invention
The embodiment of the application provides a parallax data generation method and device, computer equipment and a storage medium. The technical scheme is as follows:
in one aspect, an embodiment of the present application provides a disparity data generating method, where the method includes:
acquiring at least one group of original image pairs, wherein the original image pairs comprise an original left eye image and an original right eye image, and the original left eye image and the original right eye image are images corresponding to the same scene observed under a left eye visual angle and a right eye visual angle;
performing optical flow extraction on the original image pairs to obtain optical flow values corresponding to each group of the original image pairs;
performing optical flow screening on the original image pairs based on the optical flow values to obtain a target monocular image and a target optical flow value corresponding to each pixel point in the target monocular image, wherein after optical flow screening the relative magnitude relationship between the target optical flow values of the pixel points is the same as the relative magnitude relationship between the corresponding parallax values, and the target monocular image is an original left-eye image or an original right-eye image;
and generating target parallax data corresponding to the target monocular image based on the target optical flow value.
In another aspect, an embodiment of the present application provides a disparity data generating apparatus, where the apparatus includes:
an acquisition module, configured to acquire at least one group of original image pairs, where each original image pair comprises an original left-eye image and an original right-eye image, the original left-eye image and the original right-eye image being images of the same scene observed under a left-eye viewing angle and a right-eye viewing angle;
the optical flow extraction module is used for carrying out optical flow extraction on the original image pairs to obtain optical flow values corresponding to each group of the original image pairs;
the optical flow screening module is used for carrying out optical flow screening on the original image pair based on the optical flow value to obtain a target monocular image and a target optical flow value corresponding to each pixel point in the target monocular image, wherein the relative size relationship between the target optical flow values corresponding to each pixel point after optical flow screening is the same as the relative size relationship between parallax data, and the target monocular image is an original left eye image or an original right eye image;
and the generating module is used for generating target parallax data corresponding to the target monocular image based on the target optical flow value.
In another aspect, an embodiment of the present application provides a computer device, which includes a processor and a memory, where the memory stores at least one program code, and the program code is loaded and executed by the processor to implement the disparity data generating method according to the above aspect.
In another aspect, an embodiment of the present application provides a computer-readable storage medium, in which at least one program code is stored, and the program code is loaded and executed by a processor to implement the disparity data generating method according to the above aspect.
In another aspect, embodiments of the present application provide a computer program product or a computer program, which includes computer instructions stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the disparity data generating method provided in the various alternative implementations of the above-described aspects.
The technical scheme provided by the embodiment of the application can bring the following beneficial effects:
in a monocular image depth estimation application scene, the original left eye image and the original right eye image with natural parallax information can be subjected to optical flow extraction and optical flow screening, so that corresponding parallax data can be quickly extracted based on screened optical flow values, and the parallax data acquisition efficiency is improved; due to the relation between the parallax data and the depth information, correspondingly, a large number of pre-training samples for training the monocular depth estimation model can be quickly constructed on the basis of the parallax data, and the training efficiency of the depth estimation model can be further improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained from them by those skilled in the art without creative effort.
Fig. 1 is a flowchart illustrating a parallax data generation method according to an exemplary embodiment of the present application;
FIG. 2 illustrates a flow chart of a disparity data generation method provided by another exemplary embodiment of the present application;
FIG. 3 is a schematic diagram of cropping an image frame;
FIG. 4 illustrates a flow chart of a disparity data generation method provided by another exemplary embodiment of the present application;
FIG. 5 illustrates a schematic diagram of a process for acquiring disparity data according to an exemplary embodiment of the present application;
fig. 6 is a block diagram illustrating a configuration of a disparity data generating apparatus according to an embodiment of the present application;
fig. 7 shows a block diagram of a computer device according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Referring to fig. 1, a flowchart of a disparity data generating method according to an exemplary embodiment of the present application is shown. Taking application of the method to a computer device as an example, the method includes:
step 101, at least one group of original image pairs is obtained, the original image pairs comprise original left eye images and original right eye images, and the original left eye images and the original right eye images are images corresponding to the same scene observed under a left eye visual angle and a right eye visual angle.
In the embodiment of the application, the disparity data is obtained to serve as a training sample set for deep-learning training, which ordinarily requires a large number of sample images with labeled depth information. Based on the relationship between disparity and depth (a disparity value can be regarded as the reciprocal of depth), the task of obtaining depth information can be converted into the task of obtaining disparity information. To obtain sample images from which disparity information can be extracted as quickly as possible, original image pairs comprising an original left-eye image and an original right-eye image can be used, because the left-eye and right-eye images carry natural disparity.
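For reference, in a rectified stereo setup the textbook relation behind the "reciprocal" remark above links disparity $d$, focal length $f$, baseline $B$, and depth $Z$ ($f$ and $B$ are constants of the camera rig, not values from the application):

$d = \dfrac{f \cdot B}{Z}$

so larger disparity corresponds to smaller depth, and preserving the relative ordering of disparities preserves the relative ordering of depths.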
Optionally, the original left-eye image and the original right-eye image may be extracted from Three-dimensional (3D) video or 3D image data stored in left-right format, and illustratively, the original image pair may be extracted from a 3D movie resource.
Step 102, performing optical flow extraction on the original image pairs to obtain the optical flow values corresponding to each group of original image pairs.
The original image pairs can be processed by an optical flow extraction network (e.g., FlowNet2). Alternatively, other optical flow calculation methods may be used, such as gradient-based, matching-based, energy-based, and phase-based methods; the embodiments of the present application do not limit the optical flow extraction algorithm.
The goal is to quickly construct a large-scale monocular depth estimation training data set. Because the constructed data set is used only in the pre-training stage of the monocular depth estimation model, precise image disparity is not required; it suffices that the obtained disparity satisfies the spatial relative relationship (near objects have large disparity, far objects have small disparity). The original left-eye and right-eye images naturally satisfy this: the optical flow values between them have a consistent relative magnitude relationship with the disparity values (positions with large optical flow also have large disparity). Therefore, in one possible implementation, optical flow extraction is performed on the original images (the left-eye and right-eye images), and the disparity information corresponding to the original images is determined from their optical flow values.
In one possible implementation, with the left-eye image as the reference frame, the optical flow from the left-eye image to the right-eye image is extracted first (the forward optical flow), and then the optical flow from the right-eye image to the left-eye image (the backward optical flow). Illustratively, the forward optical flow corresponding to an original image pair may be denoted F_f and the backward optical flow F_b.
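By way of illustration, a minimal sketch of this step in Python. The application uses FlowNet2; since the paragraph above permits other optical flow algorithms, OpenCV's Farneback method is used here as a readily available stand-in, and the function name and parameter values are illustrative assumptions:

```python
import cv2
import numpy as np

def extract_flows(left_gray: np.ndarray, right_gray: np.ndarray):
    """Return (forward flow F_f, backward flow F_b) as HxWx2 arrays of
    (horizontal u, vertical v) components per pixel."""
    params = dict(pyr_scale=0.5, levels=3, winsize=15,
                  iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    # F_f: left-eye image as reference frame, flow toward the right-eye image
    flow_fwd = cv2.calcOpticalFlowFarneback(left_gray, right_gray, None, **params)
    # F_b: right-eye image as reference frame, flow toward the left-eye image
    flow_bwd = cv2.calcOpticalFlowFarneback(right_gray, left_gray, None, **params)
    return flow_fwd, flow_bwd
```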
Optionally, optical flow extraction may also be performed with the right-eye image as a reference frame.
Step 103, performing optical flow screening on the original image pairs based on the optical flow values to obtain a target monocular image and the target optical flow value corresponding to each pixel point in the target monocular image.
Because extraction errors may occur during optical flow extraction, and the accuracy of the parallax data depends on the accuracy of the optical flow, in one possible implementation the image pairs whose left and right images do not satisfy the expected optical flow relationship are eliminated from the original image pairs in order to improve the accuracy of the determined parallax data. After optical flow screening, the relative magnitude relationship between the target optical flow values of the pixel points is the same as the relative magnitude relationship between the corresponding parallax values.
Optionally, because the obtained disparity data serves as a training sample set for monocular depth estimation, each training sample consists of a monocular image and the parallax data corresponding to it. A target image pair is obtained after optical flow screening, and the original left-eye image or original right-eye image in the target image pair can be directly determined as the target monocular image.
In a possible embodiment, an optical flow screening condition is set for eliminating erroneous values in the predicted optical flow, which improves the accuracy of the subsequent parallax information extraction process.
Step 104, generating target parallax data corresponding to the target monocular image based on the target optical flow value.
In one possible embodiment, the target optical flow value is converted to target parallax data corresponding to the target monocular image based on a relationship between the target optical flow value and the parallax data.
In summary, in the embodiment of the application, in a monocular image depth estimation application scene, optical flow extraction and optical flow screening can be performed on an original left eye image and an original right eye image with natural parallax information, so that corresponding parallax data can be quickly extracted based on the screened optical flow values, and the parallax data acquisition efficiency is improved; due to the relation between the parallax data and the depth information, correspondingly, a large number of pre-training samples for training the monocular depth estimation model can be quickly constructed on the basis of the parallax data, and the training efficiency of the depth estimation model can be further improved.
Since the accuracy of the parallax data depends on the extracted optical flow values, it is important to ensure that the extracted optical flow conforms to the parallax relationship. When the same scene is observed at the same moment from the left-eye and right-eye viewing angles, the optical flow satisfies certain conditions, for example: the vertical optical flow is almost 0 everywhere, the horizontal optical flow has a large amplitude, and the optical flow precision is high. In one possible implementation, optical flow screening is performed based on these optical flow value conditions.
In an exemplary example, as shown in fig. 2, a flowchart of a parallax data generating method provided in another exemplary embodiment of the present application is shown, where the embodiment of the present application is described by taking an application of the method to a computer device as an example, the method includes:
step 201, obtaining a target video in a target storage format, where the target storage format at least includes a left storage format and a right storage format.
In order to quickly extract a large number of left-eye and right-eye images, in one possible implementation target videos stored in left-right eye format are used directly as raw data; the target video may be, for example, 3D video stored in left-right eye format, such as a 3D movie.
Optionally, besides being stored in left-right eye format, the selected 3D movie data should satisfy further conditions. To make the training data match real application scenes as far as possible, and to keep exaggerated effects (animation, special effects) that violate real-world perspective rules from interfering with subsequent network learning, 3D data dominated by real scenes and containing as few animation or special-effects fragments as possible should be selected. In addition, to keep the resolution of the training data comparable to image resolutions in practical applications, and to avoid a small training resolution restricting the model to small-resolution pictures, 3D data with high resolution should be selected; illustratively, the resolution should be greater than 720P.
Step 202, extracting at least one group of candidate image pairs from the target video, where each candidate image pair comprises a candidate left-eye image and a candidate right-eye image.
The selected target video is stored in left-right eye format; correspondingly, at least one group of image pairs contained in the target video can be obtained by extracting video frames from it.
In an exemplary example, taking an original 3D movie resource as an example, the process of extracting candidate image pairs from the 3D movie resource may include the following steps:
1. Reading the chapter information in the original 3D resource and removing preset chapters.
The preset chapters may be chapters that contain no usable image information, for example chapters whose pictures consist of company names, company logos, and cast lists.
Since a chapter with too short a duration is likely to be a rapidly changing scene, sampling it may capture blurred video frames and harm the accuracy of subsequent optical flow extraction. Therefore, in one possible embodiment, a first duration threshold is set and chapters whose duration is smaller than this threshold are removed; illustratively, the first duration threshold may be 5 min.
Optionally, the first duration threshold may be set by the user. Illustratively, if raw data is plentiful, a larger value can be used to ensure the quality of the training data; otherwise a smaller value can be used to ensure the quantity of the training data.
Alternatively, the chapter information may be obtained from the meta information in the 3D movie resource; the meta information is located at the head of the document and serves as an auxiliary tag defining data associated with the document, such as keywords. In this embodiment, the information that the meta data can provide includes the chapter titles.
2. Sampling the remaining chapters after the preset chapters are removed, and storing the sampling results in left-and-right image form.
When sampling the remaining chapters, overly long chapters would contribute more image frames under uniform sampling. Image frames within the same chapter are often similar, and acquiring many similar frames may introduce data bias and harm the accuracy of subsequent model training. To keep the amount of data per chapter and scene approximately the same, in one possible implementation a second duration threshold and at least two sampling frequencies are set: within the first part of each chapter (up to the second duration threshold), sampling is performed at a higher frequency, and in the remaining time at a lower frequency.
Illustratively, the second duration threshold may be denoted T_2 and the sampling frequencies f_1 and f_2. For each chapter, sampling is performed at f_1 within the first T_2 of the chapter and at f_2 in the remaining time.
Optionally, f_1 > f_2 may be set to ensure that a sufficient number of image frames is preferentially sampled from each chapter.
Optionally, the second duration threshold may also be determined based on the total duration of the chapters (for longer chapters, a smaller value may be taken), or from the angle of the amount of raw data (for abundant raw data, a larger value may be taken).
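A minimal sketch of the two-rate sampling schedule described above; the function name and the concrete values of T_2, f_1, and f_2 are illustrative assumptions:

```python
def sampling_timestamps(chapter_duration: float,
                        t2: float = 120.0,  # second duration threshold in seconds (assumed)
                        f1: float = 1.0,    # sampling frequency within the first t2, Hz (assumed)
                        f2: float = 0.2):   # sampling frequency afterwards, Hz (assumed)
    """Dense sampling at f1 for the first t2 seconds of a chapter,
    sparse sampling at f2 for the remainder."""
    times, t = [], 0.0
    while t < min(t2, chapter_duration):
        times.append(t)
        t += 1.0 / f1
    t = max(t, t2)
    while t < chapter_duration:
        times.append(t)
        t += 1.0 / f2
    return times
```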
Because the original 3D movie resource is stored in a left-eye and right-eye format, that is, includes a left-eye video and a right-eye video, in order to ensure that a left-eye image and a right-eye image corresponding to the same time can be acquired, the left-eye and right-eye videos need to be sampled at the same sampling frequency.
Step 203, preprocessing the candidate image pair to obtain an original image pair.
The preprocessing can include removing a blurred frame generated in the sampling process and removing a black border area.
To remove blurred frames, an image quality evaluation algorithm can compute the sharpness of each candidate left-eye and right-eye image, and candidate image pairs whose sharpness is below a preset sharpness threshold are removed. Optionally, the image quality evaluation algorithm may include the Laplacian operator, the Brenner gradient function, the Tenengrad gradient function, and the like.
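As an illustration, a common sharpness test built on the Laplacian operator named above; the function name and threshold value are assumptions that would be tuned per data set:

```python
import cv2
import numpy as np

def is_blurred(gray_image: np.ndarray, sharpness_threshold: float = 100.0) -> bool:
    """A frame is flagged as blurred when the variance of its Laplacian
    response falls below the (assumed) sharpness threshold."""
    return cv2.Laplacian(gray_image, cv2.CV_64F).var() < sharpness_threshold
```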
To remove the black-edge areas, cropping values d_1 and d_2 can be set; the extracted candidate image pairs are cropped based on these values, removing the edge areas and keeping only the clean picture area at the image center.
Alternatively, different cropping values may be set for different 3D movies based on the size of the black-edge area in the image frames.
Schematically, fig. 3 shows the cropping of an image frame. The image frame 301 comprises a picture area 302 and a black-edge area 303. Cropping values d_1 and d_2 are set based on the size of the black-edge area 303 in the image frame 301; the image frame 301 is then cropped using d_1 and d_2, removing the black-edge area 303 and keeping only the picture area 302.
In one possible embodiment, the above pre-processing is performed on candidate image pairs extracted from the 3D video resource, and the processed candidate image pairs are determined as original image pairs for the subsequent optical flow extraction and parallax data generation.
Step 204, performing optical flow extraction on the original image pairs to obtain the optical flow values corresponding to each group of original image pairs.
The implementation of step 204 may refer to step 102, and this embodiment is not described herein.
Step 205, screening the original image pair based on the vertical optical flow threshold to obtain a first image pair, where the probability that the vertical optical flow value corresponding to the first image pair is greater than the vertical optical flow threshold is lower than a first preset probability threshold.
Optical flow values extracted from left-eye and right-eye images have the characteristic that the vertical optical flow is nearly 0 everywhere. Correspondingly, in one possible implementation, the original image pairs are screened against this condition, and the original image pairs that do not satisfy the vertical optical flow condition are removed.
Illustratively, the vertical optical flow threshold is 0.
In one possible implementation, when screening an original image pair based on the vertical optical flow threshold, the vertical optical flow values at the pixel points of the original left-eye and right-eye images are obtained, the number of pixel points whose vertical optical flow values exceed the vertical optical flow threshold is counted, and the proportion of such points is computed from this count and the total number of pixel points. Whether the optical flow values of the pair qualify for subsequent parallax data extraction is then decided by comparing this proportion with the first preset probability threshold. If the proportion is higher than the first preset probability threshold, the optical flow prediction for the original image pair is considered inaccurate and the pair is removed; conversely, if the proportion is lower than the first preset probability threshold, the optical flow prediction is considered accurate, the original image pair is retained, and subsequent optical flow screening or parallax data extraction can proceed.
Correspondingly, the vertical optical flow value condition is: the probability that the vertical optical flow value is greater than the vertical optical flow threshold is lower than a first preset probability threshold. Illustratively, the vertical optical flow threshold may be 0 and the first preset probability threshold may be 0.05.
Optionally, because optical flow is extracted in both directions, each group of original image pairs has a forward optical flow and a backward optical flow, and both have vertical optical flow components. Correspondingly, to improve the reliability of the judgment, in one possible implementation the vertical components of both the forward and the backward optical flow are checked, and the two results jointly determine whether the original image pair is eliminated from the sample data set.
Illustratively, with the first preset probability threshold denoted P_1 and the vertical optical flow threshold 0, the vertical optical flow condition can be expressed as: the proportion of points where F_fv (the vertical component of the forward optical flow) is non-zero is less than P_1, and/or the proportion of points where F_bv (the vertical component of the backward optical flow) is non-zero is less than P_1.
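By way of illustration, a minimal NumPy sketch of this screen. The channel layout (index 1 = vertical component), the non-zero tolerance, and the function name are assumptions; the text allows "and/or", while this sketch requires both flows to pass:

```python
import numpy as np

def passes_vertical_flow_check(flow_fwd: np.ndarray, flow_bwd: np.ndarray,
                               p1: float = 0.05, tol: float = 0.5) -> bool:
    """Keep a pair only if, for both flows, the fraction of pixels whose
    vertical component is (effectively) non-zero stays below P_1.
    tol relaxes the text's exact-zero test for float-valued flow."""
    frac_fwd = np.mean(np.abs(flow_fwd[..., 1]) > tol)
    frac_bwd = np.mean(np.abs(flow_bwd[..., 1]) > tol)
    return frac_fwd < p1 and frac_bwd < p1
```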
Step 206, screening the first image pair based on the horizontal optical flow threshold to obtain a second image pair, where the maximum horizontal optical flow difference corresponding to the second image pair is greater than the horizontal optical flow threshold.
Based on the characteristic that the optical flow has a large amplitude in the horizontal direction, a horizontal optical flow screening condition is set: the maximum horizontal optical flow difference must be greater than the horizontal optical flow threshold. Illustratively, the horizontal optical flow threshold may be 2.
In one possible embodiment, the horizontal optical flow values of the first image pair are obtained, the maximum and minimum horizontal optical flow values are determined, and the maximum horizontal optical flow difference is computed from them; whether the first image pair is eliminated from the sample data set is then decided by comparing this difference with the horizontal optical flow threshold. Illustratively, if the maximum horizontal optical flow difference is greater than the threshold, the pair has a horizontal optical flow of sufficient amplitude, the horizontal optical flow condition is satisfied, and the first image pair is determined as a second image pair; conversely, if the difference is not greater than the threshold, the pair lacks a horizontal optical flow of sufficient amplitude and is removed from the data set to avoid harming subsequent parallax data extraction.
Illustratively, the horizontal optical flow condition may be expressed as: the difference between the maximum and minimum values of F_fu (the horizontal component of the forward optical flow) is greater than the horizontal optical flow threshold, and the difference between the maximum and minimum values of F_bu (the horizontal component of the backward optical flow) is greater than the horizontal optical flow threshold.
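The corresponding horizontal-amplitude screen as a sketch, again with index 0 assumed to be the horizontal component and the function name hypothetical:

```python
import numpy as np

def passes_horizontal_flow_check(flow_fwd: np.ndarray, flow_bwd: np.ndarray,
                                 h_thresh: float = 2.0) -> bool:
    """Keep a pair only if max - min of the horizontal component exceeds
    the horizontal optical flow threshold for both flows."""
    span_fwd = float(flow_fwd[..., 0].max() - flow_fwd[..., 0].min())
    span_bwd = float(flow_bwd[..., 0].max() - flow_bwd[..., 0].min())
    return span_fwd > h_thresh and span_bwd > h_thresh
```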
Step 207, screening the second image pair based on the first pixel threshold to obtain a third image pair, where the probability that the pixel-value difference at the same pixel point between the third image pair and the predicted image pair exceeds the first pixel threshold is lower than the second preset probability threshold.
If the optical flow is extracted accurately, mapping the original left-eye and right-eye images by their optical flow yields predicted left-eye and right-eye images that are almost identical to the originals. To verify the accuracy of the extracted optical flow values, in one possible implementation the values at the same positions (the same pixel points) in the original and predicted left-eye and right-eye images are compared against the first pixel threshold to decide whether to remove the image pair from the data set.
The process of obtaining the third image pair based on the first pixel threshold value screening may include the following steps:
and firstly, carrying out optical flow mapping processing on the original left eye image in the second image pair to obtain a predicted right eye image in the predicted image pair.
In one possible embodiment, the original left eye image in the second image pair is subjected to an optical flow mapping process, i.e. a predicted right eye image is obtained based on the forward optical flow and the original left eye image.
Illustratively, if an original pixel coordinate is (x_1, y_1) and the optical flow value at that coordinate is (x_2, y_2), then after optical flow mapping the corresponding pixel coordinate in the predicted right-eye image is (x_1 + x_2, y_1 + y_2).
And secondly, performing optical flow mapping processing on the original right eye image in the second image pair to obtain a predicted left eye image in the predicted image pair.
In one possible embodiment, the original right eye image in the second image pair is subjected to an optical flow mapping process, i.e. a predicted left eye image may be derived based on the backward optical flow and the original right eye image.
And thirdly, determining the second image pair as a third image pair in response to the probability that the pixel-value difference at the same pixel point between the predicted right-eye image and the original right-eye image exceeds the first pixel threshold being lower than the second preset probability threshold, and/or in response to the probability that the pixel-value difference at the same pixel point between the predicted left-eye image and the original left-eye image exceeds the first pixel threshold being lower than the second preset probability threshold.
Based on the fact that, when optical flow extraction is accurate, the predicted left-eye and right-eye images have roughly the same pixel values as the original left-eye and right-eye images at each pixel position, in one possible embodiment a first pixel threshold is set, and whether an image pair is removed from the data set is decided by comparing the pixel-value differences at the same positions in the predicted and original images against this threshold. Illustratively, the first pixel threshold may be 0.
Optionally, to improve the accuracy of optical flow screening, in one possible implementation a second preset probability threshold is set. During screening based on the first pixel threshold, the number of points at which the pixel-value difference between the predicted right-eye image and the original right-eye image exceeds the first pixel threshold is counted together with the total number of pixel points, and the proportion of such points is computed and compared with the second preset probability threshold. If the proportion is below the second preset probability threshold, the optical flow extraction is accurate and can be used for subsequent parallax data extraction; otherwise, the optical flow extraction accuracy is low and the image pair corresponding to the original right-eye image is deleted from the data set.
Optionally, if the original right-eye image and the predicted right-eye image are determined not to satisfy the optical flow accuracy condition, the image pair corresponding to the original right-eye image can be eliminated directly; likewise, if the original left-eye image and the predicted left-eye image do not satisfy the condition, the image pair corresponding to the original left-eye image can be eliminated directly; and if neither side satisfies the condition, the image pair is rejected.
Illustratively, the optical flow accuracy condition is: the proportion of points at which the values of I_L and I_L' differ is less than the second preset probability threshold, where I_L denotes the original left-eye image and I_L' denotes the predicted left-eye image obtained by optical flow mapping.
Illustratively, the second preset probability threshold may be 3%.
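By way of illustration, a minimal sketch of this consistency screen in Python with OpenCV and NumPy. The text describes forward mapping of one image; this sketch instead samples the right-eye image through the forward flow with cv2.remap, a common equivalent way to obtain a predicted left-eye image. The function name and the relaxed pixel tolerance are assumptions:

```python
import cv2
import numpy as np

def passes_consistency_check(left_img: np.ndarray, right_img: np.ndarray,
                             flow_fwd: np.ndarray,
                             pixel_thresh: float = 1.0, p2: float = 0.03) -> bool:
    """Predict the left-eye image from the right-eye image via the forward
    flow, then require that the fraction of pixels differing by more than
    the pixel threshold stays below P_2 (3% above). The text's illustrative
    first pixel threshold is 0; a small positive tolerance is assumed here
    to allow for interpolation error."""
    h, w = left_img.shape[:2]
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow_fwd[..., 0]).astype(np.float32)
    map_y = (grid_y + flow_fwd[..., 1]).astype(np.float32)
    predicted_left = cv2.remap(right_img, map_x, map_y, cv2.INTER_LINEAR)
    diff = np.abs(left_img.astype(np.float32) - predicted_left.astype(np.float32))
    return float(np.mean(diff > pixel_thresh)) < p2
```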
It should be noted that step 205, step 206, and step 207 may be performed in the order shown in the above embodiments, or may be performed in the order of step 206, step 205, and step 207, and the order of optical flow filtering is not limited in this embodiment.
Step 208, determining the left-eye image or the right-eye image in the third image pair as the target monocular image, and determining the horizontal optical flow value corresponding to the target monocular image as the target optical flow value.
In a possible implementation, after optical flow screening, a target optical flow value that conforms to the disparity relationship and its corresponding target image pair (the third image pair) are obtained. Since the training samples for monocular depth estimation are monocular images, the original left-eye image or original right-eye image in the third image pair can be directly determined as the target monocular image, and the horizontal optical flow value corresponding to it can be determined as the target optical flow value.
Illustratively, if the target monocular image is the original left eye image, the corresponding target optical flow value is the horizontal optical flow value in the forward optical flow, and if the target monocular image is the original right eye image, the corresponding target optical flow value is the horizontal optical flow value in the backward optical flow. The forward optical flow and the backward optical flow are extracted with the original left eye image as a reference frame.
Step 209, performing a negation operation on the target horizontal optical flow value to obtain candidate parallax data corresponding to the target monocular image.
Wherein the target horizontal optical flow value is a horizontal optical flow value corresponding to the forward optical flow.
Because the forward optical flow is extracted by taking the left eye image as a reference frame, correspondingly, when the parallax data is generated based on the target horizontal optical flow value, firstly, the negation operation needs to be carried out on the target horizontal optical flow value, so that the candidate parallax data corresponding to the target monocular image is obtained.
Illustratively, the negation of the target horizontal optical flow value may be expressed as:

$\tilde{D}_L = -F_{lu}$

where $\tilde{D}_L$ denotes the candidate disparity data before normalization and $F_{lu}$ denotes the horizontal optical flow value corresponding to the left-eye image.
Step 210, performing a normalization operation on the candidate parallax data to obtain target parallax data corresponding to the target monocular image.
Because the relative scales of the original images used during optical flow extraction may differ, in order to avoid scale inconsistency during training, in one possible implementation the candidate parallax data is normalized to obtain target parallax data usable as training samples.
In an illustrative example, the process of normalizing the candidate disparity data can include any one of the following methods.
Firstly, target parallax data corresponding to the target monocular image are calculated and obtained based on the candidate parallax data, the maximum parallax data and the minimum parallax data.
In one possible implementation, the candidate parallax data is normalized by scaling: the maximum and minimum values of the candidate parallax data are obtained, and the target parallax data corresponding to the target monocular image is computed from the candidate parallax data together with these two extremes.
The maximum parallax data is the maximum value in the candidate parallax data corresponding to the monocular image, and the minimum parallax data is the minimum value in the candidate parallax data corresponding to the target monocular image.
Illustratively, the scaling-based normalization of the candidate disparity data can be expressed as:

$D_L = \dfrac{\tilde{D}_L - \min(\tilde{D}_L)}{\max(\tilde{D}_L) - \min(\tilde{D}_L)}$

where $\tilde{D}_L$ denotes the candidate disparity data before normalization, $\min(\cdot)$ denotes the minimum-value operation, $\max(\cdot)$ denotes the maximum-value operation, and $D_L$ denotes the normalized disparity data.
And secondly, calculating to obtain target parallax data corresponding to the target monocular image based on the candidate parallax data and the median of the candidate parallax data.
In other possible embodiments, normalization processing may be performed by a method of taking a median of the candidate parallax data, that is, obtaining the median of the candidate parallax data, and calculating target parallax data corresponding to the target monocular image based on the candidate parallax data and the median.
In one illustrative example, the normalization process may be expressed as:

$m = \operatorname{median}(\tilde{D}_L)$

$s = \operatorname{mean}_x\big(|\tilde{D}_L(x) - m|\big)$

$D_L = \dfrac{\tilde{D}_L - m}{s}$

where $m$ denotes the median of the candidate disparity values in the left-eye image, $\operatorname{median}(\cdot)$ denotes the median-taking operation, $\operatorname{mean}(\cdot)$ denotes the averaging operation over the loop variable $x$ (the pixel points), $\tilde{D}_L$ denotes the candidate disparity data before normalization, and $D_L$ denotes the normalized disparity data.
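By way of illustration, a minimal NumPy sketch of steps 209-210 covering the negation and both normalization alternatives. The function name is hypothetical, and the second branch follows the median / mean-absolute-deviation reading of the formulas above:

```python
import numpy as np

def flow_to_disparity(flow_lu: np.ndarray, method: str = "minmax") -> np.ndarray:
    """Negate the left-eye horizontal flow to get candidate disparity,
    then normalize by min-max scaling or by median / mean absolute
    deviation, per the two alternatives above."""
    d = -flow_lu  # step 209: candidate disparity before normalization
    if method == "minmax":
        return (d - d.min()) / (d.max() - d.min())  # step 210, method 1
    m = np.median(d)                 # median of the candidate disparities
    s = np.mean(np.abs(d - m))       # mean absolute deviation from the median
    return (d - m) / s               # step 210, method 2
```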
In one possible implementation, the target monocular image and the disparity data corresponding to the target monocular image may be stored in association for subsequent use as a training sample for monocular depth estimation.
In this embodiment, the original image pairs are screened using preset optical flow conditions, so that the target optical flow values obtained after screening satisfy: the vertical optical flow is almost 0 everywhere, the horizontal optical flow has a large amplitude, and the optical flow precision is high. This guarantees the accuracy of the extracted optical flow values and thereby improves the accuracy of the parallax data.
In this embodiment, the obtained parallax data is mainly used in the pre-training stage of the monocular depth estimation model. To prevent inaccurate parallax data from degrading the prediction accuracy of the model, binarization processing is performed on the target monocular image to indicate the precision of the parallax value at each pixel point, so that differentiated learning based on this precision is possible during model training.
As shown in fig. 4, which shows a flowchart of a parallax data generating method provided in another exemplary embodiment of the present application, an embodiment of the present application takes an example in which the method is applied to a computer device, and the method includes:
step 401, at least one group of original image pairs is obtained, where the original image pairs include an original left eye image and an original right eye image, and the original left eye image and the original right eye image are images corresponding to the same scene observed under a left eye viewing angle and a right eye viewing angle.
Step 402, performing optical flow extraction on the original image pairs to obtain optical flow values corresponding to each group of original image pairs.
Step 403, performing optical flow screening on the original image pairs based on the optical flow values to obtain the target monocular image and the target optical flow values corresponding to the pixel points in the target monocular image.
Step 404, generating target parallax data corresponding to the target monocular image based on the target optical flow values.
The embodiments of step 401 to step 404 may refer to the above embodiments, which are not described herein.
Step 405, performing binarization processing on the target monocular image to obtain a target segmentation image corresponding to the target monocular image.
To improve the accuracy of model training, in one possible implementation the target monocular image is binarized to obtain a target segmentation image in which a binarization confidence (accuracy) is marked for each pixel point, so that during model training it can be decided, based on this confidence, whether the parallax information at a pixel point should be learned.
In an exemplary example, the process of performing binarization processing on the target monocular image may include the steps of:
the method comprises the steps of firstly, obtaining a target mapping image corresponding to a target monocular image, wherein the target mapping image is obtained by carrying out optical flow mapping processing on the target monocular image.
Since the accuracy of the parallax data depends on the accuracy of the optical flow values, the accuracy of the parallax data can be evaluated indirectly by evaluating the accuracy of the optical flow, which in turn can be done by comparing the optical-flow-mapped image with the original image. Correspondingly, in a possible embodiment, optical flow mapping is first performed on the target monocular image based on the target optical flow values to obtain the target mapping image, which is subsequently compared with the target monocular image.
The optical flow mapping process may refer to the above embodiments, which are not described herein again.
Optionally, if the target monocular image is a target left eye image, correspondingly, optical flow mapping needs to be performed on the target right eye image based on the target optical flow value, so as to obtain a target mapping image corresponding to the target left eye image; on the contrary, if the target monocular image is the target right-eye image, the optical flow mapping needs to be performed on the target left-eye image based on the target optical flow value, so as to obtain the target mapping image corresponding to the target right-eye image.
And secondly, setting the pixel value of the pixel point in the target segmentation image as a first pixel value in response to the fact that the difference value of the pixel values corresponding to the same pixel point in the target monocular image and the target mapping image is smaller than a second pixel threshold value.
Based on the fact that accurate optical flow yields a target mapping image identical to the target monocular image, in one possible implementation the pixel values at the same pixel position are compared, i.e., whether the pixel-value difference at the same pixel point is smaller than the second pixel threshold, and this serves as the basis for evaluating the accuracy of the parallax data at that pixel point. Illustratively, the second pixel threshold may be 0.
To distinguish points whose pixel values agree from those that differ, the pixel value of a point satisfying the condition (pixel-value difference smaller than the second pixel threshold) is set to the first pixel value, and the pixel value of a point not satisfying the condition is set to the second pixel value.
Illustratively, the first pixel value may be 1 and the second pixel value may be 0.
In a possible implementation manner, pixel values corresponding to the same pixel point in the target monocular image and the target mapping image are obtained, a difference value between the two pixel values is calculated, if the difference value is smaller than a second pixel threshold value, it indicates that the accuracy of the parallax data at the pixel point is higher, and the corresponding pixel value is set to 1.
And thirdly, in response to the fact that the difference value of the pixel values corresponding to the same pixel point in the target monocular image and the target mapping image is larger than a second pixel threshold value, setting the pixel value of the pixel point in the target segmentation image as the second pixel value.
In a possible implementation manner, pixel values corresponding to the same pixel point in the target monocular image and the target mapping image are obtained, a difference value between the two pixel values is calculated, if the difference value is greater than a second pixel threshold value, it indicates that the accuracy of parallax data at the pixel point is low, and the corresponding pixel value is set to 0.
In an exemplary example, the binarization of the target monocular image may be expressed as:

$\mathrm{mask}[i,j] = \begin{cases} 1, & |I_L[i,j] - I_L'[i,j]| \le T \\ 0, & \text{otherwise} \end{cases}$

where $I_L[i,j]$ denotes the pixel value at a pixel point in the target monocular image (the target left-eye image), $I_L'[i,j]$ denotes the pixel value at the same position in the target mapping image, $T$ denotes the second pixel threshold, and $\mathrm{mask}[i,j]$ denotes the binarized value at that pixel point.
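A minimal NumPy sketch of this binarization, assuming single-channel images of equal size (the function name is hypothetical):

```python
import numpy as np

def confidence_mask(target_img: np.ndarray, mapped_img: np.ndarray,
                    t: float = 0.0) -> np.ndarray:
    """mask[i, j] = 1 where the target monocular image and its flow-mapped
    counterpart agree to within the second pixel threshold T, else 0."""
    diff = np.abs(target_img.astype(np.float32) - mapped_img.astype(np.float32))
    return (diff <= t).astype(np.uint8)
```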
Step 406, determining the target monocular image, the target segmentation image, and the target parallax data as the training sample set corresponding to the monocular depth estimation model.
In a possible application scenario, a target monocular image, a target segmentation image obtained after binarization processing of the target monocular image, and target parallax data corresponding to the target monocular image are stored in an associated manner so as to serve as a training sample set of a subsequent training monocular depth estimation model.
In this embodiment, binarizing the target monocular image makes it possible, during subsequent model training, to judge whether the parallax data at each pixel point is accurate, so that the model learns selectively from correct data, further improving the accuracy of model training.
Referring to fig. 5, a schematic diagram of the parallax data acquisition process according to an exemplary embodiment of the present application is shown. This embodiment illustrates batch acquisition of left-eye and right-eye images and corresponding disparity information from 3D movie resources. Video frames are extracted from the 3D movie 501 to obtain a paired candidate left-eye image 502A and candidate right-eye image 502B, which are preprocessed (removing the black-edge area and blurred frames) to obtain an original left-eye image 503A and an original right-eye image 503B. A forward optical flow 504A and a backward optical flow 504B are then obtained through optical flow extraction; after optical flow screening and parallax data extraction, the parallax data 505 corresponding to the target left-eye image 506 is obtained, and the target left-eye image 506 and the parallax data 505 are stored in association as a sample data pair.
The following are embodiments of the apparatus of the present application that may be used to perform embodiments of the method of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method of the present application.
Referring to fig. 6, a block diagram of a disparity data generating apparatus according to an embodiment of the present application is shown. The apparatus has the function of performing the steps executed by the computer device in the above method embodiments; the function may be implemented by hardware, or by hardware executing corresponding software. As shown in fig. 6, the apparatus may include:
an obtaining module 601, configured to obtain at least one group of original image pairs, where the original image pair includes an original left eye image and an original right eye image, and the original left eye image and the original right eye image are images corresponding to a same scene observed under a left eye viewing angle and a right eye viewing angle;
an optical flow extraction module 602, configured to perform optical flow extraction on the original image pairs to obtain optical flow values corresponding to each group of the original image pairs;
an optical flow screening module 603, configured to perform optical flow screening on the original image pairs based on the optical flow values to obtain a target monocular image and a target optical flow value corresponding to each pixel point in the target monocular image, where the relative size relationship between the target optical flow values corresponding to the pixel points after optical flow screening is the same as the relative size relationship between the corresponding parallax data, and the target monocular image is an original left-eye image or an original right-eye image;
a generating module 604, configured to generate target parallax data corresponding to the target monocular image based on the target optical flow value.
Optionally, the optical flow values comprise horizontal optical flow values and vertical optical flow values;
the optical flow filtering module 603 includes:
the first screening unit is used for screening the original image pair based on a vertical optical flow threshold value to obtain a first image pair, and the probability that the vertical optical flow value corresponding to the first image pair is greater than the vertical optical flow threshold value is lower than a first preset probability threshold value;
the second screening unit is used for screening the first image pair based on a horizontal optical flow threshold value to obtain a second image pair, wherein the maximum horizontal optical flow difference value corresponding to the second image pair is larger than the horizontal optical flow threshold value;
the third screening unit is used for screening the second image pair based on a first pixel threshold value to obtain a third image pair, wherein the probability that the pixel value difference value of the same pixel point between the third image pair and a predicted image pair is larger than the first pixel threshold value is lower than a second preset probability threshold value, and the predicted image is obtained by carrying out optical flow mapping on the third image pair;
and the determining unit is used for determining the left eye image or the right eye image in the third image pair as the target monocular image and determining the horizontal optical flow value corresponding to the target monocular image as the target optical flow value (see the sketch below).
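A sketch of the first two screening stages follows, assuming an H x W x 2 flow array with channel 0 horizontal and channel 1 vertical (OpenCV's layout); every threshold value is an illustrative placeholder, not a value fixed by the embodiment:

import numpy as np

def passes_vertical_screen(flow, v_thresh=1.0, p1=0.05):
    # First screen: keep the pair only if the fraction of pixels whose
    # vertical flow magnitude exceeds the vertical optical flow threshold
    # stays below the first preset probability threshold (a well-rectified
    # stereo pair should exhibit almost purely horizontal motion).
    return np.mean(np.abs(flow[..., 1]) > v_thresh) < p1

def passes_horizontal_screen(flow, h_thresh=8.0):
    # Second screen: keep the pair only if the maximum horizontal flow
    # difference (the range of horizontal flow) exceeds the horizontal
    # optical flow threshold, i.e. the scene shows enough parallax variation.
    hflow = flow[..., 0]
    return float(hflow.max() - hflow.min()) > h_thresh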
Optionally, the third screening unit is further configured to:
performing optical flow mapping processing on an original left eye image in the second image pair to obtain a predicted right eye image in the predicted image pair;
performing optical flow mapping processing on an original right eye image in the second image pair to obtain a predicted left eye image in the predicted image pair;
and determining the second image pair as the third image pair in response to the probability that the pixel value difference between corresponding pixel points in the predicted right eye image and the original right eye image is greater than the first pixel threshold being lower than the second preset probability threshold, and/or in response to the probability that the pixel value difference between corresponding pixel points in the predicted left eye image and the original left eye image is greater than the first pixel threshold being lower than the second preset probability threshold (see the sketch below).
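The mapping-and-compare step of this third screen can be sketched with cv2.remap; backward-warping the original left eye image with the right-to-left flow to synthesize the predicted right eye image is an implementation assumption, as are the threshold values:

import cv2
import numpy as np

def predict_right(left, flow_r2l):
    # Backward warp: each right-view pixel (x, y) samples the left image at
    # (x, y) + flow_r2l(x, y), where flow_r2l is the dense flow extracted
    # from the right image toward the left image.
    h, w = left.shape[:2]
    xs, ys = np.meshgrid(np.arange(w, dtype=np.float32),
                         np.arange(h, dtype=np.float32))
    return cv2.remap(left, xs + flow_r2l[..., 0], ys + flow_r2l[..., 1],
                     cv2.INTER_LINEAR)

def passes_photometric_screen(left, right, flow_r2l, t1=10.0, p2=0.1):
    # Third screen (one direction): keep the pair only if the fraction of
    # pixels whose error between the predicted and the original right eye
    # image exceeds the first pixel threshold stays below the second preset
    # probability threshold. The left eye direction is checked symmetrically.
    err = np.abs(predict_right(left, flow_r2l).astype(np.float32)
                 - right.astype(np.float32))
    if err.ndim == 3:
        err = err.mean(axis=2)          # average the per-channel error
    return np.mean(err > t1) < p2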
Optionally, the target monocular image is the original left eye image, and the target optical flow value is a horizontal optical flow value;
the generating module 604 includes:
the negation unit is used for performing negation operation on the target horizontal optical flow value to obtain candidate parallax data corresponding to the target monocular image;
and the normalization unit is used for performing normalization operation on the candidate parallax data to obtain target parallax data corresponding to the target monocular image.
Optionally, the normalization unit is further configured to:
calculating to obtain the target parallax data corresponding to the target monocular image based on the candidate parallax data, the maximum parallax data and the minimum parallax data;
or,
and calculating to obtain the target parallax data corresponding to the target monocular image based on the candidate parallax data and the median of the candidate parallax data.
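For a target left eye image, these two options reduce to negating the screened horizontal flow and rescaling it. A sketch of both normalization variants follows, where the small epsilon guards and the use of the absolute median are assumptions of this sketch:

import numpy as np

def flow_to_candidate_disparity(target_hflow):
    # Negation operation: candidate disparity is the negated target
    # horizontal optical flow value.
    return -target_hflow

def minmax_normalize(disp):
    # Option 1: rescale using the maximum and minimum disparity data.
    lo, hi = float(disp.min()), float(disp.max())
    return (disp - lo) / (hi - lo + 1e-8)

def median_normalize(disp):
    # Option 2: divide by the median of the candidate disparity data, which
    # is less sensitive to outliers produced by mismatched flow.
    return disp / (float(np.median(np.abs(disp))) + 1e-8)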
Optionally, the apparatus further comprises:
the binarization processing module is used for carrying out binarization processing on the target monocular image to obtain a target segmentation image corresponding to the target monocular image;
and the determining module is used for determining the target monocular image, the target segmentation image and the target parallax data as a training sample set corresponding to the monocular depth estimation model.
Optionally, the binarization processing module includes:
a first obtaining unit, configured to obtain a target mapping image corresponding to the target monocular image, where the target mapping image is obtained by performing optical flow mapping processing on the target monocular image;
the first setting unit is used for setting the pixel value of the pixel point in the target segmentation image as a first pixel value in response to the fact that the difference value of the pixel values corresponding to the same pixel point in the target monocular image and the target mapping image is smaller than a second pixel threshold value;
and the second setting unit is used for setting the pixel value of the pixel point in the target segmentation image as a second pixel value in response to the difference value of the corresponding pixel values of the same pixel point in the target monocular image and the target mapping image being greater than the second pixel threshold value.
Optionally, the obtaining module 601 includes:
the second acquisition unit is used for acquiring a target video in a target storage format, where the target storage format includes at least a left-right (side-by-side) storage format;
the extraction unit is used for extracting at least one group of candidate image pairs from the target video, wherein the candidate image pairs comprise a candidate left-eye image and a candidate right-eye image;
and the preprocessing unit is used for preprocessing the candidate image pair to obtain the original image pair.
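The two preprocessing operations named in fig. 5 (removing black edge areas, discarding blurred frames) could look like the following; the Laplacian-variance sharpness test and both thresholds are assumptions, as the embodiment leaves the concrete preprocessing method open:

import cv2
import numpy as np

def strip_black_edges(img, intensity_thresh=10):
    # Crop away near-black letterbox rows and columns around the frame.
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    rows = np.where(gray.max(axis=1) > intensity_thresh)[0]
    cols = np.where(gray.max(axis=0) > intensity_thresh)[0]
    if rows.size == 0 or cols.size == 0:    # fully black frame
        return img
    return img[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

def is_sharp(img, blur_thresh=100.0):
    # Discard blurred frames: a low variance of the Laplacian response is a
    # common proxy for motion blur or defocus.
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(gray, cv2.CV_64F).var() > blur_thresh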
In summary, in the embodiments of the present application, in a monocular image depth estimation application scenario, optical flow extraction and optical flow screening can be performed on original left eye and right eye images carrying natural parallax information, so that the corresponding parallax data can be quickly extracted based on the screened optical flow values, improving the efficiency of parallax data acquisition. Owing to the relationship between parallax data and depth information, a large number of pre-training samples for training a monocular depth estimation model can accordingly be constructed quickly on the basis of the parallax data, which can further improve the training efficiency of the depth estimation model.
It should be noted that: when the parallax data generating apparatus provided in the above embodiments implements its functions, the division into the above functional modules is merely illustrative; in practical applications, the functions may be assigned to different functional modules as required, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the disparity data generating apparatus and the disparity data generating method provided in the above embodiments belong to the same concept; the specific implementation process is detailed in the method embodiments and is not repeated here.
Referring to fig. 7, a block diagram of a computer device according to an embodiment of the present application is shown. The computer device may be used to implement the disparity data generating method performed by the computer device in the above-described embodiments.
Specifically:
the computer device 700 includes a Central Processing Unit (CPU) 701, a system Memory 704 including a Random Access Memory (RAM) 702 and a Read-Only Memory (ROM) 703, and a system bus 705 connecting the system Memory 704 and the CPU 701. The computer device 700 also includes a basic Input/Output system (I/O system) 706 that helps to transfer information between various devices within the server, and a mass storage device 707 for storing an operating system 713, application programs 714, and other program modules 715.
The basic input/output system 706 comprises a display 708 for displaying information and an input device 709, such as a mouse, keyboard, etc., for a user to input information. Wherein the display 708 and input device 709 are connected to the central processing unit 701 through an input output controller 710 coupled to the system bus 705. The basic input/output system 706 may also include an input/output controller 710 for receiving and processing input from a number of other devices, such as a keyboard, mouse, or electronic stylus. Similarly, input-output controller 710 may also provide output to a display screen, a printer, or other type of output device.
The mass storage device 707 is connected to the central processing unit 701 through a mass storage controller (not shown) connected to the system bus 705. The mass storage device 707 and its associated computer-readable storage media provide non-volatile storage for the computer device 700. That is, the mass storage device 707 may include a computer-readable storage medium (not shown) such as a hard disk or Compact Disc-Only Memory (CD-ROM) drive.
Without loss of generality, the computer-readable storage media may include computer storage media and communication media. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include RAM, ROM, Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other solid-state memory technology, CD-ROM, Digital Versatile Disc (DVD) or other optical storage, magnetic tape cassettes, magnetic tape, magnetic disk storage, or other magnetic storage devices. Of course, those skilled in the art will appreciate that computer storage media are not limited to the foregoing. The system memory 704 and the mass storage device 707 described above may be collectively referred to as memory.
The memory stores one or more programs configured to be executed by the one or more central processing units 701, the one or more programs containing instructions for implementing the above-described method embodiments, and the central processing unit 701 executes the one or more programs to implement the parallax data generating methods provided by the above-described respective method embodiments.
According to various embodiments of the present application, the computer device 700 may also be connected, through a network such as the Internet, to a remote computer on the network for operation. That is, the computer device 700 may be connected to the network 712 through the network interface unit 711 connected to the system bus 705, or may be connected to other types of networks or remote server systems (not shown) using the network interface unit 711.
The memory also includes one or more programs, stored in the memory, that include instructions for performing the steps performed by the computer device in the methods provided by the embodiments of the present application.
An embodiment of the present application further provides a computer-readable storage medium storing at least one program code, the program code being loaded and executed by a processor to implement the parallax data generating method described in the above embodiments.
According to an aspect of the application, a computer program product or computer program is provided, comprising computer instructions, the computer instructions being stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the disparity data generating method provided in the various alternative implementations of the above-described aspects.
It should be understood that reference to "a plurality" herein means two or more. "and/or" describes the association relationship of the associated objects, meaning that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. In addition, the step numbers described herein only exemplarily show one possible execution sequence among the steps, and in some other embodiments, the steps may also be executed out of the numbering sequence, for example, two steps with different numbers are executed simultaneously, or two steps with different numbers are executed in a reverse order to the order shown in the figure, which is not limited by the embodiment of the present application.
The above description is only exemplary of the present application and should not be taken as limiting, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (11)

1. A method of disparity data generation, the method comprising:
acquiring at least one group of original image pairs, wherein the original image pairs comprise an original left eye image and an original right eye image, and the original left eye image and the original right eye image are images corresponding to the same scene observed under a left eye visual angle and a right eye visual angle;
performing optical flow extraction on the original image pairs to obtain optical flow values corresponding to each group of the original image pairs;
performing optical flow screening on the original image pair based on the optical flow value to obtain a target monocular image and a target optical flow value corresponding to each pixel point in the target monocular image, wherein the relative size relationship between the target optical flow values corresponding to the pixel points after optical flow screening is the same as the relative size relationship between the corresponding parallax data, and the target monocular image is an original left eye image or an original right eye image;
and generating target parallax data corresponding to the target monocular image based on the target optical flow value.
2. The method of claim 1, wherein the optical flow values comprise horizontal optical flow values and vertical optical flow values;
the optical flow screening is performed on the original image pair based on the optical flow value to obtain a target monocular image and a target optical flow value corresponding to each pixel point in the target monocular image, and the optical flow screening comprises the following steps:
screening the original image pair based on a vertical optical flow threshold value to obtain a first image pair, wherein the probability that the vertical optical flow value corresponding to the first image pair is greater than the vertical optical flow threshold value is lower than a first preset probability threshold value;
screening the first image pair based on a horizontal optical flow threshold value to obtain a second image pair, wherein the maximum horizontal optical flow difference value corresponding to the second image pair is greater than the horizontal optical flow threshold value;
screening the second image pair based on a first pixel threshold value to obtain a third image pair, wherein the probability that the pixel value difference value of the same pixel point between the third image pair and a predicted image pair is greater than the first pixel threshold value is lower than a second preset probability threshold value, and the predicted image is obtained by carrying out optical flow mapping on the third image pair;
and determining a left eye image or a right eye image in the third image pair as the target monocular image, and determining a horizontal optical flow value corresponding to the target monocular image as the target optical flow value.
3. The method of claim 2, wherein the screening the second image pair based on the first pixel threshold to obtain a third image pair comprises:
performing optical flow mapping processing on an original left eye image in the second image pair to obtain a predicted right eye image in the predicted image pair;
performing optical flow mapping processing on an original right eye image in the second image pair to obtain a predicted left eye image in the predicted image pair;
and determining the second image pair as the third image pair in response to the probability that the pixel value difference between corresponding pixel points in the predicted right eye image and the original right eye image is greater than the first pixel threshold being lower than the second preset probability threshold, and/or in response to the probability that the pixel value difference between corresponding pixel points in the predicted left eye image and the original left eye image is greater than the first pixel threshold being lower than the second preset probability threshold.
4. The method according to any one of claims 1 to 3, wherein the target monocular image is the original left eye image, and the target optical flow value is a horizontal optical flow value;
generating target parallax data corresponding to the target monocular image based on the target optical flow value includes:
performing negation operation on the target horizontal optical flow value to obtain candidate parallax data corresponding to the target monocular image;
and carrying out normalization operation on the candidate parallax data to obtain target parallax data corresponding to the target monocular image.
5. The method according to claim 4, wherein the normalizing the candidate parallax data to obtain the target parallax data corresponding to the target monocular image comprises at least one of:
calculating to obtain the target parallax data corresponding to the target monocular image based on the candidate parallax data, the maximum parallax data and the minimum parallax data;
or,
and calculating to obtain the target parallax data corresponding to the target monocular image based on the candidate parallax data and the median of the candidate parallax data.
6. The method according to any one of claims 1 to 3, wherein after generating target parallax data corresponding to the target monocular image based on the target optical flow value, the method further comprises:
carrying out binarization processing on the target monocular image to obtain a target segmentation image corresponding to the target monocular image;
and determining the target monocular image, the target segmentation image and the target parallax data as a training sample set corresponding to the monocular depth estimation model.
7. The method according to claim 6, wherein the binarizing the target monocular image to obtain a target segmentation image corresponding to the target monocular image comprises:
acquiring a target mapping image corresponding to the target monocular image, wherein the target mapping image is obtained by performing optical flow mapping processing on the target monocular image;
setting the pixel value of the pixel point in the target segmentation image as a first pixel value in response to the difference value of the pixel values corresponding to the same pixel point in the target monocular image and the target mapping image being smaller than a second pixel threshold value;
and setting the pixel value of the pixel point in the target segmentation image as a second pixel value in response to the difference value of the pixel values corresponding to the same pixel point in the target monocular image and the target mapping image being larger than the second pixel threshold value.
8. The method of any of claims 1 to 3, wherein said obtaining at least one set of raw image pairs comprises:
acquiring a target video in a target storage format, wherein the target storage format includes at least a left-right (side-by-side) storage format;
extracting at least one group of candidate image pairs from the target video, wherein the candidate image pairs comprise a candidate left eye image and a candidate right eye image;
and preprocessing the candidate image pair to obtain the original image pair.
9. A disparity data generating apparatus, comprising:
the system comprises an acquisition module, a display module and a display module, wherein the acquisition module is used for acquiring at least one group of original image pairs, the original image pairs comprise an original left eye image and an original right eye image, and the original left eye image and the original right eye image are images corresponding to the same scene observed under a left eye visual angle and a right eye visual angle;
the optical flow extraction module is used for carrying out optical flow extraction on the original image pairs to obtain optical flow values corresponding to each group of the original image pairs;
the optical flow screening module is used for carrying out optical flow screening on the original image pair based on the optical flow value to obtain a target monocular image and a target optical flow value corresponding to each pixel point in the target monocular image, wherein the relative size relationship between the target optical flow values corresponding to each pixel point after optical flow screening is the same as the relative size relationship between parallax data, and the target monocular image is an original left eye image or an original right eye image;
and the generating module is used for generating target parallax data corresponding to the target monocular image based on the target optical flow value.
10. A computer device comprising a processor and a memory, the memory having stored therein at least one program code, the program code being loaded and executed by the processor to implement the disparity data generating method according to any of claims 1 to 8.
11. A computer-readable storage medium having stored therein at least one program code, the program code being loaded and executed by a processor to implement the disparity data generating method according to any one of claims 1 to 8.
CN202110258449.8A 2021-03-09 2021-03-09 Parallax data generation method, parallax data generation device, computer equipment and storage medium Active CN112991419B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110258449.8A CN112991419B (en) 2021-03-09 2021-03-09 Parallax data generation method, parallax data generation device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112991419A true CN112991419A (en) 2021-06-18
CN112991419B CN112991419B (en) 2023-11-14

Family

ID=76334681

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110258449.8A Active CN112991419B (en) 2021-03-09 2021-03-09 Parallax data generation method, parallax data generation device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112991419B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20110027373A (en) * 2009-09-10 2011-03-16 포항공과대학교 산학협력단 Apparatus and method for 3d estimating fusing optical flow and stereo matching
WO2011162227A1 (en) * 2010-06-24 2011-12-29 富士フイルム株式会社 Stereoscopic panoramic image synthesis device, image capturing device, stereoscopic panoramic image synthesis method, recording medium, and computer program
CN102750711A (en) * 2012-06-04 2012-10-24 清华大学 Binocular video depth map obtaining method based on image segmentation and motion estimation
CN104637043A (en) * 2013-11-08 2015-05-20 株式会社理光 Supporting pixel selection method and device and parallax determination method
CN104869387A (en) * 2015-04-19 2015-08-26 中国传媒大学 Method for acquiring binocular image maximum parallax based on optical flow method
CN108171744A (en) * 2017-12-26 2018-06-15 努比亚技术有限公司 Determining method, mobile terminal and the storage medium of disparity map in a kind of binocular virtualization
CN108234988A (en) * 2017-12-28 2018-06-29 努比亚技术有限公司 Parallax drawing generating method, device and computer readable storage medium
CN109919993A (en) * 2019-03-12 2019-06-21 腾讯科技(深圳)有限公司 Parallax picture capturing method, device and equipment and control system
CN110443843A (en) * 2019-07-29 2019-11-12 东北大学 A kind of unsupervised monocular depth estimation method based on generation confrontation network
WO2021027544A1 (en) * 2019-08-15 2021-02-18 广州虎牙科技有限公司 Binocular image-based model training method and apparatus, and data processing device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHOU SHANGCHEN: "Research on Depth Models for Pixel Flow Estimation Based on Flow Edge Detection" (in Chinese), CNKI, pages 1-77 *
LIN JIA ET AL.: "Adaptive Local Spatio-Temporal Feature Extraction Method Based on RGB-D Data" (in Chinese), Journal of Beijing University of Technology, pages 1-9 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117274316A (en) * 2023-10-31 2023-12-22 广东省水利水电科学研究院 River surface flow velocity estimation method, device, equipment and storage medium
CN117274316B (en) * 2023-10-31 2024-05-03 广东省水利水电科学研究院 River surface flow velocity estimation method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN112991419B (en) 2023-11-14


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant