CN112991419B - Parallax data generation method, parallax data generation device, computer equipment and storage medium - Google Patents

Parallax data generation method, parallax data generation device, computer equipment and storage medium

Info

Publication number
CN112991419B
CN112991419B
Authority
CN
China
Prior art keywords
image
target
optical flow
original
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110258449.8A
Other languages
Chinese (zh)
Other versions
CN112991419A
Inventor
尹康 (Yin Kang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202110258449.8A priority Critical patent/CN112991419B/en
Publication of CN112991419A publication Critical patent/CN112991419A/en
Application granted granted Critical
Publication of CN112991419B publication Critical patent/CN112991419B/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20228Disparity calculation for image-based rendering

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)

Abstract

The application discloses a parallax data generation method, a parallax data generation device, computer equipment and a storage medium, and belongs to the technical field of computer vision. The method comprises the following steps: acquiring at least one group of original image pairs, wherein each original image pair comprises an original left-eye image and an original right-eye image; performing optical flow extraction on the original image pairs to obtain optical flow values corresponding to each group of original image pairs; performing optical flow screening on the original image pairs based on the optical flow values to obtain a target monocular image and target optical flow values corresponding to each pixel point in the target monocular image, wherein the target monocular image is an original left-eye image or an original right-eye image; and generating target parallax data corresponding to the target monocular image based on the target optical flow values. The parallax data acquisition efficiency is improved; because of the relationship between parallax data and depth information, a large number of pre-training samples for training a monocular depth estimation model can be quickly constructed based on the parallax data, which improves the training efficiency of the depth estimation model.

Description

Parallax data generation method, parallax data generation device, computer equipment and storage medium
Technical Field
The embodiment of the application relates to the technical field of computer vision, in particular to a parallax data generation method, a parallax data generation device, computer equipment and a storage medium.
Background
Image-based depth information estimation is one of the fundamental tasks in the field of computer vision and has wide application in fields such as 3-dimensional reconstruction and augmented reality (Augmented Reality, AR) interaction. Depending on the number of images used for depth information estimation, depth estimation algorithms can be divided into two major categories: multi-view image depth estimation and monocular image depth estimation. Monocular depth estimation algorithms have the wider application prospect, and in the field of monocular image depth estimation, depth information estimation is mainly realized by training a network model.
In the related art, training a network model for depth information estimation often requires a sufficient number of training samples, each of which takes the form of a monocular image and a matched single-channel depth map, where the value at each position of the depth map represents the distance between the corresponding position of the original image and the camera. A common acquisition method is to capture images with a dedicated depth sensor. However, this acquisition method has a high acquisition cost and low acquisition efficiency, and cannot rapidly acquire images with depth information.
Disclosure of Invention
The embodiment of the application provides a parallax data generation method, a parallax data generation device, computer equipment and a storage medium. The technical scheme is as follows:
in one aspect, an embodiment of the present application provides a parallax data generating method, including:
acquiring at least one group of original image pairs, wherein the original image pairs comprise original left-eye images and original right-eye images, and the original left-eye images and the original right-eye images are images corresponding to the same scene observed under a left-eye visual angle and a right-eye visual angle;
performing optical flow extraction on the original image pairs to obtain optical flow values corresponding to each group of original image pairs;
performing optical flow screening on the original image pair based on the optical flow value to obtain a target monocular image and target optical flow values corresponding to all pixel points in the target monocular image, wherein the relative size relation between the target optical flow values corresponding to all pixel points after optical flow screening is the same as the relative size relation between parallax data, and the target monocular image is an original left-eye image or an original right-eye image;
and generating target parallax data corresponding to the target monocular image based on the target optical flow values.
In another aspect, an embodiment of the present application provides a parallax data generating apparatus, including:
the acquisition module is used for acquiring at least one group of original image pairs, wherein the original image pairs comprise original left-eye images and original right-eye images, and the original left-eye images and the original right-eye images are images corresponding to the same scene observed under a left-eye visual angle and a right-eye visual angle;
the optical flow extraction module is used for performing optical flow extraction on the original image pairs to obtain optical flow values corresponding to each group of original image pairs;
the optical flow screening module is used for carrying out optical flow screening on the original image pair based on the optical flow value to obtain a target monocular image and target optical flow values corresponding to all pixel points in the target monocular image, wherein the relative size relation between the target optical flow values corresponding to all pixel points after optical flow screening is the same as the relative size relation between parallax data, and the target monocular image is an original left-eye image or an original right-eye image;
and the generation module is used for generating target parallax data corresponding to the target monocular image based on the target optical flow values.
In another aspect, an embodiment of the present application provides a computer device, where the computer device includes a processor and a memory, where at least one program code is stored in the memory, and the program code is loaded and executed by the processor to implement the parallax data generating method according to the above aspect.
In another aspect, embodiments of the present application provide a computer-readable storage medium having at least one program code stored therein, the program code being loaded and executed by a processor to implement the parallax data generating method as described in the above aspect.
In another aspect, embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions so that the computer device performs the parallax data generation method provided in the various alternative implementations of the above aspect.
The technical scheme provided by the embodiment of the application can bring the following beneficial effects:
in the monocular image depth estimation application scene, optical flow extraction and optical flow screening can be performed on an original left-eye image and an original right-eye image with natural parallax information, so that corresponding parallax data can be rapidly extracted based on screened optical flow values, and the parallax data acquisition efficiency is improved; because of the relation between the parallax data and the depth information, a large number of pre-training samples for training the monocular depth estimation model can be quickly constructed based on the parallax data, so that the training efficiency of the depth estimation model can be improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 shows a flowchart of a parallax data generation method provided by an exemplary embodiment of the present application;
fig. 2 is a flowchart illustrating a parallax data generation method according to another exemplary embodiment of the present application;
FIG. 3 is a schematic diagram of cropping an image frame;
fig. 4 is a flowchart illustrating a parallax data generation method according to another exemplary embodiment of the present application;
fig. 5 illustrates a process diagram of acquiring parallax data according to an exemplary embodiment of the present application;
fig. 6 is a block diagram showing a configuration of a parallax data generating apparatus according to an embodiment of the present application;
FIG. 7 shows a block diagram of a computer device provided by one embodiment of the application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.
Referring to fig. 1, a flowchart of a parallax data generating method according to an exemplary embodiment of the present application is shown, where the embodiment of the present application is described by taking application of the method to a computer device as an example, the method includes:
step 101, at least one group of original image pairs is obtained, wherein the original image pairs comprise original left-eye images and original right-eye images, and the original left-eye images and the original right-eye images are images corresponding to the same scene observed under a left-eye visual angle and a right-eye visual angle.
The purpose of obtaining parallax data in the embodiment of the present application is to build a training sample set for deep learning; correspondingly, a large number of sample images with labeled depth information often need to be obtained. Based on the relationship between parallax information and depth information (the parallax value can be regarded as the inverse of the depth), the task of obtaining depth information can be converted into the task of obtaining parallax information. In order to obtain sample images from which parallax information can be extracted as quickly as possible, original image pairs comprising an original left-eye image and an original right-eye image can be used to extract the parallax information (because the original left-eye image and the original right-eye image exhibit natural parallax).
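For reference, the standard pinhole stereo-geometry identity behind this inverse proportionality (a well-known relation, not specific to this application) links the disparity d, the camera focal length f, the baseline B between the two viewpoints, and the depth Z as:

d = f · B / Z

so that, for fixed f and B, disparity is proportional to the inverse of depth, which is why near objects exhibit large parallax and far objects small parallax.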
Alternatively, the original left-eye image and the original right-eye image may be extracted from Three-dimensional (3D) video or 3D image data stored in a left-right format, and illustratively, the original image pair may be extracted from a 3D movie resource.
Step 102, optical flow extraction is performed on the original image pairs, so as to obtain optical flow values corresponding to each group of original image pairs.
Wherein, optical flow extraction may be performed on the original image pair through an optical flow extraction network (e.g., FlowNet2); alternatively, other optical flow calculation methods may be used, such as gradient-based, matching-based, energy-based, and phase-based methods. The optical flow extraction algorithm adopted in the embodiment of the application is not limited.
In order to quickly construct a large-scale monocular depth estimation training dataset, and considering that the constructed dataset is only used for the pre-training stage of monocular depth estimation model training, the pre-training does not require accurate image parallax information; it only needs to ensure that the acquired parallax information satisfies the spatial relative relationship (that is, near objects have large parallax and far objects have small parallax). Since the original left-eye image and the original right-eye image naturally exhibit parallax, and the optical flow values between the left-eye and right-eye images have a consistent relative magnitude relationship with the parallax values (that is, positions with large optical flow also have large parallax), in a possible implementation, optical flow extraction can be performed on an original image pair (left-eye image and right-eye image), and the corresponding parallax information of the original image pair can be determined based on the extracted optical flow values.
In one possible implementation, taking the left-eye image as the reference frame, the optical flow from the left-eye image to the right-eye image is extracted first and may be denoted as the forward optical flow, and the optical flow from the right-eye image to the left-eye image is then extracted and denoted as the backward optical flow. Illustratively, the forward optical flow corresponding to the original image pair may be denoted as F_f, and the backward optical flow as F_b.
Alternatively, the optical flow extraction may be performed using the right-eye image as a reference frame.
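As an illustrative sketch (this is not the patented pipeline and does not use FlowNet2; OpenCV's Farneback method stands in as an assumed optical flow extractor, and all function and variable names are illustrative), forward and backward optical flow could be extracted as follows:

```python
import cv2

def extract_flows(left_bgr, right_bgr):
    # Convert to grayscale, as required by the Farneback optical flow routine
    left = cv2.cvtColor(left_bgr, cv2.COLOR_BGR2GRAY)
    right = cv2.cvtColor(right_bgr, cv2.COLOR_BGR2GRAY)
    # Forward optical flow F_f: left-eye image as the reference frame
    flow_f = cv2.calcOpticalFlowFarneback(left, right, None,
                                          0.5, 3, 15, 3, 5, 1.2, 0)
    # Backward optical flow F_b: right-eye image as the reference frame
    flow_b = cv2.calcOpticalFlowFarneback(right, left, None,
                                          0.5, 3, 15, 3, 5, 1.2, 0)
    # Each flow is H x W x 2: [..., 0] is the horizontal component (u),
    # [..., 1] is the vertical component (v)
    return flow_f, flow_b
```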
Step 103, performing optical flow screening on the original image pairs based on the optical flow values to obtain a target monocular image and target optical flow values corresponding to each pixel point in the target monocular image.
Since there may be an extraction error in the optical flow extraction process, and the accuracy of the optical flow extraction is related to the accuracy of the parallax data, in order to improve the accuracy of the determined parallax data, in a possible implementation manner, an image pair that does not satisfy the optical flow relationship corresponding to the left-right eye image in the original image pair needs to be removed, where the relative magnitude relationship between the corresponding target optical flow values of each pixel point after optical flow screening is the same as the relative magnitude relationship between the parallax data.
Optionally, since the acquired parallax data serves as a training sample set for monocular depth estimation, where each training sample consists of a monocular image and the parallax data corresponding to that monocular image, and a target image pair is obtained after optical flow screening, the original left-eye image or the original right-eye image in the target image pair can be directly determined as the target monocular image.
In a possible implementation manner, an optical flow screening condition is provided, and the optical flow screening condition is used for removing an error value in the predicted optical flow value, so that in the subsequent parallax information extraction process, the accuracy of the parallax extraction process can be improved.
Step 104, generating target parallax data corresponding to the target monocular image based on the target optical flow value.
In one possible embodiment, the target optical flow value is converted into target parallax data corresponding to the target monocular image based on a relationship between the target optical flow value and the parallax data.
In summary, in the embodiment of the present application, in the monocular image depth estimation application scenario, optical flow extraction and optical flow screening can be performed on the original left-eye image and the original right-eye image with natural parallax information, so that corresponding parallax data can be rapidly extracted based on the screened optical flow values, and parallax data acquisition efficiency is improved; because of the relation between the parallax data and the depth information, a large number of pre-training samples for training the monocular depth estimation model can be quickly constructed based on the parallax data, so that the training efficiency of the depth estimation model can be improved.
Since the accuracy of the parallax data depends on the extracted optical flow values, it is important to ensure that the extracted optical flow values conform to the parallax relationship. When the same scene is observed at the same moment from the left and right visual angles, certain optical flow conditions hold, for example: the optical flow in the vertical direction is almost 0 everywhere, the optical flow in the horizontal direction has a larger amplitude, and the optical flow precision is high. In one possible implementation, optical flow screening may be performed based on these optical flow value conditions.
In an illustrative example, as shown in fig. 2, which shows a flowchart of a parallax data generating method according to another exemplary embodiment of the present application, an embodiment of the present application is described by taking the application of the method to a computer device as an example, the method includes:
step 201, obtaining a target video in a target storage format, where the target storage format at least includes a left storage format and a right storage format.
In order to quickly extract a large number of left and right eye images, in one possible implementation, a target video stored in a left and right eye format is directly used as the original data, and illustratively, the target video may be: 3D video stored in a left-right eye format, such as a 3D movie.
Optionally, in addition to being stored in the left-right eye format, the selected 3D movie data needs to meet the following conditions. First, so that the training data matches actual application scenes as closely as possible and exaggerated content such as animation and special effects does not interfere with the subsequent network learning process, 3D data that mainly contains real scenes and includes as few animation or special-effect fragments as possible should be selected. In addition, in order to make the resolution of the training data comparable to the image resolution in practical applications and to avoid limiting the application range of the model with low-resolution training data (which could only be applied to processing low-resolution pictures), 3D data with a higher resolution should be selected as far as possible; illustratively, the resolution needs to be larger than 720P.
At step 202, at least one set of candidate image pairs is extracted from the target video, the candidate image pairs including candidate left-eye images and candidate right-eye images.
Because the selected target video is stored in a left-right eye format, at least one group of image pairs contained in the target video can be obtained by extracting video frames of the target video correspondingly.
In an illustrative example, taking an original 3D movie resource as an example, the process of extracting candidate image pairs from the 3D movie resource may include the steps of:
1. Read the chapter information in the original 3D resource and remove preset chapters.
Wherein, the preset chapter can be: the chapter containing no image information, for example, a chapter containing a picture of a company name, a company trademark (logo), a cast, and the like.
If the chapter duration is too short, the chapter is likely to contain rapid scene changes, and sampling it may yield blurred video frames that affect the accuracy of subsequent optical flow extraction. Therefore, in one possible implementation, a first duration threshold is set, and chapters whose duration is less than the first duration threshold are removed; illustratively, the first duration threshold may be 5 minutes.
Optionally, the first duration threshold may be set by the user. Illustratively, if the original data is plentiful, the first duration threshold may take a larger value to ensure the quality of the training data; otherwise, it may take a smaller value to ensure the quantity of the training data.
Alternatively, chapter information may be obtained from meta (meta) information in the 3D movie resource, where meta is located at the head of the document and is an auxiliary tag that defines a name, such as a keyword, associated with the document. In this embodiment, the information that the meta may provide includes: chapter title.
2. Sample the remaining chapters other than the preset chapters, and store the sampling results in the form of left-eye and right-eye images.
In the process of sampling the remaining chapters, uniform sampling would extract more image frames from overlong chapters, and image frames within the same chapter are often similar; acquiring many similar image frames may introduce data bias and affect the accuracy of subsequent model training. In order to keep the amount of data per chapter and per scene approximately the same, in one possible implementation, a second duration threshold and at least two sampling frequencies are set: each chapter is sampled at a higher frequency within the initial time period (within the second duration threshold) and at a lower frequency in the remaining time period.
Illustratively, the second duration threshold may be denoted T_2, and the sampling frequencies f_1 and f_2. For each chapter, the first T_2 of the chapter is sampled according to f_1, and the remaining time period is sampled according to f_2.
Alternatively, to ensure that each chapter can preferentially be sampled to a sufficient number of image frames, f_1 < f_2 can be set.
Alternatively, the second duration threshold may be determined based on the total duration of different chapters; for example, when the total duration of a chapter is long, the second duration threshold may take a smaller value. It may also be determined based on the amount of original data; for example, the second duration threshold may take a larger value when the original data is plentiful.
Because the original 3D movie resource is stored in the left-right eye format, that is, it comprises a left-eye video and a right-eye video, the left-eye video and the right-eye video need to be sampled at the same sampling frequency in order to ensure that left-eye and right-eye images corresponding to the same moment can be acquired.
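A minimal sketch of this two-rate sampling, assuming f_1 and f_2 are sampling intervals measured in frames (the embodiment leaves the exact unit open); all names are illustrative:

```python
def sample_chapter_indices(num_frames, fps, t2_seconds, f1, f2):
    # First t2_seconds of the chapter: sample every f1 frames (denser when f1 < f2)
    boundary = min(int(t2_seconds * fps), num_frames)
    indices = list(range(0, boundary, f1))
    # Remaining time period: sample every f2 frames
    indices += list(range(boundary, num_frames, f2))
    return indices

# Usage note: apply the same indices to the left-eye and right-eye videos so
# that sampled frames correspond to the same moments.
```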
Step 203, preprocessing the candidate image pair to obtain an original image pair.
The preprocessing may include both removing blurred frames generated during sampling and removing black border areas.
For removing blurred frames, an image quality evaluation algorithm can be adopted to calculate the image sharpness of each candidate left-eye and right-eye image, and candidate image pairs whose sharpness is lower than a preset sharpness threshold are removed. Optionally, the image quality evaluation algorithm may include the Laplacian operator, the Brenner gradient function, the Tenengrad gradient function, and the like.
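As a hedged sketch of the blur check, assuming the variance of the Laplacian is used as the sharpness score (one common instantiation of the Laplacian operator mentioned above); the threshold value is illustrative:

```python
import cv2

def is_sharp(image_bgr, sharpness_threshold=100.0):
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    # Low Laplacian variance indicates few high-frequency edges, i.e. a blurred frame
    score = cv2.Laplacian(gray, cv2.CV_64F).var()
    return score >= sharpness_threshold
```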
For removing the black border area, cropping values d_1 and d_2 can be set, the extracted candidate image pair is cropped based on the cropping values, the edge area in the candidate image pair is removed, and only the pure picture area in the central area of the image is kept.
Alternatively, different cropping values may be set for different 3D movies based on the size of the black border region in the image frame.
Schematically, fig. 3 shows a diagram of cropping an image frame. The image frame 301 includes a picture area 302 and a black border area 303; the cropping values d_1 and d_2 are set based on the size of the black border area 303 in the image frame 301, and the image frame 301 is cropped based on d_1 and d_2 to remove the black border area 303, leaving only the picture area 302.
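A minimal sketch of the black-border crop, assuming d_1 is the number of rows trimmed from the top and bottom and d_2 the number of columns trimmed from the left and right (the embodiment does not fix this convention):

```python
def crop_black_border(frame, d1, d2):
    h, w = frame.shape[:2]
    # Keep only the central picture area; trim d1 rows and d2 columns per side
    return frame[d1:h - d1, d2:w - d2]
```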
In one possible implementation, after the preprocessing is performed on the candidate image pair extracted from the 3D video resource, the processed candidate image pair is determined as an original image pair, and is used for optical flow extraction and parallax data generation.
Step 204, optical flow extraction is performed on the original image pairs, so as to obtain optical flow values corresponding to each group of original image pairs.
The implementation of step 204 may refer to step 102, and this embodiment is not described herein.
Step 205, screening the original image pairs based on a vertical optical flow threshold to obtain a first image pair, wherein the probability that a vertical optical flow value corresponding to the first image pair is larger than the vertical optical flow threshold is lower than a first preset probability threshold.
Based on the characteristic that the vertical component of the optical flow extracted between left-eye and right-eye images is almost 0, in one possible implementation the original image pairs are correspondingly screened under the condition that the vertical optical flow is almost 0, and original image pairs that do not meet the vertical optical flow condition are removed.
Illustratively, the vertical optical flow threshold is 0.
In order to avoid deleting image pairs that conform to the vertical optical flow condition because of accidental factors, a first preset probability threshold is set. In one possible implementation, when screening an original image pair based on the vertical optical flow threshold, the vertical optical flow value corresponding to each pixel point (position) of the original left-eye and right-eye images can be obtained, the number of pixels whose vertical optical flow value exceeds the vertical optical flow threshold is counted, the proportion of such points among all vertical optical flow values is calculated based on this number and the total number of pixels, and whether the optical flow values corresponding to the original left-eye and right-eye images meet the subsequent parallax data extraction condition is determined based on the relationship between this proportion and the first preset probability threshold. If the proportion is higher than the first preset probability threshold, the prediction accuracy of the optical flow values corresponding to the original sample images is low, and the image pair needs to be removed; otherwise, if the proportion is lower than the first preset probability threshold, the optical flow prediction accuracy corresponding to the original left-eye and right-eye images is high, subsequent optical flow screening or parallax data extraction can be performed, and the original image pair is retained.
Correspondingly, the vertical optical flow condition is: the probability that the vertical optical flow value is greater than the vertical optical flow threshold is lower than the first preset probability threshold. Illustratively, the vertical optical flow threshold may be 0 and the first preset probability threshold may be 0.05.
Optionally, due to the two optical flow extraction directions, a group of original image pairs has both a forward optical flow and a backward optical flow, each with its own vertical optical flow values. Correspondingly, in order to improve the discrimination accuracy, in one possible implementation the determination needs to be made on the vertical components of both the forward optical flow and the backward optical flow, so as to comprehensively decide whether to reject the original image pair from the sample dataset.
Illustratively, with the first preset probability threshold denoted P_1 and the vertical optical flow threshold being 0, the corresponding vertical optical flow condition can be expressed as: the proportion of points in F_fv (the vertical component of the forward optical flow) with a value other than 0 is less than P_1, and/or the proportion of points in F_bv (the vertical component of the backward optical flow) with a value other than 0 is less than P_1.
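A sketch of this vertical optical flow condition, assuming a small tolerance eps stands in for "a value other than 0" (exact zeros are rare in real-valued flow fields); eps, the default P_1, and all names are illustrative:

```python
import numpy as np

def passes_vertical_flow_check(flow_f, flow_b, p1=0.05, eps=1e-3):
    # Proportion of points whose vertical flow component is effectively nonzero
    ratio_f = np.mean(np.abs(flow_f[..., 1]) > eps)  # F_fv
    ratio_b = np.mean(np.abs(flow_b[..., 1]) > eps)  # F_bv
    # The embodiment allows "and/or"; the stricter "and" form is used here
    return ratio_f < p1 and ratio_b < p1
```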
Step 206, screening the first image pair based on a horizontal optical flow threshold to obtain a second image pair, wherein the maximum horizontal optical flow difference value corresponding to the second image pair is greater than the horizontal optical flow threshold.
Based on the optical flow characteristic that the horizontal direction has larger-amplitude optical flow, the horizontal optical flow screening condition is set as: the maximum horizontal optical flow difference value is greater than the horizontal optical flow threshold. Illustratively, the horizontal optical flow threshold may be 2.
In one possible implementation, the horizontal optical flow values corresponding to the first image pair are obtained, and the maximum and minimum horizontal optical flow values are determined therefrom; the maximum horizontal optical flow difference value is then calculated from these, and whether to reject the first image pair from the sample dataset is determined by comparing the maximum horizontal optical flow difference value with the horizontal optical flow threshold. Illustratively, if the maximum horizontal optical flow difference value is greater than the horizontal optical flow threshold, the image has larger-amplitude optical flow values in the horizontal direction, the horizontal optical flow condition is satisfied, and the first image pair is determined as a second image pair; conversely, if the maximum horizontal optical flow difference value is less than the horizontal optical flow threshold, the image does not have larger-amplitude optical flow values in the horizontal direction, and the first image pair is rejected from the dataset in order to avoid affecting subsequent parallax data extraction.
Illustratively, the horizontal optical flow condition may be expressed as: the difference between the maximum and minimum of F_fu (the horizontal component of the forward optical flow) is greater than the horizontal optical flow threshold, and the difference between the maximum and minimum of F_bu (the horizontal component of the backward optical flow) is greater than the horizontal optical flow threshold.
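A corresponding sketch of the horizontal optical flow condition (the threshold value 2 follows the illustrative value above; function and variable names are assumptions):

```python
import numpy as np

def passes_horizontal_flow_check(flow_f, flow_b, threshold=2.0):
    # Max-min spread of the horizontal flow component must exceed the threshold
    spread_f = flow_f[..., 0].max() - flow_f[..., 0].min()  # F_fu
    spread_b = flow_b[..., 0].max() - flow_b[..., 0].min()  # F_bu
    return spread_f > threshold and spread_b > threshold
```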
Step 207, screening the second image pair based on the first pixel threshold value to obtain a third image pair, wherein the probability that the pixel value difference value of the same pixel point between the third image pair and the predicted image pair is greater than the first pixel threshold value is lower than the second preset probability threshold value.
If the extracted optical flow values are accurate, performing optical flow mapping on the original left-eye and right-eye images should yield predicted left-eye and right-eye images that are almost identical to the originals. Therefore, in order to determine the accuracy of the extracted optical flow values, in a possible implementation, whether to reject the image pair from the dataset is determined by comparing the pixel value differences at the same positions (the same pixel points) between the original and predicted left-eye and right-eye images against the first pixel threshold.
Wherein, the process of obtaining the third image pair by screening based on the first pixel threshold may include the following steps:
1. Perform optical flow mapping processing on the original left-eye image in the second image pair to obtain the predicted right-eye image in the predicted image pair.
In one possible implementation, the original left-eye image in the second image pair is subjected to optical flow mapping processing; that is, the predicted right-eye image is obtained based on the forward optical flow and the original left-eye image.
Schematically, if the original pixel coordinate is (x_1, y_1) and the optical flow value corresponding to that pixel is (x_2, y_2), then after optical flow mapping, the pixel coordinate obtained in the predicted right-eye image is (x_1 + x_2, y_1 + y_2); a code sketch of this mapping and the subsequent consistency check is given after these steps.
2. Perform optical flow mapping processing on the original right-eye image in the second image pair to obtain the predicted left-eye image in the predicted image pair.
In one possible implementation, the original right-eye image in the second image pair is subjected to an optical flow mapping process, that is, based on the backward optical flow and the original right-eye image, a predicted left-eye image may be obtained.
3. Determine the second image pair as the third image pair in response to the probability that the pixel value difference at the same pixel point between the predicted right-eye image and the original right-eye image is greater than the first pixel threshold being lower than the second preset probability threshold, and/or in response to the probability that the pixel value difference at the same pixel point between the predicted left-eye image and the original left-eye image is greater than the first pixel threshold being lower than the second preset probability threshold.
Because high optical flow extraction precision implies that the predicted left-eye and right-eye images should be approximately the same as the original left-eye and right-eye images at each pixel position, in one possible implementation a first pixel threshold is set, and whether the image pair is eliminated from the dataset is determined by comparing the difference between pixel values at the same position in the predicted and original images against the first pixel threshold. Illustratively, the first pixel threshold may be 0.
Optionally, in order to improve the accuracy of the optical flow screening, in a possible implementation a second preset probability threshold is set. In the process of optical flow screening based on the first pixel threshold, the number of points at which the pixel value difference between the predicted right-eye image and the original right-eye image at the same pixel position is greater than the first pixel threshold, together with the total number of pixel points, can be counted; the probability that the pixel value difference is greater than the first pixel threshold is then calculated and compared with the second preset probability threshold. If this probability is lower than the second preset probability threshold, the optical flow extraction is relatively accurate and can be used for subsequent parallax data extraction; otherwise, if it is higher than the second preset probability threshold, the optical flow extraction accuracy is low, and the image pair corresponding to the original right-eye image is deleted from the dataset.
Optionally, if it is determined that the original right-eye image and the predicted right-eye image do not meet the optical flow accuracy condition, the image pair corresponding to the original right-eye image may be directly removed; or, if it is determined that the original left-eye image and the predicted left-eye image do not meet the optical flow accuracy condition, the image pair corresponding to the original left-eye image is directly removed; or, if it is determined that neither the left-eye nor the right-eye images together with their predictions meet the optical flow accuracy condition, the image pair is rejected.
Illustratively, the optical flow accuracy condition is: the proportion of points at which IL and IL' are inconsistent is less than the second preset probability threshold, where IL represents the original left-eye image and IL' represents the predicted left-eye image.
Illustratively, the second preset probability threshold may be 3%.
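A hedged sketch covering steps 1 to 3 above: forward-warp the original left-eye image with the forward optical flow, then count the pixels whose warped value disagrees with the original right-eye image beyond the pixel threshold. The simple nearest-pixel scatter warp ignores occlusions and rounding collisions, and all names and defaults are illustrative:

```python
import numpy as np

def flow_map(image, flow):
    # Scatter each source pixel (x1, y1) to (x1 + x2, y1 + y2), per the mapping above
    h, w = image.shape[:2]
    warped = np.zeros_like(image)
    ys, xs = np.mgrid[0:h, 0:w]
    xt = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    yt = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    warped[yt, xt] = image
    return warped

def passes_consistency_check(left, right, flow_f, pixel_thresh=0, p2=0.03):
    predicted_right = flow_map(left, flow_f)
    diff = np.abs(predicted_right.astype(int) - right.astype(int))
    # Keep the pair only if the proportion of inconsistent pixels is below P_2
    return np.mean(diff > pixel_thresh) < p2
```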
The optical flow filtering may be performed in the order shown in the above embodiment in step 205, step 206, and step 207, or may be performed in the order of step 206, step 205, and step 207, which is not limited in this embodiment.
Step 208, determining the left eye image or the right eye image in the third image pair as a target monocular image, and determining a horizontal optical flow value corresponding to the target monocular image as a target optical flow value.
In one possible implementation, after the optical flow screening, target optical flow values conforming to the parallax relationship and the target image pair (the third image pair) corresponding to them are obtained. Since the training samples for monocular depth estimation are monocular images, the original left-eye image or the original right-eye image in the third image pair can be directly determined as the target monocular image, and at the same time the horizontal optical flow value corresponding to the target monocular image is determined as the target optical flow value.
Illustratively, if the target monocular image is the original left-eye image, the corresponding target optical flow value is the horizontal optical flow value of the forward optical flow; if the target monocular image is the original right-eye image, the corresponding target optical flow value is the horizontal optical flow value of the backward optical flow. Here, the forward optical flow and the backward optical flow are extracted by taking the original left-eye image as the reference frame.
Step 209, performing an inversion operation on the target horizontal optical flow value to obtain candidate parallax data corresponding to the target monocular image.
Wherein, the target horizontal optical flow value is the horizontal optical flow value corresponding to the forward optical flow.
Since the forward optical flow is extracted with the left-eye image as the reference frame, when generating parallax data based on the target horizontal optical flow value, the target horizontal optical flow value needs to be inverted first to obtain the candidate parallax data corresponding to the target monocular image.
Illustratively, the formula for inverting the target horizontal optical flow value can be expressed as:

D̃L = −F_lu

wherein D̃L represents the candidate parallax data before normalization, and F_lu represents the horizontal optical flow value corresponding to the left-eye image.
Step 210, performing normalization operation on the candidate parallax data to obtain target parallax data corresponding to the target monocular image.
Because the relative scales of the original images used in optical flow extraction may differ, in order to solve the problem of inconsistent scales during training, in a possible implementation the candidate parallax data further needs to be normalized to obtain target parallax data that can be used as training samples.
In an illustrative example, the process of normalizing the candidate disparity data may include any one of the following methods.
1. Calculate the target parallax data corresponding to the target monocular image based on the candidate parallax data, the maximum parallax data, and the minimum parallax data.
In one possible implementation, a scaling method may be used to normalize the candidate parallax data: the maximum and minimum parallax data corresponding to the candidate parallax data are obtained, and the target parallax data corresponding to the target monocular image is then calculated based on the candidate parallax data, the maximum parallax data, and the minimum parallax data.
The maximum parallax data is the maximum value of the candidate parallax data corresponding to the target monocular image, and the minimum parallax data is the minimum value of the candidate parallax data corresponding to the target monocular image.
Illustratively, the formula for normalizing the candidate parallax data based on the scaling method may be expressed as:

DL = (D̃L − min(D̃L)) / (max(D̃L) − min(D̃L))

wherein D̃L represents the candidate parallax data before normalization, min(x) represents the minimum-value operation, max(x) represents the maximum-value operation, and DL represents the parallax data after normalization.
2. Calculate the target parallax data corresponding to the target monocular image based on the candidate parallax data and the median of the candidate parallax data.
In other possible embodiments, normalization may also be performed by extracting the median of the candidate parallax data; that is, the median of the candidate parallax data is obtained, and the target parallax data corresponding to the target monocular image is calculated based on the candidate parallax data and the median.
In one illustrative example, the formula for the normalization process may be:

DL = D̃L / m, where m = median(D̃L(x)) and x ranges over the pixel positions

wherein m represents the median of the candidate parallax data of the left-eye image, median(x) represents the median operation, x represents the loop variable over pixels, D̃L represents the candidate parallax data before normalization, and DL represents the parallax data after normalization.
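A sketch combining steps 209 and 210, assuming the horizontal forward flow F_lu is taken from the flow array as above; the small epsilon guards against division by zero and is an assumption, not part of the embodiment:

```python
import numpy as np

def disparity_from_forward_flow(flow_f, method="minmax"):
    d = -flow_f[..., 0]  # candidate disparity: inverted horizontal flow (step 209)
    if method == "minmax":
        # Scaling normalization (method 1)
        return (d - d.min()) / (d.max() - d.min() + 1e-8)
    # Median normalization (method 2)
    m = np.median(d)
    return d / (m + 1e-8)
```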
In one possible implementation, the target monocular image and the parallax data corresponding to the target monocular image may be stored in association for subsequent use as training samples for monocular depth estimation.
In this embodiment, the original image pairs are screened based on preset optical flow conditions, so that the target optical flow values obtained after screening satisfy conditions such as: the optical flow in the vertical direction is almost 0 everywhere, the optical flow in the horizontal direction has a larger amplitude, and the optical flow precision is high. This ensures the accuracy of the extracted optical flow values and further improves the accuracy of the parallax data.
The parallax data acquired in this embodiment is mainly used in the pre-training process of the monocular depth estimation model. In order to prevent low-accuracy parallax data from degrading the prediction accuracy of the monocular depth estimation model, binarization processing is performed on the target monocular image to indicate the accuracy of the parallax data value at each pixel point, so that differentiated learning can be performed based on this accuracy during model training.
As shown in fig. 4, which is a flowchart illustrating a parallax data generating method according to another exemplary embodiment of the present application, an embodiment of the present application is described by taking the application of the method to a computer device as an example, the method includes:
Step 401, at least one group of original image pairs is acquired, wherein the original image pairs comprise original left-eye images and original right-eye images, and the original left-eye images and the original right-eye images are images corresponding to the same scene observed under a left-eye visual angle and a right-eye visual angle.
Step 402, optical flow extraction is performed on the original image pairs, so as to obtain optical flow values corresponding to each group of original image pairs.
Step 403, optical flow screening is performed on the original image pair based on the optical flow value, so as to obtain a target monocular image and a target optical flow value corresponding to each pixel point in the target monocular image.
Step 404, generating target parallax data corresponding to the target monocular image based on the target optical flow value.
The implementation manners of steps 401 to 404 may refer to the above embodiments, and the description of this embodiment is omitted here.
Step 405, performing binarization processing on the target monocular image to obtain a target segmented image corresponding to the target monocular image.
In order to improve the accuracy of model training, in a possible implementation, binarization processing is performed on the target monocular image to obtain a target segmented image, and the binarization confidence (accuracy) corresponding to each pixel point is marked in the target segmented image, so that whether the parallax information at a pixel point needs to be learned can be determined based on the binarization confidence during model training.
In an exemplary example, the process of binarizing the target monocular image may include the steps of:
1. Obtain the target mapping image corresponding to the target monocular image, where the target mapping image is obtained by performing optical flow mapping processing on the target monocular image.
Since the accuracy of the parallax data depends on the accuracy of the optical flow values, the accuracy of the parallax data can be evaluated indirectly by evaluating the accuracy of the optical flow values, and the latter can be assessed by comparing the difference between the image after optical flow mapping and the original image. Correspondingly, in one possible implementation, optical flow mapping is first performed on the target monocular image based on the target optical flow value to obtain the target mapping image, which is subsequently compared with the target monocular image.
The optical flow mapping process may refer to the above embodiments, and this embodiment is not described herein.
Optionally, if the target monocular image is a target left-eye image, correspondingly, optical flow mapping is required to be performed on the target right-eye image based on the target optical flow value, so as to obtain a target mapping image corresponding to the target left-eye image; otherwise, if the target monocular image is the target right-eye image, optical flow mapping is required to be performed on the target left-eye image based on the target optical flow value, so as to obtain a target mapping image corresponding to the target right-eye image.
2. Set the pixel value of the pixel point in the target segmented image to the first pixel value in response to the pixel value difference at the same pixel point between the target monocular image and the target mapping image being less than the second pixel threshold.
Based on the characteristic that, when the optical flow values are accurate, the target mapping image obtained after optical flow mapping is identical to the target monocular image, in a possible implementation whether the pixel values at the same pixel position are consistent (that is, whether the pixel value difference at the same pixel point is smaller than the second pixel threshold) is used as the basis for evaluating the accuracy of the parallax data at that pixel point. Illustratively, the second pixel threshold may be 0.
In order to distinguish points with consistent pixel values from points with inconsistent pixel values, the pixel value of a point whose pixel value difference is smaller than the second pixel threshold is set to the first pixel value, and the pixel value of a point that does not satisfy this condition is set to the second pixel value.
Illustratively, the first pixel value may be 1 and the second pixel value may be 0.
In one possible implementation, the pixel values at the same pixel point in the target monocular image and the target mapping image are obtained and their difference is calculated; if the difference is smaller than the second pixel threshold, the parallax data at that pixel point is relatively accurate, and the corresponding pixel value is set to 1.
3. Set the pixel value of the pixel point in the target segmented image to the second pixel value in response to the pixel value difference at the same pixel point between the target monocular image and the target mapping image being greater than the second pixel threshold.
In one possible implementation, the pixel values at the same pixel point in the target monocular image and the target mapping image are obtained and their difference is calculated; if the difference is greater than the second pixel threshold, the parallax data at that pixel point has low accuracy, and the corresponding pixel value is set to 0.
In one illustrative example, the process of binarizing the target monocular image may be expressed as:

mask[i, j] = 1, if |IL[i, j] − IL'[i, j]| ≤ T; mask[i, j] = 0, otherwise

wherein IL[i, j] represents the pixel value of a pixel point in the target monocular image (the target left-eye image), IL'[i, j] represents the pixel value of the pixel point at the same position in the target mapping image, T is the second pixel threshold, and mask[i, j] represents the binarized value corresponding to the pixel point.
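A minimal sketch of this binarization, assuming T = 0 and that the "identical pixel" case maps to 1 (the reliable-disparity label); the channel reduction for color images is an assumption:

```python
import numpy as np

def confidence_mask(target_img, mapped_img, t=0):
    diff = np.abs(target_img.astype(int) - mapped_img.astype(int))
    if diff.ndim == 3:
        diff = diff.max(axis=-1)  # reduce color channels to a single difference
    # 1 where the mapped image agrees with the target image (reliable disparity)
    return np.where(diff <= t, 1, 0).astype(np.uint8)
```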
Step 406, determining the target monocular image, the target segmented image, and the target parallax data as a training sample set corresponding to the monocular depth estimation model.
In a possible application scene, the target monocular image, a target segmentation image obtained after binarization processing is carried out on the target monocular image, and target parallax data corresponding to the target monocular image are stored in a correlated mode so as to be used as a training sample set for training a monocular depth estimation model subsequently.
In this embodiment, by performing binarization processing on the target monocular image, whether the parallax data corresponding to each pixel point is accurate can be made explicit in the subsequent model training process, so that the model can perform targeted learning on the correct data during training, further improving the accuracy of model training.
Referring to fig. 5, a schematic diagram of the process for acquiring parallax data according to an exemplary embodiment of the present application is shown. This embodiment is illustrated by batch acquisition of left-eye and right-eye images and corresponding parallax information from 3D movie resources. Video frames are extracted from the 3D movie 501 to obtain a pair consisting of candidate left-eye image 502A and candidate right-eye image 502B, and the candidate left-eye image 502A and the candidate right-eye image 502B are preprocessed (the preprocessing includes removing black border areas and removing blurred frames) to obtain an original left-eye image 503A and an original right-eye image 503B. Further, a forward optical flow 504A and a backward optical flow 504B are obtained through optical flow extraction, parallax data 505 corresponding to the target left-eye image 506 is obtained through optical flow screening and parallax data extraction, and the target left-eye image 506 and the parallax data 505 are stored in association to form a sample data pair.
The following are examples of the apparatus of the present application that may be used to perform the method embodiments of the present application. For details not disclosed in the embodiments of the apparatus of the present application, please refer to the embodiments of the method of the present application.
Referring to fig. 6, a block diagram of a parallax data generating apparatus according to an embodiment of the present application is shown. The device has the function of realizing the execution of the method embodiment by the computer equipment, wherein the function can be realized by hardware or can be realized by executing corresponding software by hardware. As shown in fig. 6, the apparatus may include:
an obtaining module 601, configured to obtain at least one set of original image pairs, where the original image pairs include an original left-eye image and an original right-eye image, and the original left-eye image and the original right-eye image are images corresponding to the same scene observed under a left-eye view angle and a right-eye view angle;
the optical flow extraction module 602 is configured to perform optical flow extraction on the original image pairs to obtain optical flow values corresponding to each group of original image pairs;
the optical flow screening module 603 is configured to perform optical flow screening on the original image pair based on the optical flow value to obtain a target monocular image and target optical flow values corresponding to each pixel point in the target monocular image, where a relative magnitude relation between the target optical flow values corresponding to each pixel point after optical flow screening is the same as a relative magnitude relation between parallax data, and the target monocular image is an original left-eye image or an original right-eye image;
and a generating module 604, configured to generate target parallax data corresponding to the target monocular image based on the target optical flow values.
Optionally, the optical flow values include a horizontal optical flow value and a vertical optical flow value;
the optical flow screening module 603 includes:
the first screening unit is used for screening the original image pair based on a vertical optical flow threshold value to obtain a first image pair, where the probability that the vertical optical flow value corresponding to the first image pair is greater than the vertical optical flow threshold value is lower than a first preset probability threshold value;
a second screening unit, configured to screen the first image pair based on a horizontal optical flow threshold value to obtain a second image pair, where the maximum horizontal optical flow difference value corresponding to the second image pair is greater than the horizontal optical flow threshold value;
the third screening unit is used for screening the second image pair based on a first pixel threshold value to obtain a third image pair, wherein the probability that the pixel value difference value of the same pixel point between the third image pair and a predicted image pair is larger than the first pixel threshold value is lower than a second preset probability threshold value, and the predicted image is obtained by carrying out optical flow mapping on the third image pair;
And the determining unit is used for determining a left eye image or a right eye image in the third image pair as the target monocular image and determining a horizontal optical flow value corresponding to the target monocular image as the target optical flow value.
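As a concrete illustration of the first and second screening units, the following sketch estimates dense optical flow with OpenCV's Farneback method and applies the vertical and horizontal optical flow thresholds; the threshold values, the probability threshold, and the function name are illustrative assumptions. The third screening stage is sketched after the next unit description.

```python
import cv2
import numpy as np

def screen_pair(left, right,
                v_thresh=1.0, v_prob=0.05,   # vertical optical flow threshold / first preset probability
                h_thresh=8.0):               # horizontal optical flow threshold
    """First and second screening stages; returns the horizontal flow or None."""
    gray_l = cv2.cvtColor(left, cv2.COLOR_BGR2GRAY)
    gray_r = cv2.cvtColor(right, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(gray_l, gray_r, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h_flow, v_flow = flow[..., 0], flow[..., 1]

    # First screening: a well-aligned stereo pair should have near-zero
    # vertical displacement, so discard pairs in which large vertical flow
    # values occur too often.
    if np.mean(np.abs(v_flow) > v_thresh) >= v_prob:
        return None

    # Second screening: keep only pairs whose maximum horizontal flow
    # difference exceeds the threshold, i.e. the scene exhibits enough
    # parallax variation to be useful.
    if h_flow.max() - h_flow.min() <= h_thresh:
        return None
    return h_flow
```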
Optionally, the third screening unit is further configured to:
performing optical flow mapping processing on the original left-eye image in the second image pair to obtain a predicted right-eye image in the predicted image pair;
performing optical flow mapping processing on the original right-eye image in the second image pair to obtain a predicted left-eye image in the predicted image pair;
and determining the second image pair as the third image pair in response to the probability that the pixel value difference value corresponding to the same pixel point in the predicted right eye image and the original right eye image is greater than the first pixel threshold is lower than the second preset probability threshold, and/or in response to the probability that the pixel value difference value corresponding to the same pixel point in the predicted left eye image and the original left eye image is greater than the first pixel threshold is lower than the second preset probability threshold.
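The consistency check of the third screening unit might be sketched as follows. The patent describes forward optical flow mapping (e.g., left image to predicted right image); the sketch below uses the closely related backward mapping via cv2.remap (sampling the right image with the left-to-right flow to predict the left image), a common practical approximation. The thresholds are assumptions.

```python
import cv2
import numpy as np

def passes_warp_check(left, right, flow_l2r,
                      pixel_thresh=20.0, prob_thresh=0.10):
    """Third screening stage: predict one view from the other via the optical
    flow, then test how often the per-pixel error exceeds the pixel threshold."""
    h, w = left.shape[:2]
    grid_x, grid_y = np.meshgrid(np.arange(w, dtype=np.float32),
                                 np.arange(h, dtype=np.float32))
    # Backward mapping: predicted_left(p) = right(p + flow(p)).
    map_x = grid_x + flow_l2r[..., 0]
    map_y = grid_y + flow_l2r[..., 1]
    predicted_left = cv2.remap(right, map_x, map_y, cv2.INTER_LINEAR)
    diff = np.abs(predicted_left.astype(np.float32) - left.astype(np.float32))
    # Keep the pair only when large reconstruction errors are rarer than the
    # second preset probability threshold.
    return np.mean(diff > pixel_thresh) < prob_thresh
```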
Optionally, the target monocular image is the original left-eye image, and the target optical flow value is a horizontal optical flow value;
The generating module 604 includes:
the negation unit is used for performing a negation operation on the target horizontal optical flow value to obtain candidate parallax data corresponding to the target monocular image;
and the normalization unit is used for performing normalization operation on the candidate parallax data to obtain target parallax data corresponding to the target monocular image.
Optionally, the normalization unit is further configured to:
calculating the target parallax data corresponding to the target monocular image based on the candidate parallax data, the maximum parallax data and the minimum parallax data;
or,
and calculating the target parallax data corresponding to the target monocular image based on the candidate parallax data and the median of the candidate parallax data.
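The negation and both normalization variants can be sketched in a few lines. The exact normalization formulas are not given in the text, so the min-max scaling and median-based scaling below are plausible readings rather than the patented formulas.

```python
import numpy as np

def flow_to_disparity(h_flow, method="minmax"):
    """Negate the horizontal optical flow to obtain candidate parallax data,
    then normalize it to obtain the target parallax data."""
    # For a left-eye target image, disparity d = x_left - x_right = -flow.
    candidate = -h_flow
    if method == "minmax":
        # Variant 1: scale using the maximum and minimum parallax data.
        d_min, d_max = candidate.min(), candidate.max()
        return (candidate - d_min) / (d_max - d_min + 1e-8)
    # Variant 2: scale using the median of the candidate parallax data.
    return candidate / (np.abs(np.median(candidate)) + 1e-8)
```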
Optionally, the apparatus further includes:
the binarization processing module is used for performing binarization processing on the target monocular image to obtain a target segmentation image corresponding to the target monocular image;
and the determining module is used for determining the target monocular image, the target segmentation image and the target parallax data as a training sample set corresponding to the monocular depth estimation model.
Optionally, the binarization processing module includes:
The first acquisition unit is used for acquiring a target mapping image corresponding to the target monocular image, wherein the target mapping image is obtained by performing optical flow mapping processing on the target monocular image;
a first setting unit, configured to set a pixel value of the pixel point in the target segmented image to a first pixel value in response to a difference value of pixel values corresponding to the same pixel point in the target monocular image and the target mapped image being smaller than a second pixel threshold;
and the second setting unit is used for setting the pixel value of the pixel point in the target segmentation image as a second pixel value in response to the fact that the difference value of the pixel values corresponding to the same pixel point in the target monocular image and the target mapping image is larger than the second pixel threshold value.
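A sketch of this binarization, assuming the first pixel value marks reliable pixels (255) and the second pixel value marks unreliable ones (0); the second pixel threshold value is likewise an illustrative assumption.

```python
import numpy as np

def build_segmentation_mask(target_image, mapped_image,
                            second_pixel_thresh=10.0,
                            first_pixel_value=255, second_pixel_value=0):
    """Binarize the target monocular image against its optical-flow-mapped
    counterpart: small per-pixel differences are marked as reliable."""
    diff = np.abs(target_image.astype(np.float32) -
                  mapped_image.astype(np.float32))
    if diff.ndim == 3:            # collapse color channels to one error value
        diff = diff.mean(axis=2)
    return np.where(diff < second_pixel_thresh,
                    first_pixel_value, second_pixel_value).astype(np.uint8)
```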
Optionally, the acquiring module 601 includes:
the second acquisition unit is used for acquiring a target video in a target storage format, where the target storage format at least includes a left-right storage format;
an extracting unit, configured to extract at least one group of candidate image pairs from the target video, where the candidate image pairs include a candidate left-eye image and a candidate right-eye image;
and the preprocessing unit is used for preprocessing the candidate image pair to obtain the original image pair.
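For a target video in a left-right (side-by-side) storage format, frame-pair extraction and the two preprocessing steps of fig. 5 (removing black areas and removing blurred frames) might look as follows; the sampling step, blur threshold, and black-level threshold are assumptions.

```python
import cv2
import numpy as np

def extract_frame_pairs(video_path, step=24):
    """Split each sampled frame of a left-right format video into a
    candidate left-eye image and a candidate right-eye image."""
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            half = frame.shape[1] // 2
            yield frame[:, :half], frame[:, half:]  # left half / right half
        idx += 1
    cap.release()

def preprocess_pair(left, right, blur_thresh=100.0, black_level=10):
    """Discard blurred frames (low Laplacian variance) and crop black areas."""
    for img in (left, right):
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        if cv2.Laplacian(gray, cv2.CV_64F).var() < blur_thresh:
            return None  # blurred frame, discard the pair
    # Crop rows/columns that are (near-)black in the left image and apply
    # the same crop to the right image to keep the two views aligned.
    gray = cv2.cvtColor(left, cv2.COLOR_BGR2GRAY)
    rows = np.where(gray.max(axis=1) > black_level)[0]
    cols = np.where(gray.max(axis=0) > black_level)[0]
    if rows.size == 0 or cols.size == 0:
        return None
    r0, r1 = rows[0], rows[-1] + 1
    c0, c1 = cols[0], cols[-1] + 1
    return left[r0:r1, c0:c1], right[r0:r1, c0:c1]
```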
In summary, in the embodiments of the present application, for monocular image depth estimation application scenarios, optical flow extraction and optical flow screening can be performed on original left-eye and right-eye images that carry natural parallax information, so that corresponding parallax data can be rapidly extracted from the screened optical flow values, improving the efficiency of parallax data acquisition; because of the relation between parallax data and depth information, a large number of pre-training samples for training a monocular depth estimation model can be quickly constructed based on the parallax data, thereby improving the training efficiency of the depth estimation model.
It should be noted that the parallax data generating apparatus provided in the above embodiment is described using the above division of functional modules only as an example; in practical applications, the above functions may be allocated to different functional modules as required, that is, the internal structure of the device may be divided into different functional modules to perform all or part of the functions described above. In addition, the parallax data generating apparatus and the parallax data generating method provided in the foregoing embodiments belong to the same concept; the specific implementation process thereof is detailed in the method embodiments and is not repeated here.
Referring to fig. 7, a block diagram of a computer device according to an embodiment of the present application is shown. The computer apparatus may be used to implement the parallax data generation method performed by the computer apparatus in the above-described embodiments.
Specifically:
The computer device 700 includes a central processing unit (Central Processing Unit, CPU) 701, a system memory 704 including a random access memory (Random Access Memory, RAM) 702 and a read-only memory (Read-Only Memory, ROM) 703, and a system bus 705 connecting the system memory 704 and the central processing unit 701. The computer device 700 also includes a basic input/output (I/O) system 706, which helps to transfer information between the various components within the computer device, and a mass storage device 707 for storing an operating system 713, application programs 714, and other program modules 715.
The basic input/output system 706 includes a display 708 for displaying information and an input device 709, such as a mouse or keyboard, through which a user inputs information. The display 708 and the input device 709 are both coupled to the central processing unit 701 through an input/output controller 710 that is coupled to the system bus 705. The basic input/output system 706 may also include the input/output controller 710 for receiving and processing input from a number of other devices, such as a keyboard, mouse, or electronic stylus. Similarly, the input/output controller 710 also provides output to a display screen, a printer, or other type of output device.
The mass storage device 707 is connected to the central processing unit 701 through a mass storage controller (not shown) connected to the system bus 705. The mass storage device 707 and its associated computer-readable storage media provide non-volatile storage for the computer device 700. That is, the mass storage device 707 may include a computer-readable storage medium (not shown) such as a hard disk or a Compact Disc Read-Only Memory (CD-ROM) drive.
The computer-readable storage medium may include computer storage media and communication media without loss of generality. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include RAM, ROM, Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other solid-state memory technology, CD-ROM, Digital Versatile Disc (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage, or other magnetic storage devices. Of course, those skilled in the art will recognize that computer storage media are not limited to those described above. The system memory 704 and the mass storage device 707 described above may be collectively referred to as memory.
The memory stores one or more programs configured to be executed by one or more central processing units 701. The one or more programs contain instructions for implementing the above-described method embodiments, and the central processing unit 701 executes the one or more programs to implement the parallax data generation methods provided by the respective method embodiments described above.
According to various embodiments of the present application, the computer device 700 may also operate by connecting to a remote server on a network, such as the Internet. That is, the computer device 700 may be connected to the network 712 through a network interface unit 711 connected to the system bus 705, or the network interface unit 711 may be used to connect to other types of networks or remote server systems (not shown).
The memory also includes one or more programs, stored in the memory, which include instructions for performing the steps, on the computer device side, of the methods provided by the embodiments of the present application.
Embodiments of the present application also provide a computer-readable storage medium storing at least one program code loaded and executed by a processor to implement the parallax data generating method according to the above embodiments.
According to one aspect of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions so that the computer device performs the parallax data generation method provided in the various alternative implementations of the above aspect.
It should be understood that references herein to "a plurality" mean two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate: A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. In addition, the step numbers described herein merely exemplify one possible execution order among the steps; in some other embodiments, the steps may be executed out of the numbered order, for example, two differently numbered steps may be executed simultaneously, or two differently numbered steps may be executed in an order opposite to that shown, which is not limited herein.
The foregoing description covers only preferred embodiments of the present application and is not intended to limit the application; the scope of protection of the application is defined by the appended claims.

Claims (10)

1. A parallax data generation method, characterized in that the method comprises:
acquiring at least one group of original image pairs, wherein the original image pairs comprise original left-eye images and original right-eye images, and the original left-eye images and the original right-eye images are images corresponding to the same scene observed under a left-eye visual angle and a right-eye visual angle;
performing optical flow extraction on the original image pairs to obtain optical flow values corresponding to each group of original image pairs, wherein the optical flow values comprise horizontal optical flow values and vertical optical flow values;
screening the original image pair based on a vertical optical flow threshold value to obtain a first image pair, wherein the probability that the vertical optical flow value corresponding to the first image pair is greater than the vertical optical flow threshold value is lower than a first preset probability threshold value;
screening the first image pair based on a horizontal optical flow threshold value to obtain a second image pair, wherein the maximum horizontal optical flow difference value corresponding to the second image pair is larger than the horizontal optical flow threshold value;
Screening the second image pair based on a first pixel threshold value to obtain a third image pair, wherein the probability that the pixel value difference value of the same pixel point between the third image pair and a predicted image pair is larger than the first pixel threshold value is lower than a second preset probability threshold value, and the predicted image is obtained by carrying out optical flow mapping on the third image pair;
determining a left eye image or a right eye image in the third image pair as a target monocular image, and determining a horizontal optical flow value corresponding to the target monocular image as a target optical flow value;
and generating target parallax data corresponding to the target monocular image based on the target optical flow value.
2. The method of claim 1, wherein the screening the second image pair based on the first pixel threshold to obtain a third image pair comprises:
performing optical flow mapping processing on the original left-eye image in the second image pair to obtain a predicted right-eye image in the predicted image pair;
performing optical flow mapping processing on the original right-eye image in the second image pair to obtain a predicted left-eye image in the predicted image pair;
and determining the second image pair as the third image pair in response to the probability that the pixel value difference value corresponding to the same pixel point in the predicted right eye image and the original right eye image is greater than the first pixel threshold is lower than the second preset probability threshold, and/or in response to the probability that the pixel value difference value corresponding to the same pixel point in the predicted left eye image and the original left eye image is greater than the first pixel threshold is lower than the second preset probability threshold.
3. The method according to claim 1 or 2, wherein the target monocular image is the original left-eye image, and the target optical flow value is a horizontal optical flow value;
the generating, based on the target optical flow value, target parallax data corresponding to the target monocular image includes:
performing a negation operation on the target horizontal optical flow value to obtain candidate parallax data corresponding to the target monocular image;
and carrying out normalization operation on the candidate parallax data to obtain target parallax data corresponding to the target monocular image.
4. The method according to claim 3, wherein the normalizing the candidate parallax data to obtain the target parallax data corresponding to the target monocular image includes at least one of:
calculating the target parallax data corresponding to the target monocular image based on the candidate parallax data, the maximum parallax data and the minimum parallax data;
or,
and calculating the target parallax data corresponding to the target monocular image based on the candidate parallax data and the median of the candidate parallax data.
5. The method according to claim 1 or 2, wherein after generating the target parallax data corresponding to the target monocular image based on the target optical flow value, the method further comprises:
Performing binarization processing on the target monocular image to obtain a target segmentation image corresponding to the target monocular image;
and determining the target monocular image, the target segmentation image and the target parallax data as a training sample set corresponding to a monocular depth estimation model.
6. The method according to claim 5, wherein the binarizing the target monocular image to obtain a target segmented image corresponding to the target monocular image includes:
obtaining a target mapping image corresponding to the target monocular image, wherein the target mapping image is obtained by performing optical flow mapping processing on the target monocular image;
setting the pixel value of the pixel point in the target segmentation image as a first pixel value in response to the difference value of the pixel values corresponding to the same pixel point in the target monocular image and the target mapping image being smaller than a second pixel threshold;
and setting the pixel value of the pixel point in the target segmentation image as a second pixel value in response to the difference value of the pixel values corresponding to the same pixel point in the target monocular image and the target mapping image being greater than the second pixel threshold.
7. The method according to claim 1 or 2, wherein said acquiring at least one set of original image pairs comprises:
acquiring a target video in a target storage format, wherein the target storage format at least includes a left-right storage format;
extracting at least one group of candidate image pairs from the target video, wherein the candidate image pairs comprise candidate left-eye images and candidate right-eye images;
and preprocessing the candidate image pair to obtain the original image pair.
8. A parallax data generating apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring at least one group of original image pairs, wherein the original image pairs comprise original left-eye images and original right-eye images, and the original left-eye images and the original right-eye images are images corresponding to the same scene observed under a left-eye visual angle and a right-eye visual angle;
the optical flow extraction module is used for performing optical flow extraction on the original image pairs to obtain optical flow values corresponding to each group of original image pairs, wherein the optical flow values comprise horizontal optical flow values and vertical optical flow values;
the optical flow screening module is used for screening the original image pair based on a vertical optical flow threshold value to obtain a first image pair, wherein the probability that the vertical optical flow value corresponding to the first image pair is greater than the vertical optical flow threshold value is lower than a first preset probability threshold value;
Screening the first image pair based on a horizontal optical flow threshold value to obtain a second image pair, wherein the maximum horizontal optical flow difference value corresponding to the second image pair is larger than the horizontal optical flow threshold value;
screening the second image pair based on a first pixel threshold value to obtain a third image pair, wherein the probability that the pixel value difference value of the same pixel point between the third image pair and a predicted image pair is larger than the first pixel threshold value is lower than a second preset probability threshold value, and the predicted image is obtained by carrying out optical flow mapping on the third image pair;
determining a left eye image or a right eye image in the third image pair as a target monocular image, and determining a horizontal optical flow value corresponding to the target monocular image as a target optical flow value;
and the generation module is used for generating target parallax data corresponding to the target monocular image based on the target optical flow value.
9. A computer device comprising a processor and a memory, wherein the memory has stored therein at least one program code that is loaded and executed by the processor to implement the parallax data generation method of any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that at least one program code is stored in the computer-readable storage medium, the program code being loaded and executed by a processor to implement the parallax data generation method according to any one of claims 1 to 7.
CN202110258449.8A 2021-03-09 2021-03-09 Parallax data generation method, parallax data generation device, computer equipment and storage medium Active CN112991419B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110258449.8A CN112991419B (en) 2021-03-09 2021-03-09 Parallax data generation method, parallax data generation device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112991419A CN112991419A (en) 2021-06-18
CN112991419B true CN112991419B (en) 2023-11-14

Family

ID=76334681

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110258449.8A Active CN112991419B (en) 2021-03-09 2021-03-09 Parallax data generation method, parallax data generation device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112991419B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117274316B (en) * 2023-10-31 2024-05-03 广东省水利水电科学研究院 River surface flow velocity estimation method, device, equipment and storage medium

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20110027373A (en) * 2009-09-10 2011-03-16 포항공과대학교 산학협력단 Apparatus and method for 3d estimating fusing optical flow and stereo matching
WO2011162227A1 (en) * 2010-06-24 2011-12-29 富士フイルム株式会社 Stereoscopic panoramic image synthesis device, image capturing device, stereoscopic panoramic image synthesis method, recording medium, and computer program
CN102750711A (en) * 2012-06-04 2012-10-24 清华大学 Binocular video depth map obtaining method based on image segmentation and motion estimation
CN104637043A (en) * 2013-11-08 2015-05-20 株式会社理光 Supporting pixel selection method and device and parallax determination method
CN104869387A (en) * 2015-04-19 2015-08-26 中国传媒大学 Method for acquiring binocular image maximum parallax based on optical flow method
CN108171744A (en) * 2017-12-26 2018-06-15 努比亚技术有限公司 Determining method, mobile terminal and the storage medium of disparity map in a kind of binocular virtualization
CN108234988A (en) * 2017-12-28 2018-06-29 努比亚技术有限公司 Parallax drawing generating method, device and computer readable storage medium
CN109919993A (en) * 2019-03-12 2019-06-21 腾讯科技(深圳)有限公司 Parallax picture capturing method, device and equipment and control system
CN110443843A (en) * 2019-07-29 2019-11-12 东北大学 A kind of unsupervised monocular depth estimation method based on generation confrontation network
WO2021027544A1 (en) * 2019-08-15 2021-02-18 广州虎牙科技有限公司 Binocular image-based model training method and apparatus, and data processing device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Adaptive local spatio-temporal feature extraction method based on RGB-D data; Lin Jia et al.; Journal of Beijing University of Technology; pp. 1-9 *
Research on a pixel-flow-estimation depth model based on flow edge detection; Zhou Shangchen; CNKI (China National Knowledge Infrastructure); pp. 1-77 *

Also Published As

Publication number Publication date
CN112991419A (en) 2021-06-18

Similar Documents

Publication Publication Date Title
KR102319177B1 (en) Method and apparatus, equipment, and storage medium for determining object pose in an image
US11838606B2 (en) Methods and systems for large-scale determination of RGBD camera poses
Liu et al. Fast burst images denoising
Konrad et al. Learning-based, automatic 2d-to-3d image and video conversion
JP4664432B2 (en) SHOT SIZE IDENTIFICATION DEVICE AND METHOD, ELECTRONIC DEVICE, AND COMPUTER PROGRAM
CN110853033B (en) Video detection method and device based on inter-frame similarity
US20100220920A1 (en) Method, apparatus and system for processing depth-related information
TW201447775A (en) Method and system for recognizing information
CN109525786B (en) Video processing method and device, terminal equipment and storage medium
JP5779089B2 (en) Edge detection apparatus, edge detection program, and edge detection method
CN111340749B (en) Image quality detection method, device, equipment and storage medium
US20120162224A1 (en) Free view generation in ray-space
WO2011014229A1 (en) Adjusting perspective and disparity in stereoscopic image pairs
Luo et al. A disocclusion inpainting framework for depth-based view synthesis
KR20130112311A (en) Apparatus and method for reconstructing dense three dimension image
CN109934873B (en) Method, device and equipment for acquiring marked image
JP2016212784A (en) Image processing apparatus and image processing method
US20230394833A1 (en) Method, system and computer readable media for object detection coverage estimation
CN111192241A (en) Quality evaluation method and device of face image and computer storage medium
CN108229281B (en) Neural network generation method, face detection device and electronic equipment
CN110120012B (en) Video stitching method for synchronous key frame extraction based on binocular camera
CN112991419B (en) Parallax data generation method, parallax data generation device, computer equipment and storage medium
Mathai et al. Automatic 2D to 3D video and image conversion based on global depth map
JP5906033B2 (en) Image processing apparatus, image processing method, and program
CN112052863B (en) Image detection method and device, computer storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant