WO2014180255A1 - Data processing method, apparatus, computer storage medium and user terminal - Google Patents


Info

Publication number
WO2014180255A1
Authority
WO
WIPO (PCT)
Prior art keywords
mapping model
pixel coordinate
coordinate mapping
image data
right views
Application number
PCT/CN2014/075991
Other languages
French (fr)
Chinese (zh)
Inventor
李飞 (Li Fei)
王云飞 (Wang Yunfei)
Original Assignee
中兴通讯股份有限公司 (ZTE Corporation)
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)
Publication of WO2014180255A1 publication Critical patent/WO2014180255A1/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T 7/33 Determination of transform parameters for the alignment of images, i.e. image registration, using feature-based methods
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; image sequence
    • G06T 2207/10021 Stereoscopic video; stereoscopic image sequence
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20112 Image segmentation details
    • G06T 2207/20164 Salient point detection; corner detection

Definitions

  • The present invention belongs to the field of multimedia application technologies, and in particular relates to a data processing method, apparatus, computer storage medium, and user terminal.
  • SIFT is the local feature descriptor used in the SIFT local feature description algorithm. SIFT features are highly distinctive, information-rich, and strongly invariant to most image transformations. In Mikolajczyk's invariance comparison experiments on ten local descriptors, including the SIFT operator, SIFT and its extensions proved to have the strongest robustness among comparable descriptors; robustness here denotes a measure of stability.
  • The SIFT algorithm consists of two parts: a scale-invariant region-of-interest detector and a feature descriptor based on the gray-level gradient distribution of the region of interest. Its main characteristics are as follows:
  • The SIFT feature is a local image feature that remains invariant to rotation, scale change, and brightness variation, and retains a degree of stability under viewpoint change, affine transformation, and noise.
  • The SIFT feature matching algorithm comprises two stages.
  • The first stage is the generation of SIFT features, i.e., extracting from multiple images feature vectors that are invariant to scale, rotation, and brightness changes; the second stage is the matching of those SIFT feature vectors.
  • SIFT algorithms are now widely used in object recognition, image restoration, image stitching, and other areas.
  • Random sample consensus (RANSAC) is a robust estimation method proposed by Fischler and Bolles; it has become an important technique for linear and nonlinear model estimation.
  • In the multimedia application field of user terminals such as mobile terminals, a capture device performs stereoscopic imaging with the captured image data. Stereoscopic image synthesis comprises several parts: preparation of the stereo data, sub-pixel decision criteria, sub-pixel sampling for each viewpoint, arrangement and synthesis of the sub-pixels of each viewpoint, and compression, transmission, and display of the stereoscopic image.
  • The embodiments of the present invention are intended to provide a data processing method, apparatus, and user terminal that at least solve the problem that stable stereoscopic image data cannot be obtained, and can optimize and reconstruct the obtained stereoscopic image data.
  • An embodiment of the invention provides a data processing method, including:
  • when preprocessing each captured frame of stereoscopic image data, extracting feature values from the left view and the right view of the frame and matching them, and obtaining a pixel coordinate mapping model between the left and right views from the matching result;
  • each frame of stereoscopic image data corresponding to one pixel coordinate mapping model between its left and right views, and obtaining an average pixel coordinate mapping model from the pixel coordinate mapping models corresponding to multiple frames of stereoscopic image data;
  • performing image processing based on the average pixel coordinate mapping model.
  • In the above scheme, extracting and matching feature values from the left and right views of each frame of stereoscopic image data specifically includes:
  • extracting SIFT features from the left view and the right view respectively, to obtain the SIFT feature descriptors of each view;
  • performing SIFT feature point matching on the SIFT feature descriptors of the two views, to obtain the matching point pairs of the SIFT feature points of the left and right views.
  • In the above scheme, obtaining the pixel coordinate mapping model between the left and right views from the matching result specifically includes:
  • obtaining the pixel coordinate mapping model between the left and right views with the matching point pairs as input parameters.
  • In the above scheme, obtaining the pixel coordinate mapping model with the matching point pairs as input parameters specifically includes: randomly selecting a data point set from the set S formed by the matching point pairs, for initialization; filtering from the data point set, according to a preset threshold, a qualifying support point set Si as the consensus set; repeatedly selecting new data samples and estimating the pixel coordinate mapping model between the left and right views according to the comparison of the size of Si with the preset threshold, until the largest consensus set is obtained; and obtaining the desired pixel coordinate mapping model with the largest consensus set as the final data sample.
  • In the above scheme, obtaining the average pixel coordinate mapping model from the pixel coordinate mapping models corresponding to multiple frames of stereoscopic image data specifically includes: acquiring the pixel coordinate mapping models between the left and right views corresponding to a specified number of the most recently captured frames of stereoscopic image data before stereoscopic image synthesis, and averaging these models to obtain the average pixel coordinate mapping model.
  • In the above scheme, performing image processing based on the average pixel coordinate mapping model specifically includes:
  • performing repair processing of damaged areas based on the average pixel coordinate mapping model;
  • performing noise reduction processing based on the average pixel coordinate mapping model.
  • In the above scheme, performing repair processing of damaged areas based on the average pixel coordinate mapping model specifically includes:
  • detecting a damaged area; determining the coordinate information of the damaged area in the other, normal view based on the average pixel coordinate mapping model; replacing the image content of the current damaged area with the image content of the corresponding area in the normal view; and correcting the detected edge of the damaged area with the average of the gray values of the corresponding pixels in the left and right views.
  • In the above scheme, performing noise reduction based on the average pixel coordinate mapping model specifically includes:
  • detecting a suspected noise point; determining the position of the suspected noise point in the other view based on the average pixel coordinate mapping model and performing a gray-level comparison to determine whether it is a noise point; and correcting the gray value of each pixel in a predetermined neighborhood of each confirmed noise point with the gray value of the corresponding pixel in the other view.
  • An embodiment of the invention further provides a data processing apparatus, including:
  • a preprocessing unit configured to, when preprocessing each captured frame of stereoscopic image data, extract feature values from the left and right views of the frame and match them, and obtain a pixel coordinate mapping model between the left and right views from the matching result;
  • an image processing unit configured to perform image processing based on the average pixel coordinate mapping model;
  • each frame of stereoscopic image data corresponds to one pixel coordinate mapping model between its left and right views, and the average pixel coordinate mapping model is obtained from the pixel coordinate mapping models corresponding to multiple frames of stereoscopic image data.
  • In the above scheme, the preprocessing unit further includes a feature matching subunit;
  • the feature matching subunit is configured to extract SIFT features from the left view and the right view respectively to obtain the SIFT feature descriptors of each view, and to perform SIFT feature point matching on the descriptors of the two views to obtain the matching point pairs of the SIFT feature points of the left and right views.
  • In the above scheme, the preprocessing unit further includes a model estimation subunit;
  • the model estimation subunit is configured to obtain the pixel coordinate mapping model between the left and right views with the matching point pairs as input parameters.
  • In the above scheme, the model estimation subunit is configured to randomly select a data point set from the set S formed by the matching point pairs for initialization; filter from the data point set, according to a preset threshold, a qualifying support point set Si as the consensus set; repeatedly select new data samples and estimate the pixel coordinate mapping model between the left and right views according to the comparison of the size of Si with the preset threshold, until the largest consensus set is obtained; and obtain the desired pixel coordinate mapping model with the largest consensus set as the final data sample.
  • In the above scheme, the image processing unit further includes a model mean acquisition subunit, configured to acquire the pixel coordinate mapping models between the left and right views corresponding to a specified number of the most recently captured frames of stereoscopic image data before stereoscopic image synthesis, and to average these models to obtain the average pixel coordinate mapping model.
  • In the above scheme, the image processing unit further includes a first processing subunit and a second processing subunit;
  • the first processing subunit is configured to perform repair processing of damaged areas based on the average pixel coordinate mapping model;
  • the second processing subunit is configured to perform noise reduction processing based on the average pixel coordinate mapping model.
  • The first processing subunit is further configured to detect a damaged area, determine the coordinate information of the damaged area in the other, normal view based on the average pixel coordinate mapping model, replace the image content of the current damaged area with the image content of the corresponding area in the normal view, and correct the detected edge of the damaged area with the average of the gray values of the corresponding pixels in the left and right views.
  • The second processing subunit is further configured to detect a suspected noise point, determine the position of the suspected noise point in the other view based on the average pixel coordinate mapping model, perform a gray-level comparison to determine whether it is a noise point, and correct the gray value of each pixel in a predetermined neighborhood of each confirmed noise point with the gray value of the corresponding pixel in the other view.
  • When performing processing, the preprocessing unit, image processing unit, feature matching subunit, model estimation subunit, model mean acquisition subunit, first processing subunit, and second processing subunit may be implemented by a central processing unit (CPU), a digital signal processor (DSP), or a field-programmable gate array (FPGA).
  • An embodiment of the present invention further provides a computer storage medium comprising a set of instructions which, when executed, cause at least one processor to perform the data processing method of any one of the above schemes.
  • An embodiment of the present invention further provides a user terminal that includes the data processing apparatus described above.
  • The method of the present invention includes: when preprocessing each captured frame of stereoscopic image data, extracting feature values from the left and right views of the frame and matching them, and obtaining a pixel coordinate mapping model between the left and right views from the matching result; each frame of stereoscopic image data corresponding to one pixel coordinate mapping model between its left and right views, and obtaining an average pixel coordinate mapping model from the pixel coordinate mapping models corresponding to multiple frames of stereoscopic image data; and performing image processing based on the average pixel coordinate mapping model.
  • With the embodiments of the present invention, since each captured frame of stereoscopic image data is preprocessed to obtain a pixel coordinate mapping model between the left and right views, the model is refined into an average pixel coordinate mapping model, and image processing is finally performed based on that average model, the obtained stereoscopic image data can be optimized and reconstructed.
  • FIG. 1 is an implementation flowchart of the method principle of an embodiment of the present invention;
  • FIG. 2 is a schematic diagram of the basic composition of an apparatus according to an embodiment of the present invention;
  • FIG. 3 is an implementation flowchart of SIFT feature extraction according to an embodiment of the present invention;
  • FIG. 4 is an implementation flowchart of estimating the coordinate mapping model according to an embodiment of the present invention;
  • FIG. 5 is an implementation flowchart of damage repair according to an embodiment of the present invention.
  • The application scenario of the data processing method of the embodiments of the present invention is a user terminal, in particular the multimedia application field of a mobile terminal: for example, a data processing scheme for optimizing and reconstructing two-viewpoint naked-eye stereoscopic image data during two-viewpoint stereoscopic image capture and synthesis.
  • The embodiments of the present invention are mainly based on the SIFT algorithm and the RANSAC algorithm.
  • The RANSAC algorithm builds the left-right view pixel mapping model from the SIFT matching points of the left and right views, and according to this model the damaged view is repaired and reconstructed from the normal view, providing qualified stereoscopic image data to the synthesis algorithm.
  • With the embodiments of the present invention, repaired stereoscopic image data are finally obtained; repairing damaged images with the left-right view pixel mapping model is relatively accurate, computationally light, and yields a good repair effect.
  • The SIFT algorithm was introduced above; the basic idea of the RANSAC algorithm is as follows: when estimating parameters, instead of treating all possible input data indiscriminately, a search engine is first designed for the specific problem;
  • this search engine iteratively rejects the input data (outliers) that are inconsistent with the estimated parameters, and the correct input data are then used to estimate the parameters.
  • The embodiments of the present invention use a concrete implementation of the RANSAC algorithm to obtain the left-right view pixel mapping model.
  • Stereoscopic imaging is based on the principle of creating depth from parallax.
  • This principle means that a person's two eyes view the world from different angles, i.e., there is a subtle difference between an object seen by the left eye and the same object seen by the right eye.
  • The average interocular distance is about 65 mm, so the two eyes describe the outline of a scene in slightly different ways.
  • The brain comprehensively processes these two subtly different scenes (physiological fusion), producing accurate three-dimensional object perception and the positioning of the object in the scene; this is the sense of depth.
  • The work of a stereo imaging system is to produce at least two images of each scene: one representing the image seen by the left eye and the other the image seen by the right eye.
  • The two associated images are called a stereo pair.
  • A stereo display system must ensure that the left eye sees only the left image and the right eye sees only the right image.
  • The display method addressed by the embodiments of the present invention is a two-viewpoint autostereoscopic display method: the two images of the left and right viewpoints are rearranged and combined at the sub-pixel level to generate a stereoscopic image, which is sent to the display device.
  • By placing a lenticular lens or a parallax barrier in front of a CRT or flat-panel display, the emission direction of the light of each pixel is controlled so that the image of the left viewpoint enters only the left eye and the image of the right viewpoint enters only the right eye; binocular parallax then produces stereoscopic vision.
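For illustration only, a minimal sketch of one common sub-pixel arrangement, column interleaving, is given below. The patent does not fix the arrangement, so the even/odd column split and the NumPy-based implementation are assumptions.

```python
import numpy as np

def interleave_two_views(left, right):
    """Compose a two-view autostereoscopic frame by column interleaving.

    Illustrative arrangement only: even pixel columns are taken from the
    left view and odd columns from the right view, so a parallax barrier
    or lenticular sheet can steer each column toward one eye.
    `left` and `right` must have identical shapes (H x W or H x W x 3).
    """
    assert left.shape == right.shape
    composite = left.copy()
    composite[:, 1::2] = right[:, 1::2]  # odd columns come from the right view
    return composite
```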
  • The stereoscopic image synthesis work includes preparation of the stereo data, sub-pixel decision criteria, sub-pixel sampling for each viewpoint, arrangement and synthesis of the sub-pixels of each viewpoint, and compression, transmission, and display of the stereoscopic image; the embodiments of the present invention are mainly directed at the preparation of the stereo data,
  • i.e., the optimized reconstruction of the obtained stereoscopic image data. After the stereoscopic image data are optimized and reconstructed according to the embodiments of the present invention, even when a user terminal, in particular a mobile terminal, captures stereoscopic image data rendered unusable by camera occlusion or heavy noise, a relatively high-quality stereoscopic picture can still be synthesized in the end.
  • The data processing method of an embodiment of the present invention, as shown in FIG. 1, includes:
  • Step 101: when preprocessing each captured frame of stereoscopic image data, extracting feature values from the left view and the right view of the frame and matching them, and obtaining a pixel coordinate mapping model between the left and right views from the matching result;
  • here, the stereoscopic image data may also be called stereoscopic image material, which is not described again;
  • Step 102: each frame of stereoscopic image data corresponds to one pixel coordinate mapping model between its left and right views, and an average pixel coordinate mapping model is obtained from the pixel coordinate mapping models corresponding to multiple frames of stereoscopic image data;
  • Step 103: performing image processing based on the average pixel coordinate mapping model.
  • The data processing apparatus of an embodiment of the present invention includes:
  • a preprocessing unit configured to, when preprocessing each captured frame of stereoscopic image data, extract feature values from the left and right views of the frame and match them, and obtain a pixel coordinate mapping model between the left and right views from the matching result;
  • an image processing unit configured to perform image processing based on the average pixel coordinate mapping model, where each frame of stereoscopic image data corresponds to one pixel coordinate mapping model between its left and right views, and
  • the average pixel coordinate mapping model is obtained from the pixel coordinate mapping models between the left and right views corresponding to multiple frames of stereoscopic image data.
  • The computer storage medium of an embodiment of the present invention comprises a set of instructions which, when executed, cause at least one processor to perform the data processing method.
  • The user terminal of an embodiment of the present invention includes the basic structure of the data processing apparatus described above; its variations and equivalents are not described again here.
  • The following is an application scenario of an embodiment of the present invention (a user taking a photo with a mobile phone):
  • The data processing method of this embodiment is specifically a two-viewpoint naked-eye stereoscopic image data optimization and reconstruction scheme.
  • The user uses the mobile phone to take a stereoscopic picture.
  • The user must first preview and then press the shutter button; the period just before the shutter is pressed is generally a relatively stable and reliable stereo-capture state, whereas the act of pressing the shutter is prone to sudden noise.
  • The embodiment therefore performs preprocessing before the photo is taken in order to obtain stable stereoscopic image data: SIFT features are extracted from the left and right views of the preview image and matched to determine the pixel coordinate mapping model between the left and right views of the current scene;
  • according to this mapping model, occlusion repair and denoising are performed on low-quality single-view stereo data to generate reliable dual-view stereo data.
  • Steps 1 to 6 below are performed during image preview (i.e., the preprocessing described above), and steps 7 to 11 are performed after the shutter button is pressed to take the picture.
  • The stereoscopic picture is synthesized from the optimized and reconstructed stereoscopic image data produced by the preprocessing, with the final result obtained after the damaged image is repaired and denoised.
  • Step 1: Preprocess the left- and right-view images during preview, including scale conversion and image smoothing. Because the phone's preview mode differs from its camera-mode settings, the image captured in preview mode (for example, a 5 Mp image) is first scale-converted; since the SIFT feature description algorithm is scale-invariant, this does not affect SIFT feature extraction and matching.
  • A Gaussian low-pass filter is used for image smoothing because it effectively suppresses the ringing effect and has a clear noise-removal effect.
  • The beneficial effect is as follows: smoothing with a Gaussian low-pass filter mainly reduces the influence of scale conversion on picture quality, thereby ensuring the reliability of the coordinate mapping model. A minimal sketch of this step is given below.
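The sketch assumes OpenCV is available; the preview resolution, kernel size, and sigma are illustrative values not fixed by the patent.

```python
import cv2

def preprocess_view(img, preview_size=(640, 480), ksize=5, sigma=1.0):
    """Step-1 preprocessing sketch: scale the captured view to the preview
    resolution, then smooth it with a Gaussian low-pass filter to suppress
    scaling artifacts and noise before SIFT extraction.
    """
    scaled = cv2.resize(img, preview_size, interpolation=cv2.INTER_AREA)
    return cv2.GaussianBlur(scaled, (ksize, ksize), sigma)
```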
  • Step 2: Extract SIFT features from the processed left- and right-view images.
  • Step 111: prepare the image whose SIFT features are to be extracted;
  • Step 112: construct the scale space;
  • Step 113: detect spatial extreme points;
  • Step 114: accurately localize the spatial extreme points;
  • Step 115: remove edge response points;
  • Step 116: generate the SIFT feature descriptors.
  • The SIFT feature descriptors of the left view are obtained through steps 111-116, the SIFT feature descriptors of the right view are obtained through the corresponding steps 121-126, and then step 13 is performed.
  • Step 13: SIFT feature point matching is finally performed using the SIFT feature descriptors obtained from the left view (steps 111-116) and from the right view (steps 121-126),
  • yielding the SIFT feature point matching point pairs of the left and right views, referred to simply as matching point pairs.
  • (a) Detecting scale-space extreme points, comprising generation of the scale space and detection of spatial extreme points.
  • SIFT uses a difference-of-Gaussian (DoG) scale space, formed by convolving difference-of-Gaussian kernels of adjacent scales with the input image.
  • The DoG kernel is a linear approximation of the LoG kernel and also greatly simplifies the scale-space computation.
  • Extreme points are searched in the neighborhood spanning both image space and DoG scale space, giving the initial positions of the feature points.
  • Each candidate point is compared with its 8 neighbors at the same scale and the 9 × 2 points at the corresponding positions in the adjacent upper and lower scales, ensuring that extreme points are detected in both scale space and two-dimensional image space.
  • If a point is the maximum or minimum among these 26 neighbors across its DoG scale-space layer and the layers above and below, it is taken as a feature point of the image at that scale.
  • To enhance the robustness of matching, each key point is described by a 4 × 4 array of 16 seed points, each carrying 8 orientation components; a key point therefore generates 128 values, which ultimately form a 128-dimensional SIFT feature vector.
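As a concrete illustration of steps 111-116, the sketch below relies on OpenCV's bundled SIFT implementation (available as cv2.SIFT_create in OpenCV 4.4+) rather than re-implementing the scale-space pipeline; treating that implementation as equivalent to the patent's flow is an assumption.

```python
import cv2

def extract_sift(gray):
    """Extract SIFT key points and 128-D descriptors (steps 111-116).

    OpenCV's SIFT internally builds the DoG scale space, detects and
    refines extreme points, removes edge responses, and emits one
    128-dimensional descriptor (4x4 seed points x 8 orientations) per
    key point.
    """
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(gray, None)
    return keypoints, descriptors  # descriptors: N x 128, float32
```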
  • Step 3: Match the SIFT feature points of the left and right views.
  • The Euclidean distance between key-point feature vectors is used as the similarity metric for key points in the two images.
  • A key point is taken from the left view, and the two key points with the nearest and second-nearest Euclidean distances are found in the right view.
  • According to the distance-ratio criterion, the second-nearest distance d2 is divided by the nearest distance d1 to obtain ratio = d2 / d1; if ratio exceeds a proportional threshold, the nearest candidate is accepted as the match.
  • According to experimental results, a proportional threshold of 1.5 ensures that the number of matching points is sufficient while keeping the computation effectively under control.
  • As the threshold increases, the number of SIFT matching points decreases but the retained matches become more stable; lowering the threshold increases the number of matches at the cost of stability.
  • In the embodiment of the present invention, the accuracy required of the SIFT matching point pairs is high, and because the difference between the left and right views is small, too many matching points would make the computation too heavy; the threshold is therefore increased appropriately and is set to 1.8 in this example.
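A minimal sketch of the distance-ratio matching of step 3, assuming the d2/d1 reading of the criterion reconstructed above; OpenCV's brute-force matcher performs the Euclidean nearest-neighbor search.

```python
import cv2

def match_sift(desc_left, desc_right, ratio_threshold=1.8):
    """Distance-ratio matching between left- and right-view descriptors.

    For each left key point the two nearest right descriptors (Euclidean
    distance) are found; the pair is kept only when the second-nearest
    distance is at least ratio_threshold times the nearest distance
    (the embodiment's threshold of 1.8).
    """
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = []
    for pair in matcher.knnMatch(desc_left, desc_right, k=2):
        if len(pair) < 2:
            continue  # no second neighbor available
        nearest, second = pair
        if second.distance >= ratio_threshold * nearest.distance:
            matches.append(nearest)
    return matches
```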
  • Step 4: Estimate the pixel coordinate mapping model between the left and right views of the current scene using the RANSAC algorithm.
  • The flow of estimating the pixel coordinate mapping model between the left and right views is shown in FIG. 4 and includes:
  • Step 201: randomly select 8 matching point pairs to initialize the pixel coordinate mapping model between the left and right views;
  • Step 202: find the support point set of the current model;
  • Step 203: determine whether the size of the support point set meets a threshold; if so, execute step 204, otherwise return to step 201.
  • According to experimental results, the threshold may be set to 67% of the entire data point set, which ensures the validity of the model.
  • Step 204: estimate the pixel coordinate mapping model between the left and right views.
  • The flow of FIG. 4 estimates the left-right view pixel mapping model by using the coordinate information of the matching point pairs in the two views as the input parameters of the RANSAC model estimation;
  • the largest consensus set Si is selected and used to re-estimate the model, giving the final result.
  • The model estimation in this step is in effect the process of solving the mapping model F; since there are enough matching points, a reliable coordinate mapping model can be generated.
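The sketch below mirrors steps 201-204 as a generic RANSAC loop. The patent does not specify the algebraic form of the mapping model F, so a homography fitted with OpenCV is used as a stand-in; the reprojection-error threshold and iteration cap are likewise assumptions.

```python
import random
import numpy as np
import cv2

def ransac_mapping_model(pts_left, pts_right, err_thresh=3.0,
                         support_ratio=0.67, max_iters=500):
    """RANSAC skeleton for the left-right pixel coordinate mapping model.

    pts_left / pts_right: N x 2 matched pixel coordinates (N >= 8 assumed).
    """
    pts_left = np.asarray(pts_left, np.float32)
    pts_right = np.asarray(pts_right, np.float32)
    n = len(pts_left)
    best_inliers = np.zeros(n, dtype=bool)
    for _ in range(max_iters):
        sample = random.sample(range(n), 8)                 # step 201: 8 random pairs
        model, _ = cv2.findHomography(pts_left[sample], pts_right[sample], 0)
        if model is None:
            continue  # degenerate sample
        proj = cv2.perspectiveTransform(pts_left.reshape(-1, 1, 2), model)
        err = np.linalg.norm(proj.reshape(-1, 2) - pts_right, axis=1)
        inliers = err < err_thresh                          # step 202: support set Si
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
        if inliers.sum() >= support_ratio * n:              # step 203: 67% threshold
            break
    # step 204: re-estimate the model from the largest consensus set
    model, _ = cv2.findHomography(pts_left[best_inliers],
                                  pts_right[best_inliers], 0)
    return model, best_inliers
```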
  • Step 5: Only the correct matching point pairs that are consistent with the pixel coordinate mapping model of step 4 (i.e., the point set in Si) are saved, together with the obtained mapping model, as the reference information of the frame.
  • Step 6: Steps 1 to 5 are looped continuously, retaining only the image reference information of the latest four frames in a queue:
  • each time a new frame of stereo image data is captured, the information at the head of the queue is deleted and the newest frame's reference information is stored at the tail.
  • Step 7: Take the picture, smooth the left and right views, and compute the average pixel coordinate mapping model from the current left-right mapping model together with those of the previous three frames. After the photo is taken, the user chooses whether the views need repair; if so, processing continues.
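A minimal sketch of the four-frame queue of step 6 and the model averaging of step 7; element-wise averaging of the model matrices is an assumption, since the patent only states that the models are averaged.

```python
from collections import deque
import numpy as np

# Reference information of the latest four frames; the head is dropped
# automatically when a fifth model is appended (step 6).
model_queue = deque(maxlen=4)

def push_frame_model(model):
    """Store the newest frame's mapping model at the tail of the queue."""
    model_queue.append(model)

def average_mapping_model():
    """Step 7: element-wise mean of the queued mapping-model matrices."""
    return np.mean(np.stack(model_queue), axis=0)
```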
  • Step 8: Check for occlusion.
  • Each of the left and right views is divided into 8 blocks, the average gray value of each block is computed, and the average gray values of corresponding blocks in the two views are compared. If the relative difference of a block's average gray value exceeds 10%, the block with the lower gray value is regarded as a damaged block. If the number of damaged blocks is 0, jump to step 10.
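A minimal sketch of the step-8 occlusion check; splitting each view into horizontal strips is an assumption, as the patent only says each view is divided into 8 blocks.

```python
import numpy as np

def find_damaged_blocks(gray_left, gray_right, n_blocks=8, rel_diff=0.10):
    """Compare mean gray values of corresponding blocks of the two views
    and mark the darker block as damaged when the relative difference
    exceeds 10%.
    """
    damaged = []  # list of (block_index, 'left' or 'right')
    for i, (bl, br) in enumerate(zip(np.array_split(gray_left, n_blocks),
                                     np.array_split(gray_right, n_blocks))):
        ml, mr = bl.mean(), br.mean()
        if abs(ml - mr) / (max(ml, mr) + 1e-6) > rel_diff:
            damaged.append((i, 'left' if ml < mr else 'right'))
    return damaged
```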
  • Step 9: Repair the occluded area, as shown in FIG. 5.
  • Step 301: accurately detect the damaged area (Sobel operator);
  • Step 302: compute the coordinate information of the damaged area in the normal view;
  • Step 303: repair the damaged area;
  • Step 304: repair the edges.
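A minimal sketch of steps 302-304, assuming the average mapping model maps damaged-view coordinates to normal-view coordinates (so the inverse warp brings the normal view into register) and that the damaged-area mask comes from the Sobel-based detection of step 301.

```python
import numpy as np
import cv2

def repair_damaged_region(damaged_view, normal_view, mask, model):
    """Replace damaged content with the corresponding region of the normal
    view and blend the mask edge with the per-pixel average of both views.

    mask: uint8 binary mask of the damaged area (nonzero = damaged).
    model: 3x3 mapping matrix from damaged-view to normal-view coordinates.
    """
    h, w = damaged_view.shape[:2]
    # steps 302-303: pull the corresponding content out of the normal view
    warped = cv2.warpPerspective(normal_view, np.linalg.inv(model), (w, h))
    repaired = damaged_view.copy()
    repaired[mask > 0] = warped[mask > 0]
    # step 304: correct the edge with the mean gray value of the two views
    edge = cv2.morphologyEx(mask, cv2.MORPH_GRADIENT, np.ones((3, 3), np.uint8))
    repaired[edge > 0] = ((damaged_view[edge > 0].astype(np.uint16) +
                           warped[edge > 0].astype(np.uint16)) // 2).astype(np.uint8)
    return repaired
```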
  • Step 10: Noise point detection.
  • The left and right views may contain a certain amount of salt-and-pepper noise.
  • A median-filter-style scan of the left and right views is used to detect noise, and the noise points are marked:
  • within the neighborhood of each point, the maximum, minimum, and mean gray values are taken; if the gray value of the current point is the maximum or minimum within the region and lies outside the set threshold band (60%-150% of the neighborhood's average gray value is the basic band; values outside this range exceed the set threshold), it may be noise and is marked as suspicious.
  • The coordinate mapping model is then used to determine the position of each suspicious point in the other view, and the gray-level comparison is performed again at that position to determine whether the current point is indeed a noise point.
  • Step 11: Fix the noise points.
  • The gray value of each pixel in the 3 × 3 neighborhood of each noise point confirmed in step 10 is corrected with the gray value of the corresponding pixel in the other view. A combined sketch of steps 10 and 11 follows.
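The neighborhood-extreme test below follows the 60%-150% band described in step 10 (the patent also mentions a median filter, which this sketch does not reproduce), and the cross-view confirmation tolerance is an assumption. The per-pixel Python loop is written for clarity, not speed.

```python
import numpy as np
import cv2

def detect_and_fix_noise(gray, other_gray, model, low=0.6, high=1.5, tol=10):
    """Steps 10-11 sketch: flag salt-and-pepper candidates, confirm them
    against the other view through the mapping model, then overwrite the
    3x3 neighborhood of confirmed noise with the other view's pixels.
    """
    h, w = gray.shape
    fixed = gray.copy()
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            win = gray[y-1:y+2, x-1:x+2]
            v, m = int(gray[y, x]), win.mean()
            # suspicious only if it is the window extreme AND outside the band
            if v not in (win.max(), win.min()) or low * m <= v <= high * m:
                continue
            # map (x, y) into the other view with the average mapping model
            q = cv2.perspectiveTransform(np.float32([[[x, y]]]), model)[0, 0]
            qx, qy = int(round(q[0])), int(round(q[1]))
            if (1 <= qx < w - 1 and 1 <= qy < h - 1
                    and abs(int(other_gray[qy, qx]) - v) > tol):
                # step 11: correct the 3x3 neighborhood from the other view
                fixed[y-1:y+2, x-1:x+2] = other_gray[qy-1:qy+2, qx-1:qx+2]
    return fixed
```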
  • Step 12 Submit the optimized stereo image data to the synthesis algorithm, and use the existing synthesis algorithm to synthesize the stereo image.
  • The integrated modules described in the embodiments of the present invention may also be stored in a computer-readable storage medium if they are implemented in the form of software function modules and sold or used as independent products. Based on this understanding, the technical solution of the embodiments of the present invention may, in essence or in the part contributing to the prior art, be embodied in the form of a software product.
  • The computer software product is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the methods described in the various embodiments of the present invention.
  • The foregoing storage medium includes media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
  • In summary, since each captured frame of stereoscopic image data is preprocessed to obtain a pixel coordinate mapping model between the left and right views, the model is refined into an average pixel coordinate mapping model, and image processing is finally performed based on that average model, the obtained stereoscopic image data can be optimized and reconstructed.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Processing Or Creating Images (AREA)

Abstract

Disclosed are a data processing method, apparatus, computer storage medium, and user terminal. The method comprises: when preprocessing each captured frame of stereoscopic image data, extracting feature values from the left and right views of the frame and matching them, and obtaining a pixel coordinate mapping model between the left view and the right view from the matching result; each frame of stereoscopic image data corresponding to one pixel coordinate mapping model between its left and right views, and obtaining an average pixel coordinate mapping model from the pixel coordinate mapping models corresponding to multiple frames of stereoscopic image data; and performing image processing on the basis of the average pixel coordinate mapping model. The present invention at least solves the problem that stable stereoscopic image data cannot be obtained, and can also accomplish optimization and reconstruction of the obtained stereoscopic image data.

Description

Data processing method, apparatus, computer storage medium and user terminal

Technical Field

The present invention belongs to the field of multimedia application technologies, and in particular relates to a data processing method, apparatus, computer storage medium, and user terminal.

Background Art

SIFT is the local feature descriptor used in the SIFT local feature description algorithm. SIFT features are highly distinctive, information-rich, and strongly invariant to most image transformations. In Mikolajczyk's invariance comparison experiments on ten local descriptors, including the SIFT operator, SIFT and its extensions proved to have the strongest robustness among comparable descriptors; robustness here denotes a measure of stability.

The SIFT algorithm consists of two parts: a scale-invariant region-of-interest detector and a feature descriptor based on the gray-level gradient distribution of the region of interest. Its main characteristics are as follows:

a) SIFT features are local image features that remain invariant to rotation, scale change, and brightness variation, and retain a degree of stability under viewpoint change, affine transformation, and noise.

b) Good distinctiveness and rich information content, suitable for fast and accurate matching in massive feature databases.

c) Multiplicity: even a few objects can produce a large number of SIFT feature vectors.

d) High speed: an optimized SIFT matching algorithm can even meet real-time requirements.

e) Scalability: SIFT features can conveniently be combined with other forms of feature vectors.

The SIFT feature matching algorithm comprises two stages: the first stage is the generation of SIFT features, i.e., extracting from multiple images feature vectors invariant to scale, rotation, and brightness changes; the second stage is the matching of the SIFT feature vectors.

Today, the SIFT algorithm is widely used in object recognition, image restoration, image stitching, and other areas.

Random sample consensus (RANSAC) is a robust estimation method proposed by Fischler and Bolles. RANSAC has become an important technique for linear and nonlinear model estimation.

In implementing the technical solutions of the embodiments of the present application, the inventors found at least the following technical problems in the prior art:

In the multimedia application field of user terminals such as mobile terminals, a capture device performs stereoscopic imaging with the captured image data. Stereoscopic image synthesis comprises preparation of the stereo data, sub-pixel decision criteria, sub-pixel sampling for each viewpoint, arrangement and synthesis of the sub-pixels of each viewpoint, and compression, transmission, and display of the stereoscopic image.

In the process of capturing and synthesizing two-viewpoint stereoscopic images, each frame of stereo data contains only the two pictures of the left and right viewpoints, so the fault tolerance of the synthesis algorithm is low. When a mobile terminal is used to capture dual-viewpoint stereoscopic image data, problems such as finger occlusion easily arise because of unstable user operation; when the capture device suffers heavy noise, camera occlusion, and similar problems, the stereoscopic image data easily become unusable and unstable, so that an erroneous or low-quality stereoscopic picture is finally synthesized. Clearly, the stability of the stereoscopic image data serving as the data source plays a crucial role in the final synthesized stereoscopic picture; however, no effective solution yet exists for obtaining stable stereoscopic image data.

Summary of the Invention
The embodiments of the present invention are intended to provide a data processing method, apparatus, and user terminal that at least solve the problem that stable stereoscopic image data cannot be obtained, and can optimize and reconstruct the obtained stereoscopic image data.

The technical solution of the embodiments of the present invention is implemented as follows:

An embodiment of the present invention provides a data processing method, including:

when preprocessing each captured frame of stereoscopic image data, extracting feature values from the left view and the right view of the frame and matching them, and obtaining a pixel coordinate mapping model between the left and right views from the matching result;

each frame of stereoscopic image data corresponding to one pixel coordinate mapping model between its left and right views, and obtaining an average pixel coordinate mapping model from the pixel coordinate mapping models corresponding to multiple frames of stereoscopic image data;

performing image processing based on the average pixel coordinate mapping model.

In the above scheme, extracting and matching feature values from the left and right views of each frame of stereoscopic image data specifically includes:

extracting SIFT features from the left view and the right view respectively, to obtain the SIFT feature descriptors of each view;

performing SIFT feature point matching on the SIFT feature descriptors of the two views, to obtain the matching point pairs of the SIFT feature points of the left and right views.

In the above scheme, obtaining the pixel coordinate mapping model between the left and right views from the matching result specifically includes:

obtaining the pixel coordinate mapping model between the left and right views with the matching point pairs as input parameters.

In the above scheme, obtaining the pixel coordinate mapping model with the matching point pairs as input parameters specifically includes:

randomly selecting a data point set from the set S formed by the matching point pairs, for initialization; filtering from the data point set, according to a preset threshold, a qualifying support point set Si as the consensus set; repeatedly selecting new data samples and estimating the pixel coordinate mapping model between the left and right views according to the comparison of the size of Si with the preset threshold, until the largest consensus set is obtained; and obtaining the desired pixel coordinate mapping model with the largest consensus set as the final data sample.

In the above scheme, obtaining the average pixel coordinate mapping model from the pixel coordinate mapping models corresponding to multiple frames of stereoscopic image data specifically includes:

acquiring the pixel coordinate mapping models between the left and right views corresponding to a specified number of the most recently captured frames of stereoscopic image data before stereoscopic image synthesis, and averaging these models to obtain the average pixel coordinate mapping model.

In the above scheme, performing image processing based on the average pixel coordinate mapping model specifically includes:

performing repair processing of damaged areas based on the average pixel coordinate mapping model;

performing noise reduction processing based on the average pixel coordinate mapping model.

In the above scheme, performing repair processing of damaged areas based on the average pixel coordinate mapping model specifically includes:

detecting a damaged area; determining the coordinate information of the damaged area in the other, normal view based on the average pixel coordinate mapping model; replacing the image content of the current damaged area with the image content of the corresponding area in the normal view; and correcting the detected edge of the damaged area with the average of the gray values of the corresponding pixels in the left and right views.

In the above scheme, performing noise reduction based on the average pixel coordinate mapping model specifically includes:

detecting a suspected noise point; determining the position of the suspected noise point in the other view based on the average pixel coordinate mapping model and performing a gray-level comparison to determine whether it is a noise point; and correcting the gray value of each pixel in a predetermined neighborhood of each confirmed noise point with the gray value of the corresponding pixel in the other view.

An embodiment of the present invention further provides a data processing apparatus, including:

a preprocessing unit configured to, when preprocessing each captured frame of stereoscopic image data, extract feature values from the left and right views of the frame and match them, and obtain a pixel coordinate mapping model between the left and right views from the matching result;

an image processing unit configured to perform image processing based on the average pixel coordinate mapping model, where each frame of stereoscopic image data corresponds to one pixel coordinate mapping model between its left and right views, and the average pixel coordinate mapping model is obtained from the pixel coordinate mapping models corresponding to multiple frames of stereoscopic image data.

In the above scheme, the preprocessing unit further includes a feature matching subunit;

the feature matching subunit is configured to extract SIFT features from the left view and the right view respectively to obtain the SIFT feature descriptors of each view, and to perform SIFT feature point matching on the descriptors of the two views to obtain the matching point pairs of the SIFT feature points of the left and right views.

In the above scheme, the preprocessing unit further includes a model estimation subunit;

the model estimation subunit is configured to obtain the pixel coordinate mapping model between the left and right views with the matching point pairs as input parameters.

In the above scheme, the model estimation subunit is configured to randomly select a data point set from the set S formed by the matching point pairs for initialization; filter from the data point set, according to a preset threshold, a qualifying support point set Si as the consensus set; repeatedly select new data samples and estimate the pixel coordinate mapping model between the left and right views according to the comparison of the size of Si with the preset threshold, until the largest consensus set is obtained; and obtain the desired pixel coordinate mapping model with the largest consensus set as the final data sample.

In the above scheme, the image processing unit further includes a model mean acquisition subunit, configured to acquire the pixel coordinate mapping models between the left and right views corresponding to a specified number of the most recently captured frames of stereoscopic image data before stereoscopic image synthesis, and to average these models to obtain the average pixel coordinate mapping model.

In the above scheme, the image processing unit further includes a first processing subunit and a second processing subunit;

the first processing subunit is configured to perform repair processing of damaged areas based on the average pixel coordinate mapping model;

the second processing subunit is configured to perform noise reduction processing based on the average pixel coordinate mapping model.

In the above scheme, the first processing subunit is further configured to detect a damaged area, determine the coordinate information of the damaged area in the other, normal view based on the average pixel coordinate mapping model, replace the image content of the current damaged area with the image content of the corresponding area in the normal view, and correct the detected edge of the damaged area with the average of the gray values of the corresponding pixels in the left and right views.

In the above scheme, the second processing subunit is further configured to detect a suspected noise point, determine the position of the suspected noise point in the other view based on the average pixel coordinate mapping model, perform a gray-level comparison to determine whether it is a noise point, and correct the gray value of each pixel in a predetermined neighborhood of each confirmed noise point with the gray value of the corresponding pixel in the other view.

When performing processing, the preprocessing unit, the image processing unit, the feature matching subunit, the model estimation subunit, the model mean acquisition subunit, the first processing subunit, and the second processing subunit may be implemented by a central processing unit (CPU), a digital signal processor (DSP), or a field-programmable gate array (FPGA).

An embodiment of the present invention further provides a computer storage medium comprising a set of instructions which, when executed, cause at least one processor to perform the data processing method of any one of the above schemes.

An embodiment of the present invention further provides a user terminal that includes the data processing apparatus described above.

The method of the present invention includes: when preprocessing each captured frame of stereoscopic image data, extracting feature values from the left and right views of the frame and matching them, and obtaining a pixel coordinate mapping model between the left and right views from the matching result; each frame of stereoscopic image data corresponding to one pixel coordinate mapping model between its left and right views, and obtaining an average pixel coordinate mapping model from the pixel coordinate mapping models corresponding to multiple frames of stereoscopic image data; and performing image processing based on the average pixel coordinate mapping model.

With the embodiments of the present invention, since each captured frame of stereoscopic image data is preprocessed to obtain a pixel coordinate mapping model between the left and right views, the model is refined into an average pixel coordinate mapping model, and image processing is finally performed based on that average model, the obtained stereoscopic image data can be optimized and reconstructed.

Brief Description of the Drawings
图 1为本发明实施例方法原理的实现流程图;  1 is a flowchart of implementing a method principle according to an embodiment of the present invention;
图 2为本发明实施例装置基本组成结构示意图;  2 is a schematic structural diagram of a basic composition of an apparatus according to an embodiment of the present invention;
图 3本发明实施例 SIFT特征提取的实现流程图;  FIG. 3 is a flowchart of implementing SIFT feature extraction according to an embodiment of the present invention; FIG.
图 4为本发明实施例估计坐标映射模型的实现流程图;  4 is a flowchart of an implementation of estimating a coordinate mapping model according to an embodiment of the present invention;
图 5为本发明实施例破损修复的实现流程图。 具体实施方式  FIG. 5 is a flowchart of implementing damage repair according to an embodiment of the present invention. detailed description
下面结合附图对技术方案的实施作进一步的详细描述。  The implementation of the technical solution will be further described in detail below with reference to the accompanying drawings.
本发明实施例的数据处理方法的应用场景是用户终端, 尤其是移动终 端的多媒体应用技术领域, 例如, 两视点立体图像釆集合成过程中两视点 棵眼立体图像数据优化重建的数据处理方案, 至少解决了现有技术中不能 得到稳定的立体图像数据的问题, 确保用户终端, 尤其是移动终端釆集立 体图像数据时不会因为摄像头遮挡、 噪声较大而导致产生的立体图像数据 不可用时仍能最终合成质量相对较高的立体图片。  The application scenario of the data processing method in the embodiment of the present invention is a user terminal, in particular, a multimedia application technology field of a mobile terminal, for example, a data processing scheme for optimizing and reconstructing two-point stereoscopic image data in a two-view stereoscopic image collection process. At least the problem that the stable stereoscopic image data cannot be obtained in the prior art is solved, and the user terminal, especially when the mobile terminal collects the stereoscopic image data, is not caused by the occlusion of the camera and the noise, and the generated stereoscopic image data is still unavailable. It is possible to finally synthesize a stereoscopic picture of relatively high quality.
本发明实施例主要基于 SIFT算法和 RANSAC算法, 是通过 RANSAC 算法利用左右视图的 SIFT匹配点建立左右视图像素映射模型, 根据该左右 视图像素映射模型, 用正常视图对破损的视图进行修复和重建, 为合成算 法提供合格的立体图像数据, 釆用本发明实施例, 最终能得到修改好的立 体图像数据, 釆用该左右视图像素映射模型对于修复破损图像较为精准, 计算量较小, 修复效果好。 The embodiment of the present invention is mainly based on the SIFT algorithm and the RANSAC algorithm. The RANSAC algorithm is used to establish the left and right view pixel mapping model by using the SIFT matching points of the left and right views, and the damaged view is repaired and reconstructed according to the left and right view pixel mapping model. Providing qualified stereoscopic image data for the synthesis algorithm, using the embodiment of the present invention, and finally obtaining the modified stereo Body image data, using the left and right view pixel mapping model is more accurate for repairing damaged images, the calculation amount is small, and the repair effect is good.
其中, SIFT算法在之前已经介绍过了,而 RANSAC算法的基本思想是: 在进行参数估计时, 不是不加区分地对待所有可能的输入数据, 而是首先 针对具体问题设计出一个搜索引擎, 利用此搜索引擎迭代地剔除那些与所 估计参数不一致的输入数据 ( Outliers ), 然后利用正确的输入数据来估计参 数。 本发明实施例釆用 RANSAC算法的具体实现来得到左右视图像素映射 模型。  Among them, the SIFT algorithm has been introduced before, and the basic idea of the RANSAC algorithm is: When performing parameter estimation, instead of treating all possible input data indiscriminately, first design a search engine for specific problems, This search engine iteratively rejects input data (Outliers) that are inconsistent with the estimated parameters and then uses the correct input data to estimate the parameters. In the embodiment of the present invention, a specific implementation of the RANSAC algorithm is used to obtain a left-right view pixel mapping model.
Stereoscopic imaging is based on the principle that parallax creates the impression of depth. This principle means that a person's two eyes view the world from different angles, so that an object seen by the left eye differs subtly from the same object seen by the right eye; since the average distance between the two eyes is about 65 mm, the two eyes describe the outline of a scene in slightly different ways. The brain fuses these two subtly different views (physiological fusion), producing an accurate perception of three-dimensional objects and of their positions in the scene; this is the sense of stereoscopic depth.

The task of a stereoscopic imaging system is to produce at least two images of every scene, one representing what the left eye sees and the other what the right eye sees; these two related images are called a stereo pair. A stereoscopic display system must then ensure that the left eye sees only the left image and the right eye sees only the right image. The display method addressed by the embodiments of the present invention is two-viewpoint autostereoscopic display: from the two images of the left and right viewpoints, a stereoscopic image is generated by rearranging and combining the sub-pixels and is sent to the display device. A lenticular sheet or a parallax barrier placed in front of a CRT or flat-panel display controls the exit direction of the light of each pixel, so that the left-viewpoint image reaches only the left eye and the right-viewpoint image reaches only the right eye; binocular parallax then produces stereoscopic vision.

Stereoscopic image synthesis comprises the preparation of the stereoscopic data, the sub-pixel decision criteria, the sub-sampling of the pixels of each viewpoint, the arrangement and synthesis of the sub-pixels of each viewpoint, and the compression, transmission and display of the stereoscopic image. The embodiments of the present invention mainly address the data-preparation stage, that is, the optimized reconstruction of the acquired stereoscopic image data. After this optimized reconstruction, even if the stereoscopic image data acquired by a user terminal, especially a mobile terminal, is rendered unusable by camera occlusion or heavy noise, a stereoscopic picture of relatively high quality can still be synthesized in the end.
As shown in FIG. 1, the data processing method of an embodiment of the present invention includes:

Step 101: when preprocessing each frame of acquired stereoscopic image data, extract feature values from the left view and the right view of the frame of stereoscopic image data respectively and match them, and obtain a pixel coordinate mapping model between the left and right views according to the matching result.

Here, the stereoscopic image data may also be referred to as stereoscopic image material; this is not elaborated further.

Step 102: each frame of stereoscopic image data corresponds to one pixel coordinate mapping model between the left and right views; obtain an average pixel coordinate mapping model according to the pixel coordinate mapping models between the left and right views corresponding to multiple frames of stereoscopic image data.

Step 103: perform image processing based on the average pixel coordinate mapping model.
As shown in FIG. 2, the data processing apparatus of an embodiment of the present invention includes:

a preprocessing unit configured to, when preprocessing each frame of acquired stereoscopic image data, extract feature values from the left view and the right view of the frame of stereoscopic image data respectively and match them, and obtain a pixel coordinate mapping model between the left and right views according to the matching result; and

an image processing unit configured to perform image processing based on the average pixel coordinate mapping model, where each frame of stereoscopic image data corresponds to one pixel coordinate mapping model between the left and right views and the average pixel coordinate mapping model is obtained according to the pixel coordinate mapping models between the left and right views corresponding to multiple frames of stereoscopic image data.

The computer storage medium of an embodiment of the present invention includes a set of instructions that, when executed, cause at least one processor to perform the data processing method.

The user terminal of an embodiment of the present invention includes the basic structure of the data processing apparatus of the embodiments of the present invention together with its variations and equivalents, which are not elaborated here.

An application scenario of an embodiment of the present invention (the scene in which the user terminal is a mobile phone taking a photograph) is described in detail below:
In the mobile-phone photographing scenario, the data processing method of the embodiment of the present invention is specifically a processing scheme for the optimized reconstruction of two-viewpoint naked-eye stereoscopic image data. Whenever a user shoots a stereoscopic picture with a mobile phone, a preview is shown first. During the period before the user presses the shutter key, stereoscopic data acquisition is generally in a relatively stable and reliable state, whereas the act of pressing the shutter is prone to sudden noise. To achieve the purpose of the embodiments of the present invention, preprocessing is performed before the photograph is taken so as to obtain stable stereoscopic image data: SIFT features are extracted from the left and right views of the preview image and matched, and the pixel coordinate mapping model between the left and right views of the current scene is then determined. Based on this mapping model, occlusion repair and denoising are performed on low-quality single-viewpoint stereoscopic data, thereby generating reliable two-viewpoint stereoscopic data.
The detailed steps are described below. The first to sixth steps are carried out during image preview (i.e., the preprocessing process described above), and the seventh to eleventh steps are carried out after the shutter key is pressed to take the photograph (i.e., stereoscopic image synthesis is performed on the optimized and reconstructed stereoscopic image data obtained by the preprocessing process, and the final result is obtained after damaged-image repair and denoising).
First step: during preview, preprocess the left and right preview images, including scale scaling and image smoothing. Because the phone's preview mode and photographing mode use different settings, the pictures acquired in preview mode are first scaled, e.g., to 5 Mp. Since the SIFT feature description algorithm is scale-invariant, this does not affect SIFT feature extraction and matching. Image smoothing applies a Gaussian low-pass filter, because the Gaussian low-pass filter effectively overcomes the ringing effect and clearly suppresses noise. The benefit obtained in this step is that smoothing the image with a Gaussian low-pass filter reduces the influence of scale scaling on picture quality, ensuring the reliability of the coordinate mapping model.
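As a rough sketch of this step (the file name, target resolution and Gaussian parameters below are illustrative assumptions; the text only specifies scaling to the photo resolution, e.g. 5 Mp, followed by Gaussian low-pass smoothing), the preprocessing could look like this in Python with OpenCV:

```python
import cv2

preview = cv2.imread("preview_left.png")                 # placeholder preview frame
scaled = cv2.resize(preview, (2592, 1944))               # scale to roughly 5 Mp
smoothed = cv2.GaussianBlur(scaled, (5, 5), sigmaX=1.5)  # low-pass smoothing, avoids ringing
```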
Second step: extract SIFT features from the processed left and right view images.
Here, the flow of SIFT matching is shown in FIG. 3. The operation steps for the left view (steps 111-116) and for the right view (steps 121-126) are identical; taking the left view as an example, they include:

Step 111: prepare to extract the SIFT features of the image;

Step 112: construct the scale space;

Step 113: detect spatial extremum points;

Step 114: precisely locate the spatial extremum points;

Step 115: remove edge response points;

Step 116: generate the SIFT feature descriptors.
After the SIFT feature descriptors have been obtained through the left-view operation steps (steps 111-116) and through the right-view operation steps (steps 121-126), step 13 is performed.
Step 13: finally, perform SIFT feature point matching between the SIFT feature descriptors obtained through the left-view operation steps (steps 111-116) and those obtained through the right-view operation steps (steps 121-126), so as to obtain the matching pairs of SIFT feature points of the left and right views, hereinafter referred to simply as matching point pairs.
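A hedged sketch of steps 111-116 / 121-126 using OpenCV's SIFT implementation (cv2.SIFT_create requires OpenCV 4.4 or newer, or the contrib build; the file names are placeholders):

```python
import cv2

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp_l, des_l = sift.detectAndCompute(left, None)   # keypoints and 128-D descriptors, left view
kp_r, des_r = sift.detectAndCompute(right, None)  # keypoints and 128-D descriptors, right view
```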
In summary, the main points of the flow in FIG. 3 are:

(a) Detecting scale-space extremum points, covering the generation of the scale space and the detection of spatial extremum points.

Specifically, SIFT uses a difference-of-Gaussians (DoG) scale space, built by convolving the input image with difference-of-Gaussians kernels of adjacent scales. The DoG kernel is not only a linear approximation of the LoG but also greatly simplifies the computation of the scale space. For each pixel, extremum points are searched in the neighborhood spanning its image space and DoG scale space, giving the preliminary locations of the feature points. The point under test in the middle level is compared with the 26 points of its neighborhood, namely its 8 neighbors at the same scale and the 9 × 2 points at the corresponding positions in the adjacent scales above and below, which ensures that extremum points are detected in both scale space and two-dimensional image space. A point is regarded as a feature point of the image at that scale if it is the maximum or minimum among these 26 neighbors across its own DoG level and the levels above and below.

(b) Precisely locating the spatial extremum points, covering the removal of low-contrast key points and of edge response points.

(c) Generating the 128-dimensional SIFT feature descriptors.

Specifically, in the actual computation, to make the matching more robust, each key point is described by 4 × 4 = 16 seed points, each carrying 8 orientation-vector entries, so that a single key point yields 128 values, finally forming the 128-dimensional SIFT feature vector.
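The 26-neighbour extremum test of (a) can be expressed compactly. The sketch below assumes that `dog` is a stack of three adjacent DoG levels with shape 3 × H × W, and tests an interior pixel of the middle level:

```python
import numpy as np

def is_scale_space_extremum(dog, y, x):
    patch = dog[:, y-1:y+2, x-1:x+2]       # 3 x 3 x 3 neighbourhood, 27 samples
    centre = dog[1, y, x]
    others = np.delete(patch.ravel(), 13)  # drop the centre, keep the 26 neighbours
    return centre > others.max() or centre < others.min()
```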
Third step: match the SIFT feature points of the left and right views. After the SIFT feature vectors of the two images (image 1 and image 2) have been generated, the Euclidean distance between key-point feature vectors is used as the similarity measure for key points in the two images. The Euclidean distance is defined as in Equation 1:

d(F_a, F_b) = \sqrt{ \sum_{i=1}^{128} (f_a^i - f_b^i)^2 }    (Equation 1)

Take a key point in the left view and find the two key points in image 2 with the smallest Euclidean distances to it, then match according to the distance-ratio criterion: for these two candidates, divide the second-nearest distance d2 by the nearest distance d1 to obtain ratio; if ratio is greater than a proportional threshold ε, the matching point pair is accepted. Based on experimental results this threshold may be 1.5: taking 1.5 guarantees a sufficient number of matching point pairs while effectively controlling the amount of computation. The number of matching point pairs decreases as this threshold is raised; with a higher threshold the number of SIFT matching points falls, but the matches are more stable. The criterion is defined as in Equation 2:

ratio = d2 / d1
if ratio > ε: success; otherwise: failure    (Equation 2)
As the value of ε increases, the number of SIFT matching points falls, but the precision improves. Since the embodiments of the present invention demand high precision of the SIFT matching point pairs, and since the small difference between the left and right views would otherwise produce an excessive number of matching points and too much computation, the value should be raised appropriately; in this implementation ε is set to 1.8.
The coordinate information of the matching point pairs is recorded.
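Continuing the OpenCV sketch above, the distance-ratio test of Equation 2 and the recording of the matching point pairs might look as follows (the test is written in the d2/d1 form so that the 1.8 threshold is meaningful):

```python
import cv2

bf = cv2.BFMatcher(cv2.NORM_L2)
knn = bf.knnMatch(des_l, des_r, k=2)   # two nearest right-view descriptors per left key point

# accept a pair only when the second-nearest distance d2 exceeds 1.8x the nearest d1
matches = [m for m, n in knn if n.distance > 1.8 * m.distance]

# record the coordinate information of the matching point pairs
pts_l = [kp_l[m.queryIdx].pt for m in matches]
pts_r = [kp_r[m.trainIdx].pt for m in matches]
```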
Alternatively, the point in the right view with the smallest Euclidean distance may be used directly as the matching point of the current left-view SIFT key point, and the coordinate information of the matching point pair recorded.

Fourth step: estimate the pixel coordinate mapping model between the left and right views of the current scene with the RANSAC algorithm.
Here, the flow of estimating the pixel coordinate mapping model between the left and right views is shown in FIG. 4 and includes:

Step 201: randomly select 8 matching point pairs to initialize the pixel coordinate mapping model between the left and right views.

Step 202: find the support point set of the current model.

Step 203: judge whether the size of the support point set meets the threshold; if so, perform step 204; otherwise, return to step 201.

Here, based on experimental results, the threshold may be sixty-seven percent of the entire data point set; taking sixty-seven percent of the entire data point set as the threshold guarantees the validity of the model.

Step 204: estimate the pixel coordinate mapping model between the left and right views.
The flow of FIG. 4 takes the coordinate information of the matching point pairs in the left and right views as the input parameters of the RANSAC model estimation system to estimate the left-right view pixel mapping model.
In summary, the main points of the flow in FIG. 4 are:

(a) Randomly select a data point set from the set S of matching point pairs and initialize the model from this subset.

(b) Find the support point set Si of the current model according to the threshold Td; the set Si is the consensus set of the sample, and its members are defined as valid points.

(c) If the size of the set Si exceeds a threshold T, re-estimate the model with Si and terminate.

(d) If the size of the set Si is smaller than the threshold Ts, select a new sample and repeat the above steps.
After N attempts, the largest consensus set Si is selected and used to re-estimate the model, giving the final result. In this example the pixel coordinate information of the left and right views is P1 and P2 respectively. By the theory of epipolar geometry, the left and right views are related through the fundamental matrix F of the two images, satisfying P2^T F P1 = 0. The model estimation in this step is in fact the process of solving for F; since there are enough matching point pairs, a reliable coordinate mapping model can be generated.
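Continuing the sketch, OpenCV's RANSAC-based fundamental-matrix estimator can stand in for this model estimation (it requires at least 8 matching pairs; the 3.0-pixel tolerance and 0.99 confidence are illustrative values, not figures from the patent):

```python
import cv2
import numpy as np

p1 = np.float32(pts_l)
p2 = np.float32(pts_r)
F, mask = cv2.findFundamentalMat(p1, p2, cv2.FM_RANSAC, 3.0, 0.99)  # p2^T F p1 = 0

inliers_l = p1[mask.ravel() == 1]  # the consensus set Si: correct matching pairs only
inliers_r = p2[mask.ravel() == 1]
```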
Fifth step: retain only the correct matching point pairs that conform to the pixel coordinate mapping model between the left and right views obtained in the fourth step, i.e., the point set in Si, and save them together with the obtained left-right view pixel coordinate mapping model as the reference information of the current frame.
Sixth step: loop the first to fifth steps indefinitely, retaining only the image reference information of the four most recently acquired frames in a queue. Each time a frame of stereoscopic image data is acquired, the information of the frame at the head of the queue is deleted and the reference information of the newest frame is stored at the tail.
Seventh step: take the photograph and smooth the left and right views, then solve for the average pixel coordinate mapping model from the first three frames of the current queue of left-right pixel coordinate mapping models. After the photograph is taken, the user chooses whether the views need to be repaired; if so, continue.
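A minimal sketch of the queue of the sixth step and the averaging of the seventh step, under the assumption that the per-frame model is a 3 × 3 matrix averaged element-wise (one plausible reading of "average pixel coordinate mapping model"):

```python
from collections import deque
import numpy as np

reference_queue = deque(maxlen=4)   # sixth step: the oldest frame info is dropped automatically

def on_preview_frame(model, inlier_pairs):
    reference_queue.append((model, inlier_pairs))

def average_model_at_capture():
    # seventh step: average over the first three frames in the queue
    models = [m for m, _ in list(reference_queue)[:3]]
    return np.mean(models, axis=0)  # element-wise mean of the model matrices
```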
Eighth step: verify whether there is occlusion. Divide each of the left and right views evenly into 8 blocks and take the average gray value of each block, then compare the average gray values of the corresponding blocks in the left and right views. If the relative difference between the average gray values of a pair of blocks exceeds 10%, the block with the lower gray value is regarded as a damaged block. If the number of damaged blocks is 0, skip to the tenth step.
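A sketch of this verification follows; the 8 blocks are taken as a 2 × 4 grid, which is an assumption, since the text only says each view is divided evenly into 8 blocks:

```python
import numpy as np

def damaged_blocks(left_gray, right_gray, rows=2, cols=4, rel_diff=0.10):
    h, w = left_gray.shape
    bad = []
    for r in range(rows):
        for c in range(cols):
            ys = slice(r * h // rows, (r + 1) * h // rows)
            xs = slice(c * w // cols, (c + 1) * w // cols)
            ml = float(left_gray[ys, xs].mean())
            mr = float(right_gray[ys, xs].mean())
            if abs(ml - mr) / max(ml, mr, 1e-6) > rel_diff:
                # the block with the lower mean grey value is treated as damaged
                bad.append((ys, xs, "left" if ml < mr else "right"))
    return bad
```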
Ninth step: repair the occluded regions.

Here, the damage repair flow is shown in FIG. 5 and includes:

Step 301: precisely detect the damaged region (Sobel operator).

Step 302: calculate the coordinate information of the damaged region in the normal view.

Step 303: repair the damaged region.

Step 304: repair the edges.
In summary, the main points of the flow in FIG. 5 are:

(a) Precisely determine the damaged region. Within a damaged block, detect abrupt gray-level edges with the Sobel operator. The operator consists of two 3 × 3 kernels, one horizontal and one vertical; convolving each of them with the image yields approximations of the horizontal and vertical brightness differences respectively.

(b) Use the left-right view coordinate mapping model of the current scene to determine the coordinate information of the damaged region in the other view.

(c) Replace the image content of the current damaged region with the image content of the corresponding region in the normal view.

(d) Edge repair. For the edges detected in (a), correct the gray value of each pixel in the 3 × 3 neighborhood of every edge pixel with the average of the gray values of the corresponding pixels in the left and right views.
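A simplified sketch of (a)-(d) for one damaged block is given below. It assumes rectified views whose corresponding regions share coordinates (the patent instead maps the region through the estimated coordinate mapping model), and it smooths the 3 × 3 neighbourhoods of the detected edge pixels as a stand-in for the left/right averaging of (d):

```python
import cv2
import numpy as np

def repair_block(damaged_view, normal_view, ys, xs):
    region = damaged_view[ys, xs]
    gx = cv2.Sobel(region, cv2.CV_64F, 1, 0, ksize=3)  # horizontal brightness differences
    gy = cv2.Sobel(region, cv2.CV_64F, 0, 1, ksize=3)  # vertical brightness differences
    mag = np.hypot(gx, gy)
    edges = mag > 2.0 * mag.mean()                     # abrupt grey-level transitions, step (a)

    damaged_view[ys, xs] = normal_view[ys, xs]         # step (c): copy the normal view's content

    patched = damaged_view[ys, xs]                     # step (d): smooth along the detected edges
    patched[edges] = cv2.blur(patched, (3, 3))[edges]
    return damaged_view
```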
Tenth step: noise point detection. During photographing, both the left and right views may carry a certain amount of salt-and-pepper noise, and after the ninth step a small amount of salt-and-pepper noise remains owing to the limitations of the coordinate mapping model. In this example a median-filter-style test is used to detect the noise in the left and right views and mark the noise points: within the N × N neighborhood of the current point (N odd), take the maximum, minimum and mean gray values; if the gray value of the current point is the maximum or minimum of this neighborhood and exceeds the set threshold (60%-150% of the neighborhood's average gray value is the basic range, and values outside this range exceed the set threshold), the point may be a noise point and is marked as suspicious. The coordinate mapping model is then used to determine the position region of the suspicious point in the other view, the current point is placed at that position, and the gray levels are compared again to determine whether the current point is indeed a noise point.
Eleventh step: repair the noise points. The gray value of every pixel in the 3 × 3 neighborhood of each noise point confirmed in the tenth step is corrected with the gray value of the corresponding pixel in the other viewpoint.
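A sketch of the tenth and eleventh steps with N = 3; the grey-level confirmation against the other view is reduced to a plain difference test with an assumed tolerance, and the views are again assumed to be rectified:

```python
import numpy as np

def detect_and_repair_noise(view, other, n=3, tol=20):
    r = n // 2
    out = view.copy()
    h, w = view.shape
    for y in range(r, h - r):
        for x in range(r, w - r):
            nb = view[y-r:y+r+1, x-r:x+r+1].astype(np.float64)
            v, mean = float(view[y, x]), nb.mean()
            extreme = v == nb.max() or v == nb.min()
            if extreme and not (0.6 * mean <= v <= 1.5 * mean):  # mark as suspicious
                if abs(v - float(other[y, x])) > tol:            # confirm against the other view
                    # eleventh step: repair the 3x3 neighbourhood from the other viewpoint
                    out[y-r:y+r+1, x-r:x+r+1] = other[y-r:y+r+1, x-r:x+r+1]
    return out
```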
Twelfth step: submit the optimized stereoscopic image data to the synthesis algorithm, and synthesize the stereoscopic image with the existing synthesis algorithm.
If the integrated modules described in the embodiments of the present invention are implemented in the form of software function modules and sold or used as independent products, they may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to perform all or part of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc. Thus, the embodiments of the present invention are not limited to any specific combination of hardware and software.
The above are only preferred embodiments of the present invention and are not intended to limit the protection scope of the present invention.

INDUSTRIAL APPLICABILITY
With the embodiments of the present invention, each frame of acquired stereoscopic image data is preprocessed to obtain a pixel coordinate mapping model between the left and right views, the model is refined into an average pixel coordinate mapping model, and image processing is finally performed on the basis of the average pixel coordinate mapping model, so that the acquired stereoscopic image data can be optimally reconstructed.

CLAIMS
1. A data processing method, the method comprising:

when preprocessing each frame of acquired stereoscopic image data, extracting feature values from the left view and the right view of the frame of stereoscopic image data respectively and matching them, and obtaining a pixel coordinate mapping model between the left and right views according to the matching result;

each frame of stereoscopic image data corresponding to one pixel coordinate mapping model between the left and right views, obtaining an average pixel coordinate mapping model according to the pixel coordinate mapping models between the left and right views corresponding to multiple frames of stereoscopic image data; and

performing image processing based on the average pixel coordinate mapping model.

2. The method according to claim 1, wherein extracting feature values from the left view and the right view of each frame of stereoscopic image data respectively and matching them specifically comprises:

extracting SIFT features from the left view and the right view respectively to obtain SIFT feature descriptors corresponding to the left view and the right view respectively; and

performing SIFT feature point matching between the SIFT feature descriptors corresponding to the left view and the right view respectively to obtain matching point pairs of the SIFT feature points of the left and right views.

3. The method according to claim 2, wherein obtaining the pixel coordinate mapping model between the left and right views according to the matching result specifically comprises:

obtaining the pixel coordinate mapping model between the left and right views with the matching point pairs as input parameters.

4. The method according to claim 3, wherein obtaining the pixel coordinate mapping model between the left and right views with the matching point pairs as input parameters specifically comprises:

randomly selecting a data point set from a set S composed of the matching point pairs for initialization; filtering out, from the data point set according to a preset threshold, a conforming support point set Si as a consensus set; continually selecting new data samples and estimating the pixel coordinate mapping model between the left and right views according to the comparison between the size of Si and the preset threshold until the largest consensus set is obtained; and obtaining the required pixel coordinate mapping model between the left and right views for the final data sample according to the largest consensus set.

5. The method according to any one of claims 1 to 4, wherein obtaining the average pixel coordinate mapping model according to the pixel coordinate mapping models between the left and right views corresponding to multiple frames of stereoscopic image data specifically comprises:

obtaining the pixel coordinate mapping models between the left and right views corresponding to a specified number of frames of stereoscopic image data most recently acquired before stereoscopic image synthesis, and averaging the pixel coordinate mapping models between the left and right views corresponding to the multiple frames of stereoscopic image data to obtain the average pixel coordinate mapping model.

6. The method according to claim 5, wherein performing image processing based on the average pixel coordinate mapping model specifically comprises:

performing repair processing of damaged regions based on the average pixel coordinate mapping model; and

performing noise reduction processing based on the average pixel coordinate mapping model.

7. The method according to claim 6, wherein performing repair processing of damaged regions based on the average pixel coordinate mapping model specifically comprises:

upon detecting a damaged region, determining coordinate information of the damaged region in the other, normal view based on the average pixel coordinate mapping model, replacing the image content of the current damaged region with the image content of the corresponding region in the normal view, and correcting the edges of the detected damaged region with the average of the gray values of the corresponding pixels in the left and right views.

8. The method according to claim 6, wherein performing noise reduction processing based on the average pixel coordinate mapping model specifically comprises:

upon detecting a suspicious noise point, determining the position region of the suspicious noise point in the other view based on the average pixel coordinate mapping model, and performing gray-level comparison to determine whether the suspicious noise point is a noise point; and

correcting the gray value of each pixel in the neighborhood of a predetermined position region around the determined noise point with the gray value of the corresponding pixel in the other view.
9. A data processing apparatus, the apparatus comprising:

a preprocessing unit configured to, when preprocessing each frame of acquired stereoscopic image data, extract feature values from the left view and the right view of the frame of stereoscopic image data respectively and match them, and obtain a pixel coordinate mapping model between the left and right views according to the matching result; and

an image processing unit configured to perform image processing based on the average pixel coordinate mapping model, wherein each frame of stereoscopic image data corresponds to one pixel coordinate mapping model between the left and right views, and the average pixel coordinate mapping model is obtained according to the pixel coordinate mapping models between the left and right views corresponding to multiple frames of stereoscopic image data.

10. The apparatus according to claim 9, wherein the preprocessing unit further comprises a feature matching subunit;

the feature matching subunit is configured to extract SIFT features from the left view and the right view respectively to obtain SIFT feature descriptors corresponding to the left view and the right view respectively, and to perform SIFT feature point matching between the SIFT feature descriptors corresponding to the left view and the right view respectively to obtain matching point pairs of the SIFT feature points of the left and right views.

11. The apparatus according to claim 10, wherein the preprocessing unit further comprises a model estimation subunit;

the model estimation subunit is configured to obtain the pixel coordinate mapping model between the left and right views with the matching point pairs as input parameters.

12. The apparatus according to claim 11, wherein the model estimation subunit is configured to randomly select a data point set from the set S composed of the matching point pairs for initialization; filter out, from the data point set according to a preset threshold, a conforming support point set Si as a consensus set; continually select new data samples and estimate the pixel coordinate mapping model between the left and right views according to the comparison between the size of Si and the preset threshold until the largest consensus set is obtained; and obtain the required pixel coordinate mapping model between the left and right views for the final data sample according to the largest consensus set.

13. The apparatus according to any one of claims 9 to 12, wherein the image processing unit further comprises a model mean acquisition subunit;

the model mean acquisition subunit is configured to obtain the pixel coordinate mapping models between the left and right views corresponding to a specified number of frames of stereoscopic image data most recently acquired before stereoscopic image synthesis, and to average the pixel coordinate mapping models between the left and right views corresponding to the multiple frames of stereoscopic image data to obtain the average pixel coordinate mapping model.

14. The apparatus according to claim 13, wherein the image processing unit further comprises a first processing subunit and a second processing subunit;

the first processing subunit is configured to perform repair processing of damaged regions based on the average pixel coordinate mapping model; and

the second processing subunit is configured to perform noise reduction processing based on the average pixel coordinate mapping model.

15. The apparatus according to claim 14, wherein the first processing subunit is further configured to, upon detecting a damaged region, determine coordinate information of the damaged region in the other, normal view based on the average pixel coordinate mapping model, replace the image content of the current damaged region with the image content of the corresponding region in the normal view, and correct the edges of the detected damaged region with the average of the gray values of the corresponding pixels in the left and right views.

16. The apparatus according to claim 14, wherein the second processing subunit is further configured to, upon detecting a suspicious noise point, determine the position region of the suspicious noise point in the other view based on the average pixel coordinate mapping model, and perform gray-level comparison to determine whether the suspicious noise point is a noise point; and to correct the gray value of each pixel in the neighborhood of a predetermined position region around the determined noise point with the gray value of the corresponding pixel in the other view.

17. A computer storage medium, the computer storage medium comprising a set of instructions that, when executed, cause at least one processor to perform the data processing method according to any one of claims 1 to 8.

18. A user terminal, the user terminal comprising the data processing apparatus according to any one of claims 9 to 16.