CN113436245A - Image processing method, model training method, related device and electronic equipment


Info

Publication number
CN113436245A
Authority
CN
China
Prior art keywords
image data
feature
depth
depth image
feature extraction
Prior art date
Legal status
Granted
Application number
CN202110985348.0A
Other languages
Chinese (zh)
Other versions
CN113436245B (en)
Inventor
田照银
莫苏苏
吴昊
王抒昂
Current Assignee
Wuhan Silicon Integrated Co Ltd
Original Assignee
Wuhan Silicon Integrated Co Ltd
Priority date
Filing date
Publication date
Application filed by Wuhan Silicon Integrated Co Ltd filed Critical Wuhan Silicon Integrated Co Ltd
Priority to CN202110985348.0A priority Critical patent/CN113436245B/en
Publication of CN113436245A publication Critical patent/CN113436245A/en
Application granted granted Critical
Publication of CN113436245B publication Critical patent/CN113436245B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/50 Depth or shape recovery
    • G06T 7/55 Depth or shape recovery from multiple images
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention discloses an image processing method, a model training method, a related device and electronic equipment. The image processing method comprises the following steps: obtaining depth image data; performing depth perception feature extraction on the depth image data through a depth perception model which is trained in advance to obtain feature image data, and performing feature fusion on the feature image data to obtain target depth image data; the target depth image data is depth image data obtained by performing depth correction on the depth image data.

Description

Image processing method, model training method, related device and electronic equipment
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to an image processing method, a model training method, a related apparatus, and an electronic device.
Background
Ideal indirect Time-of-Flight (iToF) imaging assumes that the transmitted signal undergoes only one reflection in the scene. In the actual imaging process, however, light may be reflected and refracted many times in the scene, which means that the actually received signal may contain several sub-signals. When multipath interference exists in a scene, there is an error between the Time-of-Flight (ToF) measured depth map and the actual depth map in the multipath interference area. Because of this error, the measurement values of the ToF camera cannot reflect the actual scene depth, which seriously affects the measurement performance of the ToF camera, and traditional methods cannot solve the problem of multipath interference in ToF depth maps.
Disclosure of Invention
In order to solve the existing technical problems, embodiments of the present invention provide an image processing method, a model training method, a related apparatus, and an electronic device.
In order to achieve the above purpose, the technical solution of the embodiment of the present invention is realized as follows:
in a first aspect, an embodiment of the present invention provides an image processing method, where the method includes:
obtaining depth image data;
performing depth perception feature extraction on the depth image data through a depth perception model which is trained in advance to obtain feature image data, and performing feature fusion on the feature image data to obtain target depth image data; the target depth image data is depth image data obtained by performing depth correction on the depth image data.
In the above scheme, the depth perception model at least includes a feature extraction part, the feature extraction part includes at least one rearranged feature extraction structure, the rearranged feature extraction structure is used to rearrange and combine pixel data of input feature image data, and perform depth perception feature extraction on the recombined feature image data; wherein the size of the recombined feature image data is smaller than the size of the input feature image data, and the recombined feature image data includes all pixel data of the input feature image data.
In the above scheme, the rearrangement feature extraction structure is configured to extract pixel point data from input feature image data starting from different initial pixel points and every preset pixel point distance to obtain a plurality of feature sub-image data, and combine the plurality of feature sub-image data to obtain output feature image data.
In the above scheme, the feature extraction part further comprises a plurality of feature extraction structures;
the depth perception feature extraction of the depth image data by the depth perception model completed through pre-training comprises the following steps:
performing depth perception feature extraction on the depth image through a first feature extraction structure to obtain first feature image data;
rearranging and combining pixel point data of the first feature image data through a first rearranged feature extraction structure to obtain rearranged and combined second feature image data, and performing depth perception feature extraction on the second feature image data to obtain third feature image data; the image size of the second feature image data is smaller than that of the first feature image data, and the second feature image data comprises all pixel point data of the first feature image data;
rearranging and combining pixel point data of the third feature image data through a second rearranged feature extraction structure to obtain rearranged and combined fourth feature image data, and performing depth perception feature extraction on the fourth feature image data to obtain fifth feature image data; the image size of the fifth feature image data is smaller than that of the third feature image data, and the fifth feature image data includes all pixel point data of the third feature image data.
In the above scheme, the method further comprises: and performing downsampling processing on the fifth feature image data through a second feature extraction structure, and performing depth perception feature extraction on the downsampled fifth feature image data to obtain sixth feature image data.
In the above solution, the depth perception model further includes a feature fusion part, where the feature fusion part includes a plurality of fusion structures;
the performing feature fusion on the feature image data to obtain target depth image data includes:
and sequentially carrying out up-sampling processing on the input feature image data through each fusion structure, and carrying out feature fusion processing on the feature image data subjected to the up-sampling processing and the feature image data subjected to depth perception feature extraction by the feature extraction part to obtain target depth image data.
In a second aspect, an embodiment of the present invention further provides a model training method, where the method includes:
obtaining first depth image data and second depth image data, wherein the second depth image data is real image data corresponding to the first depth image data;
performing depth perception feature extraction on the first depth image data through a depth perception model to obtain feature image data, and performing feature fusion on the feature image data to obtain third depth image data;
determining a first error between the third depth image data and the second depth image data, adjusting parameters of the depth perception model based on the first error.
In the above scheme, the method further comprises:
obtaining fourth depth image data and fifth depth image data; the fifth depth image data is real depth image data corresponding to the fourth depth image data;
obtaining sixth depth image data based on the fourth depth image data and the depth perception model, determining a second error between the sixth depth image data and the fifth depth image data;
obtaining a first error curve based on the first errors corresponding to a plurality of first depth image data, and obtaining a second error curve based on the second errors corresponding to a plurality of fourth depth image data;
determining a degree of similarity of the first error curve and the second error curve;
and under the condition that the similarity degree of the first error curve and the second error curve does not meet the preset requirement, adjusting the parameters of the depth perception model.
In a third aspect, an embodiment of the present invention further provides an image processing apparatus, where the apparatus includes:
the first acquisition module is used for acquiring depth image data;
the first processing module is used for carrying out depth perception feature extraction on the depth image data through a depth perception model which is trained in advance to obtain feature image data, and carrying out feature fusion on the feature image data to obtain target depth image data; the target depth image data is depth image data obtained by performing depth correction on the depth image data.
In a fourth aspect, an embodiment of the present invention further provides a model training apparatus, where the apparatus includes:
the second acquisition module is used for acquiring first depth image data and second depth image data, wherein the second depth image data is real image data corresponding to the first depth image data;
the second processing module is used for carrying out depth perception feature extraction on the first depth image data through a depth perception model to obtain feature image data, and carrying out feature fusion on the feature image data to obtain third depth image data;
and an adjustment module to determine a first error between the third depth image data and the second depth image data, to adjust parameters of the depth perception model based on the first error.
In a fifth aspect, the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the image processing method according to the first aspect or the model training method according to the second aspect.
In a sixth aspect, an embodiment of the present invention further provides an electronic device, including: a processor and a memory for storing a computer program operable on the processor, wherein the processor is configured to perform the steps of the image processing method of the first aspect or the model training method of the second aspect when the computer program is executed.
According to the image processing method, the model training method, the related apparatus and the electronic device of the embodiments of the present invention, depth perception feature extraction is performed on the depth image data through the depth perception model to obtain feature image data, and feature fusion is performed on the feature image data to obtain the depth-corrected target depth image data, which solves the problem that multipath interference cannot currently be repaired effectively by traditional methods. The embodiments of the present invention apply a deep learning method based on semantic segmentation to the field of ToF depth image processing, achieving the aim of repairing multipath interference by using depth perception information.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained from the provided drawings without creative effort.
FIG. 1 is a flowchart illustrating an image processing method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of image rearrangement according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of a model training method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a model structure according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an exemplary embodiment of an image processing apparatus;
FIG. 6 is a schematic diagram of a structure of a model training apparatus according to an embodiment of the present invention;
fig. 7 is a schematic diagram of a hardware component structure of the electronic device according to the embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
The present embodiment provides an image processing method, and fig. 1 is a schematic flow chart of the image processing method according to the embodiment of the present invention; as shown in fig. 1, the method includes:
step 101: obtaining depth image data;
step 102: performing depth perception feature extraction on the depth image data through a depth perception model which is trained in advance to obtain feature image data, and performing feature fusion on the feature image data to obtain target depth image data; the target depth image data is depth image data obtained by performing depth correction on the depth image data.
The image processing method of the embodiment is applied to an image processing device; the image processing apparatus may be located in any electronic device having image processing capabilities. In some examples, the electronic device may be a computer, a cell phone, a Virtual Reality (VR) device, an Augmented Reality (AR) device, or like user device; in other examples, the electronic device may be a server or the like. In the embodiments of the present application, an electronic device is taken as an example for description.
The depth image in this embodiment may be acquired by an image acquisition device built in the electronic device or externally connected to the electronic device, and the image acquisition device may specifically be a depth image acquisition device. The electronic device can also obtain a plurality of frames of depth images transmitted by other electronic devices through the communication component, and the depth image data is collected by the image collecting device built in or externally connected with other electronic devices.
In some alternative embodiments, the depth image data may include two-dimensional image data and depth data; the two-dimensional image data represents a planar image of the acquired target scene; alternatively, the two-dimensional image may be a red, green, blue (RGB) image, and may also be a grayscale image. The depth data represents the distance between the image acquisition device and each object in the acquired target scene. In further alternative embodiments, the depth image data may also comprise only depth data, which represents the distance between the image acquisition device and each object in the acquired target scene. In the present embodiment, the depth data is mainly processed, and in the following embodiments, the depth image data may correspond to the depth data, unless otherwise specified.
In this embodiment, the depth perception model is a neural network model trained in advance. Illustratively, the depth perception model meeting the preset requirement can be obtained by training the depth perception model of the sample image data. Wherein, the depth perception model can comprise a characteristic extraction part and a characteristic fusion part.
In some optional embodiments, before the depth image data is processed by the depth perception model, the depth image data may be preprocessed, specifically, the depth image data may be adjusted to a uniform specification, and the adjustment method mainly includes at least one of zooming in, zooming out, and cropping, and may be adjusted according to an actual situation. And then, performing feature extraction on the preprocessed depth image data through a depth perception model to obtain feature image data, and performing feature fusion on the extracted feature image data to obtain target depth image data.
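A minimal preprocessing sketch is given below for illustration only; the 256 × 256 target specification, the use of PyTorch, and bilinear interpolation are assumptions and not part of the claimed method, and cropping or padding could be used instead of resizing depending on the actual situation.

```python
import torch
import torch.nn.functional as F

def preprocess_depth(depth: torch.Tensor, target_size: int = 256) -> torch.Tensor:
    # depth: a single-channel depth map of shape (H, W)
    x = depth.unsqueeze(0).unsqueeze(0)            # (1, 1, H, W), the layout expected by interpolate
    # Resize to a uniform specification; the interpolation mode is an illustrative choice
    x = F.interpolate(x, size=(target_size, target_size),
                      mode="bilinear", align_corners=False)
    return x                                       # (1, 1, target_size, target_size)
```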
By adopting the technical scheme of the embodiment of the invention, the depth perception model which is pre-trained is utilized to carry out depth perception feature extraction on the obtained depth image data to obtain the feature image data, and the feature image data is subjected to feature fusion to obtain the depth image data of the target after depth correction, thereby solving the problem of depth map multipath interference caused by multiple reflections and/or refractions of signals in a scene.
In some optional embodiments of the present invention, the depth perception model at least includes a feature extraction part, the feature extraction part includes at least one rearranged feature extraction structure, the rearranged feature extraction structure is configured to perform rearrangement and combination processing of pixel data on input feature image data, and perform depth perception feature extraction processing on the recombined feature image data; wherein the size of the recombined feature image data is smaller than the size of the input feature image data, and the recombined feature image data includes all pixel data of the input feature image data.
In this embodiment, in the process of performing feature extraction on the depth image data, the depth perception model performs rearrangement feature extraction through at least one rearrangement feature extraction structure; that is, it performs rearrangement, combination and feature extraction processing on the pixel point data of the feature image data to be processed. The size of the recombined feature image data is smaller than the size of the input feature image data, while the recombined feature image data includes all pixel data of the input feature image data. In other words, the image size of the feature image data is reduced by rearranging and combining its pixel data; unlike downsampling processing, which may lose pixel data of the original feature image data, the rearrangement and combination processing preserves all pixel data of the original feature image data.
Illustratively, feature image data to be processed with a data specification of 256 × 256 × 1 (where, in a data specification a × b × c, a represents the length of the image data, b represents the width of the image data, and c represents the number of channels) is rearranged and combined into feature image data with a data specification of 128 × 128 × 4; the size of the feature image data is reduced to 128 × 128 from 256 × 256 before rearrangement. However, the rearranged and combined data has 4 channels, each holding 128 × 128 data, so the amount of data is not reduced compared with the data before rearrangement. Further, the depth perception feature extraction processing is performed on the recombined feature image data; for example, the depth perception feature extraction processing may be performed on the recombined feature image data by using a convolution kernel of 3 × 3 × 64.
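For illustration, the rearrangement-and-convolution step described above can be sketched with PyTorch's pixel-unshuffle operation, which performs this kind of lossless space-to-depth rearrangement; the sizes follow the 256 × 256 × 1 example in the text, while the specific use of PixelUnshuffle and Conv2d is an assumption about one possible realization rather than the patented structure itself.

```python
import torch
import torch.nn as nn

# Hypothetical rearranged-feature-extraction block: lossless space-to-depth
# rearrangement followed by depth perception feature extraction by convolution.
rearrange = nn.PixelUnshuffle(downscale_factor=2)    # (N, 1, 256, 256) -> (N, 4, 128, 128)
conv = nn.Conv2d(in_channels=4, out_channels=64,
                 kernel_size=3, padding=1)           # 3 x 3 x 64 kernels, zero-padded

x = torch.rand(1, 1, 256, 256)   # feature image data with specification 256 x 256 x 1
x = rearrange(x)                 # 128 x 128 x 4: smaller image size, same total pixel data
x = conv(x)                      # depth perception feature extraction on the recombined data
print(x.shape)                   # torch.Size([1, 64, 128, 128])
```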
In some optional embodiments of the present invention, the rearranged feature extraction structure is configured to extract pixel point data from the input feature image data starting from different initial pixel points and every preset pixel point distance to obtain a plurality of feature sub-image data, and combine the plurality of feature sub-image data to obtain the output feature image data.
For example, as shown in fig. 2, the size of the feature image data is 4 × 4. The initial pixel points may be the first pixel point 11 in the first row, the second pixel point 12 in the first row, the first pixel point 21 in the second row, and the second pixel point 22 in the second row of the feature image data. Starting from the pixel points 11, 12, 21 and 22 as initial pixel points, pixel point data is extracted from the feature image data at intervals of 1 pixel point, so as to obtain the 4 feature sub-image data shown in fig. 2; the 4 feature sub-image data are combined to obtain rearranged feature image data of 2 × 2 × 4.
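The rearrangement of fig. 2 can equivalently be written out with explicit strided slicing; the following sketch is an illustration only, not the claimed implementation.

```python
import torch

# A 4 x 4 feature map is split into four 2 x 2 sub-images by starting from four
# different initial pixels and taking every other pixel, then stacked into 2 x 2 x 4.
x = torch.arange(16).reshape(4, 4)      # the 4 x 4 feature image data
subs = [
    x[0::2, 0::2],   # start at pixel 11 (row 1, column 1)
    x[0::2, 1::2],   # start at pixel 12 (row 1, column 2)
    x[1::2, 0::2],   # start at pixel 21 (row 2, column 1)
    x[1::2, 1::2],   # start at pixel 22 (row 2, column 2)
]
out = torch.stack(subs, dim=-1)         # combined output: shape (2, 2, 4)
# Every pixel of the original map appears exactly once, so no information is lost.
```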
By adopting the technical scheme of the embodiment of the invention, the feature image data is rearranged through the rearrangement feature extraction structure, the current pixel point and the non-adjacent pixel point are recombined to obtain a plurality of feature sub-image data, and the plurality of feature sub-image data are combined to obtain the output feature image data. By rearranging the pixels, all original information of the characteristic image data is reserved, and the depth values of the surrounding pixels can correct the depth value of the current pixel, so that the multipath interference is eliminated, and the repair effect of the multipath interference is greatly improved.
In some optional embodiments of the invention, the feature extraction portion further comprises a plurality of feature extraction structures; the depth perception feature extraction of the depth image data by the depth perception model completed through pre-training comprises the following steps: performing depth perception feature extraction on the depth image data through a first feature extraction structure to obtain first feature image data; rearranging and combining pixel point data of the first feature image data through a first rearranged feature extraction structure to obtain rearranged and combined second feature image data, and performing depth perception feature extraction on the second feature image data to obtain third feature image data, where the image size of the second feature image data is smaller than that of the first feature image data, and the second feature image data comprises all pixel point data of the first feature image data; and rearranging and combining pixel point data of the third feature image data through a second rearranged feature extraction structure to obtain rearranged and combined fourth feature image data, and performing depth perception feature extraction on the fourth feature image data to obtain fifth feature image data, where the image size of the fifth feature image data is smaller than the image size of the third feature image data, and the fifth feature image data includes all pixel point data of the third feature image data.
In the embodiment, for the acquired depth image data, feature extraction is performed on the depth image data through a feature extraction part in a trained depth perception model; for example, the feature extraction part in this embodiment includes a plurality of feature extraction structures and at least one rearranged feature extraction structure, and each feature extraction structure may include a plurality of processing layers, for example, the feature extraction structure may include a convolutional layer, or a downsampling layer and a convolutional layer, and so on. Of course, the composition form of the feature extraction structure is not limited to the above example, and may be adjusted according to an actual scene.
In this embodiment, the first feature extraction structure may include at least one convolution layer, and the size of the convolution kernel may be set according to actual needs. For example, the depth image data may be convolved with a convolution kernel of 3 × 3 × 64 to extract the first-level depth perception feature and obtain the convolved first feature image data. Here "3 × 3" represents the single-channel convolution kernel specification and "64" represents the number of convolution kernel channels; the convolution kernel specification follows the same rule in the subsequent steps. Optionally, to avoid the data specification being reduced continuously by multiple convolutions, zeros are padded around the periphery of the data before convolution, so that the single-channel data specification is the same before and after convolution.
The first rearranged feature extraction structure is the same as the rearrangement structure in the above embodiment and is not repeated here; the pixel point data of the first feature image data is rearranged and combined by the first rearranged feature extraction structure. Illustratively, the first feature image data having a data structure of 256 × 256 × 64 is rearranged into second feature image data having a data structure of 128 × 128 × 256. Illustratively, the image size of the second feature image data is 128 × 128 and the image size of the first feature image data is 256 × 256, so the image size of the second feature image data is smaller than the image size of the first feature image data, and the second feature image data includes all pixel data of the first feature image data.
Further, a convolution kernel of 3 × 3 × 64 may be used to perform two convolution operations on the second feature image data with the data structure of 128 × 128 × 256, so as to complete the depth perception feature extraction on the second feature image data and obtain third feature image data. For example, the data structure of the third feature image data may be 128 × 128 × 128.
The second rearranged feature extraction structure is the same as the rearrangement structure in the above embodiment and is not repeated here; the pixel point data of the third feature image data is rearranged and combined by the second rearranged feature extraction structure, and the third feature image data with the data structure of 128 × 128 × 128 is rearranged into fourth feature image data with the data structure of 64 × 64 × 512.
Further, a convolution kernel of 3 × 3 × 64 may be used to perform two convolution operations on the fourth feature image data with the data structure of 64 × 64 × 512, so as to complete the depth perception feature extraction on the fourth feature image data and obtain fifth feature image data. For example, the data structure of the fifth feature image data may be 64 × 64 × 256. The image size of the fifth feature image data is 64 × 64 and the image size of the third feature image data is 128 × 128, so the image size of the fifth feature image data is smaller than the image size of the third feature image data, and the fifth feature image data includes all pixel data of the third feature image data.
In some optional embodiments of the invention, the method further comprises: and performing downsampling processing on the fifth feature image data through a second feature extraction structure, and performing depth perception feature extraction on the downsampled fifth feature image data to obtain sixth feature image data.
In this embodiment, the second feature extraction structure may include a convolution layer and a down-sampling layer, which together perform down-sampling processing on the fifth feature image data. The down-sampling processing may be, for example, an image reduction of the feature map of the feature image data, that is, a reduction of the feature map size, so that the number of pixel points in the down-sampled feature image data is smaller than the number of pixel points in the fifth feature image data; in addition, a certain proportion of feature parameters may be randomly discarded during processing to prevent over-fitting of the model. The data structures in the embodiments of the present invention are all given as examples, and the data structures in a specific implementation process may be adjusted according to the actual scene. Illustratively, the fifth feature image data with the data structure of 64 × 64 × 256 is subjected to down-sampling processing by the second feature extraction structure, and depth perception feature extraction is performed on the down-sampled fifth feature image data by convolution processing, so as to obtain sixth feature image data with the data structure of 16 × 16 × 512.
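Putting the above structures together, a hypothetical feature extraction part might be sketched as below; the channel widths, the ReLU activations and the max-pooling downsampling factor are illustrative assumptions, and only the overall pattern (a convolution structure, two rearranged feature extraction structures, then a downsampling feature extraction structure) follows the description.

```python
import torch
import torch.nn as nn

def conv(cin, cout):
    # 3 x 3 convolution with zero padding so the spatial specification is preserved
    return nn.Sequential(nn.Conv2d(cin, cout, kernel_size=3, padding=1), nn.ReLU(inplace=True))

class FeatureExtractionPart(nn.Module):
    """Hypothetical encoder: conv, two rearranged feature extraction structures, then downsampling."""
    def __init__(self):
        super().__init__()
        self.first = conv(1, 64)                                      # first feature extraction structure
        self.unshuffle = nn.PixelUnshuffle(2)                         # rearrangement, keeps all pixel data
        self.block1 = nn.Sequential(conv(256, 128), conv(128, 128))   # convolutions after the first rearrangement
        self.block2 = nn.Sequential(conv(512, 256), conv(256, 256))   # convolutions after the second rearrangement
        self.down = nn.Sequential(nn.MaxPool2d(2), conv(256, 512))    # second feature extraction structure

    def forward(self, x):                       # x: (N, 1, 256, 256)
        f1 = self.first(x)                      # first feature image data  (N, 64, 256, 256)
        f3 = self.block1(self.unshuffle(f1))    # third feature image data  (N, 128, 128, 128)
        f5 = self.block2(self.unshuffle(f3))    # fifth feature image data  (N, 256, 64, 64)
        f6 = self.down(f5)                      # sixth feature image data  (N, 512, 32, 32) in this sketch
        return f1, f3, f5, f6                   # intermediate features are reused by the fusion part
```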
In some optional embodiments of the invention, the depth perception model further comprises a feature fusion part, the feature fusion part comprising a plurality of fusion structures; the performing feature fusion on the feature image data to obtain target depth image data includes: and sequentially carrying out up-sampling processing on the input feature image data through each fusion structure, and carrying out feature fusion processing on the feature image data subjected to the up-sampling processing and the feature image data subjected to depth perception feature extraction by the feature extraction part to obtain target depth image data.
In this embodiment, the feature fusion processing includes splicing the feature image data obtained by the feature extraction part of the depth perception model with the up-sampled feature image data of the same data specification. The feature fusion part comprises a plurality of fusion structures, and each fusion structure performs up-sampling processing on the feature image data; the up-sampling processing mainly serves to enlarge the image size, and a suitable interpolation algorithm can be selected for the enlargement according to different requirements.
Illustratively, the feature image data output by the feature extraction part is sequentially subjected to up-sampling processing through each fusion structure, and the up-sampled feature image data is subjected to feature fusion processing with the feature image data obtained by the depth perception feature extraction of the feature extraction part, so that the target depth image data is finally obtained. The data specification of the target depth image data is the same as that of the input depth image data, and the target depth image data is the depth map after the multipath interference has been repaired.
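A fusion structure of this kind might, purely as an illustration, be sketched as follows; bilinear up-sampling, channel concatenation and a 3 × 3 convolution are assumptions about one possible realization of the up-sampling and splicing described above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionStructure(nn.Module):
    """Hypothetical fusion structure: up-sample, splice with an encoder feature map, convolve."""
    def __init__(self, cin, cskip, cout):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv2d(cin + cskip, cout, 3, padding=1), nn.ReLU(inplace=True))

    def forward(self, x, skip):
        # Up-sampling enlarges the image size back toward the input specification;
        # bilinear interpolation is one possible choice of interpolation algorithm.
        x = F.interpolate(x, size=skip.shape[-2:], mode="bilinear", align_corners=False)
        # Feature fusion: splice with the same-specification feature map from the feature extraction part.
        x = torch.cat([x, skip], dim=1)
        return self.conv(x)

# e.g. fuse the sixth feature image data (N, 512, 32, 32) with the fifth (N, 256, 64, 64)
fuse = FusionStructure(cin=512, cskip=256, cout=256)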
Based on the above embodiment, the embodiment of the invention also provides a model training method. FIG. 3 is a schematic flow chart of a model training method according to an embodiment of the present invention; as shown in fig. 3, the method includes:
step 201: obtaining first depth image data and second depth image data, wherein the second depth image data is real image data corresponding to the first depth image data;
step 202: performing depth perception feature extraction on the first depth image data through a depth perception model to obtain feature image data, and performing feature fusion on the feature image data to obtain third depth image data;
step 203: determining a first error between the third depth image data and the second depth image data, adjusting parameters of the depth perception model based on the first error.
In the embodiment of the invention, the first depth image data obtained in advance is used as a training data set. For the training data set, a corresponding real depth map needs to be made for each piece of depth image data, the first depth image data serves as input data, and the made real depth map data (i.e., the second depth image data) serves as a reference target. The data quality of the real depth map data directly influences the quality degree of the trained model, and the smaller the depth error of the real depth map data is, the better the trained model is.
In the embodiment of the invention, the depth perception model is used for carrying out depth perception feature extraction on the first depth image data to obtain feature image data, and the feature image data is subjected to feature fusion to obtain third depth image data. Determining a first error between the third depth image data and the second depth image data, adjusting parameters of the depth perception model based on the first error. In the parameter adjusting process, the electronic equipment actively adjusts the parameters according to the first error and a preset adjusting strategy.
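A single training iteration corresponding to steps 201 to 203 might, as a non-limiting sketch, look as follows; the stand-in model, the L1 form of the first error and the SGD optimizer are assumptions, with the 0.001 learning rate taken from the worked example later in the description.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in for the depth perception model (feature extraction + fusion parts); hypothetical.
model = nn.Sequential(
    nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 1, 3, padding=1),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)   # learning rate from the worked example

first_depth = torch.rand(8, 1, 256, 256)    # first depth image data (training input)
second_depth = torch.rand(8, 1, 256, 256)   # second depth image data (real depth maps)

third_depth = model(first_depth)                       # step 202: feature extraction + feature fusion
first_error = F.l1_loss(third_depth, second_depth)     # step 203: first error (L1 form is an assumption)
optimizer.zero_grad()
first_error.backward()                                 # back-propagate the error
optimizer.step()                                       # adjust the depth perception model parameters
```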
In some optional embodiments of the present invention, the depth perception model at least includes a feature extraction part, and the feature extraction part includes at least one rearranged feature extraction structure, and the rearranged feature extraction structure is configured to perform rearrangement and combination processing of pixel data on input feature image data, and perform depth perception feature extraction processing on the recombined feature image data.
In this embodiment, the depth perception model performs rearrangement feature extraction through at least one rearrangement feature extraction structure in the process of performing feature extraction on the first depth image data, that is, performs rearrangement, combination and feature extraction processing on pixel point data of the feature image data to be processed.
Illustratively, feature image data to be processed with a data specification of 256 × 256 × 1 (where, in a data specification a × b × c, a represents the length of the image data, b represents the width of the image data, and c represents the number of channels) is rearranged and combined into feature image data with a data specification of 128 × 128 × 4; the size of the feature image data is reduced to 128 × 128 from 256 × 256 before rearrangement. However, the rearranged and combined data has 4 channels, each holding 128 × 128 data, so the amount of data is not reduced compared with the data before rearrangement. Further, the depth perception feature extraction processing is performed on the recombined feature image data; for example, the depth perception feature extraction processing may be performed on the recombined feature image data by using a convolution kernel of 3 × 3 × 64.
In some optional embodiments of the present invention, the rearranged feature extraction structure is configured to extract pixel point data from the input feature image data starting from different initial pixel points and every preset pixel point distance to obtain a plurality of feature sub-image data, and combine the plurality of feature sub-image data to obtain the output feature image data.
For example, as shown in fig. 2, the size of the feature image data is 4 × 4. The initial pixel points may be the first pixel point 11 in the first row, the second pixel point 12 in the first row, the first pixel point 21 in the second row, and the second pixel point 22 in the second row of the feature image data. Starting from the pixel points 11, 12, 21 and 22 as initial pixel points, pixel point data is extracted from the feature image data at intervals of 1 pixel point, so as to obtain the 4 feature sub-image data shown in fig. 2; the 4 feature sub-image data are combined to obtain rearranged feature image data of 2 × 2 × 4.
By adopting the technical scheme of the embodiment of the invention, the feature image data is rearranged through the rearrangement feature extraction structure, the current pixel point and the non-adjacent pixel point are recombined to obtain a plurality of feature sub-image data, and the plurality of feature sub-image data are combined to obtain the output feature image data. By rearranging the pixels, not only all original information of the characteristic image data is kept, but also the data characteristics extracted by the neural network are more sufficient and comprehensive, thereby promoting the correction of the depth value of the current pixel so as to eliminate the multipath interference and greatly improving the repairing effect of the multipath interference.
In some optional embodiments of the invention, the feature extraction portion further comprises a plurality of feature extraction structures; the depth perception feature extraction of the first depth image data through a depth perception model comprises the following steps: performing depth perception feature extraction on the first depth image data through a first feature extraction structure to obtain first feature image data; rearranging and combining pixel point data of the first feature image data through a first rearranged feature extraction structure to obtain rearranged and combined second feature image data, and performing depth perception feature extraction on the second feature image data to obtain third feature image data; and rearranging and combining the pixel point data of the third feature image data through a second rearranged feature extraction structure to obtain rearranged and combined fourth feature image data, and performing depth perception feature extraction on the fourth feature image data to obtain fifth feature image data.
In some optional embodiments of the invention, the method further comprises: and performing downsampling processing on the fifth feature image data through a second feature extraction structure, and performing depth perception feature extraction on the downsampled fifth feature image data to obtain sixth feature image data.
In some optional embodiments of the invention, the depth perception model further comprises a feature fusion part, the feature fusion part comprising a plurality of fusion structures; the performing feature fusion on the feature image data to obtain target depth image data includes: and sequentially carrying out up-sampling processing on the input feature image data through each fusion structure, and carrying out feature fusion processing on the feature image data subjected to the up-sampling processing and the feature image data subjected to depth perception feature extraction by the feature extraction part to obtain third depth image data.
In this embodiment, the process of processing the first depth image data through the depth perception model may specifically refer to a specific implementation process of the depth image data through the depth perception model in the image processing method according to the foregoing embodiment, and details are not repeated here.
In some optional embodiments of the invention, the method further comprises: obtaining fourth depth image data and fifth depth image data; the fifth depth image data is real depth image data corresponding to the fourth depth image data; obtaining sixth depth image data based on the fourth depth image data and the depth perception model, determining a second error between the sixth depth image data and the fifth depth image data; obtaining a first error curve based on the first errors corresponding to a plurality of first depth image data, and obtaining a second error curve based on the second errors corresponding to a plurality of fourth depth image data; determining a degree of similarity of the first error curve and the second error curve; and under the condition that the similarity degree of the first error curve and the second error curve does not meet the preset requirement, adjusting the parameters of the depth perception model.
In this embodiment, the fourth depth image data is used as verification data, the fifth depth image data is real depth image data corresponding to the fourth depth image data, and the sixth depth image data is data after the fourth depth image data is trained. And performing depth perception feature extraction on the fourth depth image data through a depth perception model to obtain feature image data, and performing feature fusion on the feature image data to obtain sixth depth image data. Determining a second error between the fifth depth image data and the sixth depth image data.
An error curve is output in units of the number of depth image data trained at the same time. A first error curve is obtained based on the first errors corresponding to a plurality of first depth images trained at the same time, and the first error curve is the error curve of the training data set; a second error curve is obtained based on the second errors corresponding to the plurality of fourth depth image data, and the second error curve is the error curve of the verification data set. Whether the error curves of the training data set and the verification data set are similar is observed at the same time, so as to prevent the model from over-fitting due to over-training; over-fitting manifests itself as a good effect on the training data set while the effect on the verification data set is far worse. If the degree of similarity of the error curves of the training data set and the verification data set does not meet the preset requirement, the parameters of the depth perception model are adjusted. Adjusting the parameters of the depth perception model in this embodiment requires a person skilled in the art to adjust the model parameters manually until the desired effect is achieved.
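The error-curve comparison can be sketched as follows; the similarity criterion (a 20% relative tolerance) and the listed error values are arbitrary illustrative assumptions, since the preset requirement itself is not fixed by the description.

```python
def curves_similar(train_errors, val_errors, tol=0.2):
    """Hypothetical similarity check between the training and verification error curves:
    the curves are considered similar if, at every epoch, the verification error exceeds
    the training error by less than `tol` (relative). The 20% tolerance is an assumption."""
    return all(v <= t * (1.0 + tol) for t, v in zip(train_errors, val_errors))

# first error curve (training data set) and second error curve (verification data set),
# one value per epoch; the numbers below are made up purely for illustration
first_error_curve = [0.21, 0.14, 0.10, 0.08, 0.07]
second_error_curve = [0.23, 0.16, 0.13, 0.12, 0.12]

if not curves_similar(first_error_curve, second_error_curve):
    # The curves diverge: the model may be over-fitting, so the depth perception
    # model parameters are adjusted manually and training is repeated.
    print("error curves not similar enough - adjust the depth perception model parameters")
```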
In some optional embodiments of this embodiment, the method further comprises: and terminating the training of the depth perception model when the training times of the depth perception model reach a preset training time or the error reaches a preset error target value.
In this embodiment, after training is completed, the depth perception model is applied to the test data set and errors are observed. If the error of the test data set is similar to the error of the training data set or slightly higher than the error of the training data set, the model is determined to be a normal model and can be used for repairing the multipath interference of the actual depth map.
The method of the embodiments of the present invention is described below with reference to a specific example.
Illustratively, in an actual training process, if the input depth map data is denoted as x(i, j), the defined training model is denoted as f, the output depth map data is denoted as f(x(i, j)), and the real depth map data is denoted as y(i, j), the depth perception error may, for example, be written as
err = (1/n) Σ_i Σ_j | f(x(i, j)) − y(i, j) |
where n is the number of depth map data, i is the number of image rows, and j is the number of image columns. In an ideal scene, the depth perception error err is defined to be 0; when multipath interference exists in the scene, the depth perception error err is a large value. With the learning rate defined as 0.001, the error value is decreased by 0.001 × err in a single update. Through back-propagation (i.e., propagating 0.001 × err layer by layer back to the first layer using the chain rule), each layer is prompted to modify its weights, and a single training iteration forms a closed loop. After many iterations of depth perception network training, the depth perception error err converges to a small value, i.e., the error between the output (repaired) depth map data and the real depth map data becomes small, and most of the multipath interference is repaired.
FIG. 4 is a schematic diagram of a model structure according to an embodiment of the present invention; the model shown in fig. 4 includes 29 processing layers. The first feature extraction structure in the foregoing embodiments may correspond to the 1st to 3rd processing layers in the model, the first rearranged feature extraction structure may correspond to the 4th to 6th processing layers, the second rearranged feature extraction structure corresponds to the 7th to 9th processing layers, the second feature extraction structure corresponds to the 10th to 12th processing layers, the 13th to 15th processing layers may form a third feature extraction structure (not shown in the foregoing embodiments) following the second feature extraction structure, and the feature fusion part corresponds to the 16th to 29th processing layers; among them, the 16th to 18th processing layers, the 19th to 21st processing layers, and the 22nd to 24th processing layers may respectively correspond to the fusion structures in the model.
Based on the above embodiment, the embodiment of the invention also provides an image processing device. FIG. 5 is a schematic diagram of the structure of an image processing apparatus according to an embodiment of the present invention; as shown in fig. 5, the apparatus includes:
a first obtaining module 31, configured to obtain depth image data;
the first processing module 32 is configured to perform depth perception feature extraction on the depth image data through a depth perception model completed through pre-training to obtain feature image data, and perform feature fusion on the feature image data to obtain target depth image data; the target depth image data is depth image data obtained by performing depth correction on the depth image data.
In some optional embodiments of the present invention, the depth perception model at least includes a feature extraction part, the feature extraction part includes at least one rearranged feature extraction structure, the rearranged feature extraction structure is configured to perform rearrangement and combination processing of pixel data on input feature image data, and perform depth perception feature extraction processing on the recombined feature image data; wherein the size of the recombined feature image data is smaller than the size of the input feature image data, and the recombined feature image data includes all pixel data of the input feature image data.
In some optional embodiments of the present invention, the rearranged feature extraction structure is configured to extract pixel point data from the input feature image data starting from different initial pixel points and every preset pixel point distance to obtain a plurality of feature sub-image data, and combine the plurality of feature sub-image data to obtain the output feature image data.
In some optional embodiments of the invention, the feature extraction portion further comprises a plurality of feature extraction structures; the first processing module 32 is configured to perform depth perception feature extraction on the depth image data through a first feature extraction structure to obtain first feature image data; rearrange and combine pixel point data of the first feature image data through a first rearranged feature extraction structure to obtain rearranged and combined second feature image data, and perform depth perception feature extraction on the second feature image data to obtain third feature image data, where the image size of the second feature image data is smaller than that of the first feature image data, and the second feature image data comprises all pixel point data of the first feature image data; and rearrange and combine pixel point data of the third feature image data through a second rearranged feature extraction structure to obtain rearranged and combined fourth feature image data, and perform depth perception feature extraction on the fourth feature image data to obtain fifth feature image data, where the image size of the fifth feature image data is smaller than the image size of the third feature image data, and the fifth feature image data includes all pixel point data of the third feature image data.
In some optional embodiments of the present invention, the first processing module 32 in the foregoing apparatus is further configured to perform downsampling processing on the fifth feature image data through the second feature extraction structure, and perform depth-aware feature extraction on the downsampled fifth feature image data to obtain sixth feature image data.
In some optional embodiments of the invention, the depth perception model further comprises a feature fusion part, the feature fusion part comprising a plurality of fusion structures; the first processing module 32 is configured to perform upsampling processing on the input feature image data sequentially through each fusion structure, and perform feature fusion processing on the feature image data subjected to upsampling processing and the feature image data subjected to depth perception feature extraction by the feature extraction part to obtain target depth image data.
In the embodiment of the present invention, the first obtaining module 31 and the first processing module 32 in the apparatus may, in practical applications, be implemented by a Central Processing Unit (CPU), a Digital Signal Processor (DSP), a Microcontroller Unit (MCU), or a Field-Programmable Gate Array (FPGA).
It should be noted that: the image processing apparatus provided in the above embodiment is exemplified by the division of each program module when performing image processing, and in practical applications, the processing may be distributed to different program modules according to needs, that is, the internal structure of the apparatus may be divided into different program modules to complete all or part of the processing described above. In addition, the image processing apparatus and the image processing method provided by the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments in detail and are not described herein again.
The embodiment of the invention also provides a model training device, and fig. 6 is a schematic diagram of a composition structure of the model training device in the embodiment of the invention; as shown in fig. 6, the apparatus includes:
a second obtaining module 41, configured to obtain first depth image data and second depth image data, where the second depth image data is real image data corresponding to the first depth image data;
the second processing module 42 is configured to perform depth perception feature extraction on the first depth image data through a depth perception model to obtain feature image data, and perform feature fusion on the feature image data to obtain third depth image data;
and a first adjusting module 43 for determining a first error between the third depth image data and the second depth image data, adjusting parameters of the depth perception model based on the first error.
In some optional embodiments of the invention, the apparatus further comprises a second adjusting module;
the second obtaining module 41 is further configured to obtain fourth depth image data and fifth depth image data; the fifth depth image data is real depth image data corresponding to the fourth depth image data;
the second processing module 42 is further configured to obtain sixth depth image data based on the fourth depth image data and the depth perception model, and determine a second error between the sixth depth image data and the fifth depth image data; obtaining a first error curve based on the first errors corresponding to a plurality of first depth image data, and obtaining a second error curve based on the second errors corresponding to a plurality of fourth depth image data;
the second adjusting module is configured to adjust a parameter of the depth perception model when the similarity degree between the first error curve and the second error curve does not meet a preset requirement.
In the embodiment of the present invention, the second obtaining module 41, the second processing module 42, the first adjusting module 43, and the second adjusting module in the apparatus can be implemented by a CPU, a DSP, an MCU, or an FPGA in practical application.
It should be noted that: in the model training apparatus provided in the above embodiment, only the division of the program modules is exemplified when performing model training, and in practical applications, the processing may be distributed to different program modules according to needs, that is, the internal structure of the apparatus may be divided into different program modules to complete all or part of the processing described above. In addition, the model training device and the model training method provided by the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments in detail and are not described herein again.
The embodiment of the invention also provides the electronic equipment. Fig. 7 is a schematic diagram of a hardware composition structure of an electronic device according to an embodiment of the present invention, as shown in fig. 7, the electronic device includes a processor 51 and a memory 52 for storing a computer program capable of running on the processor 51, and the processor 51 is configured to implement the steps of the embodiment of the present invention applied to an image processing method or a model training method when running the computer program.
Optionally, the various components in the electronic device are coupled together by a bus system 53. It will be appreciated that the bus system 53 is used to enable communications among the components. The bus system 53 includes a power bus, a control bus, and a status signal bus in addition to the data bus. For clarity of illustration, however, the various buses are labeled as bus system 53 in fig. 7.
It will be appreciated that the memory 52 can be either volatile memory or nonvolatile memory, and can include both volatile and nonvolatile memory. The nonvolatile memory may be a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Ferroelectric Random Access Memory (FRAM), a Flash Memory, a magnetic surface memory, an optical disc, or a Compact Disc Read-Only Memory (CD-ROM); the magnetic surface memory may be disk storage or tape storage. The volatile memory may be a Random Access Memory (RAM), which acts as an external cache. By way of illustration and not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Synchronous Static Random Access Memory (SSRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM), Enhanced Synchronous Dynamic Random Access Memory (ESDRAM), SyncLink Dynamic Random Access Memory (SLDRAM), and Direct Rambus Random Access Memory (DRRAM). The memory 52 described in connection with the embodiments of the invention is intended to comprise, without being limited to, these and any other suitable types of memory.
The methods disclosed in the above embodiments of the present invention may be applied to the processor 51, or implemented by the processor 51. The processor 51 may be an integrated circuit chip having a signal processing capability. During implementation, the steps of the above methods may be completed by an integrated logic circuit of hardware in the processor 51 or by instructions in the form of software. The processor 51 may be a general-purpose processor, a DSP, another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The processor 51 may implement or execute the methods, steps, and logical blocks disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor, any conventional processor, or the like. The steps of the methods disclosed in the embodiments of the present invention may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium, and the storage medium is located in the memory 52; the processor 51 reads the information in the memory 52 and completes the steps of the image processing method or the model training method in combination with its hardware.
In an exemplary embodiment, the electronic device may be implemented by one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), FPGAs, general-purpose processors, controllers, Micro Controller Units (MCUs), microprocessors, or other electronic components for performing the foregoing methods.
In an exemplary embodiment, an embodiment of the present invention further provides a computer-readable storage medium, for example, the memory 52 including a computer program, and the computer program is executable by the processor 51 of the electronic device to complete the steps of the aforementioned image processing method or model training method. The computer-readable storage medium may be a memory such as an FRAM, a ROM, a PROM, an EPROM, an EEPROM, a Flash Memory, a magnetic surface memory, an optical disc, or a CD-ROM; or may be various devices including one or any combination of the above memories.
The embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the image processing method or the model training method according to the embodiment of the present invention.
The methods disclosed in the several method embodiments provided in the present application may be combined arbitrarily without conflict to obtain new method embodiments.
Features disclosed in several of the product embodiments provided in the present application may be combined in any combination to yield new product embodiments without conflict.
The features disclosed in the several method or apparatus embodiments provided in the present application may be combined arbitrarily, without conflict, to arrive at new method embodiments or apparatus embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The device embodiments described above are merely illustrative. For example, the division of the units is only a logical functional division, and there may be other division manners in actual implementation, for example: multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that: all or part of the steps for implementing the method embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer readable storage medium, and when executed, the program performs the steps including the method embodiments; and the aforementioned storage medium includes: a mobile storage device, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Alternatively, if the integrated unit of the present invention is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present invention, in essence, or the part thereof contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present invention. The aforementioned storage medium includes: a removable storage device, a ROM, a RAM, a magnetic disk, an optical disc, or various other media capable of storing program code.
The above description covers only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or substitution that can be readily conceived by a person skilled in the art within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.
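The rearranged feature extraction structure used by the depth perception model in the embodiments above rearranges and combines pixel point data by interval sampling: starting from different initial pixel points, pixel points are taken every preset pixel distance, and the resulting feature sub-images are stacked, so that the spatial size shrinks while every pixel is retained. A minimal NumPy sketch of this rearrangement is given below; the stride value, the (H, W, C) array layout, and the function name are illustrative assumptions rather than limitations of the disclosure.

import numpy as np

def rearrange_features(x: np.ndarray, stride: int = 2) -> np.ndarray:
    # Interval-sample the input feature image from different starting pixels
    # and stack the resulting sub-images along the channel axis.
    # x: (H, W, C) feature image; H and W are assumed divisible by `stride`.
    h, w, c = x.shape
    subs = [x[i::stride, j::stride, :]          # one sub-image per starting pixel
            for i in range(stride) for j in range(stride)]
    return np.concatenate(subs, axis=-1)        # (H/stride, W/stride, C*stride*stride)

x = np.random.rand(8, 8, 3)
y = rearrange_features(x, stride=2)
print(y.shape)            # (4, 4, 12): smaller spatial size, more channels
assert y.size == x.size   # no pixel point data is discarded

Because every starting offset contributes one sub-image, the recombined feature image is a quarter of the spatial size yet contains exactly the pixel point data of the input, which is the property the rearranged feature extraction structure relies on.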

Claims (12)

1. An image processing method, characterized in that the method comprises:
obtaining depth image data;
performing depth perception feature extraction on the depth image data through a depth perception model which is trained in advance to obtain feature image data, and performing feature fusion on the feature image data to obtain target depth image data; the target depth image data is depth image data obtained by performing depth correction on the depth image data.
2. The method according to claim 1, wherein the depth perception model comprises at least a feature extraction part, the feature extraction part comprises at least one rearranged feature extraction structure, the rearranged feature extraction structure is used for rearranging and combining pixel point data of input feature image data, and performing depth perception feature extraction processing on the recombined feature image data; wherein the size of the recombined feature image data is smaller than the size of the input feature image data, and the recombined feature image data includes all pixel data of the input feature image data.
3. The method of claim 2, wherein the rearranged feature extraction structure is configured to extract pixel point data from the input feature image data at intervals of a preset pixel distance from different initial pixel points, respectively, to obtain a plurality of feature sub-image data, and combine the plurality of feature sub-image data to obtain the output feature image data.
4. The method according to any one of claims 1 to 3, wherein the feature extraction part further comprises a plurality of feature extraction structures;
the performing depth perception feature extraction on the depth image data through the depth perception model which is trained in advance comprises:
performing depth perception feature extraction on the depth image data through a first feature extraction structure to obtain first feature image data;
rearranging and combining pixel point data of the first feature image data through a first rearranged feature extraction structure to obtain rearranged and combined second feature image data, and performing depth perception feature extraction on the second feature image data to obtain third feature image data; the image size of the second feature image data is smaller than that of the first feature image data, and the second feature image data comprises all pixel point data of the first feature image data;
rearranging and combining pixel point data of the third feature image data through a second rearranged feature extraction structure to obtain rearranged and combined fourth feature image data, and performing depth perception feature extraction on the fourth feature image data to obtain fifth feature image data; the image size of the fifth feature image data is smaller than that of the third feature image data, and the fifth feature image data comprises all pixel point data of the third feature image data.
5. The method of claim 4, further comprising:
and performing downsampling processing on the fifth feature image data through a second feature extraction structure, and performing depth perception feature extraction on the downsampled fifth feature image data to obtain sixth feature image data.
6. The method of claim 5, wherein the depth perception model further comprises a feature fusion portion, the feature fusion portion comprising a plurality of fusion structures;
the performing feature fusion on the feature image data to obtain target depth image data includes:
and sequentially carrying out up-sampling processing on the input feature image data through each fusion structure, and carrying out feature fusion processing on the feature image data subjected to the up-sampling processing and the feature image data subjected to depth perception feature extraction by the feature extraction part to obtain target depth image data.
7. A method of model training, the method comprising:
obtaining first depth image data and second depth image data, wherein the second depth image data is real image data corresponding to the first depth image data;
performing depth perception feature extraction on the first depth image data through a depth perception model to obtain feature image data, and performing feature fusion on the feature image data to obtain third depth image data;
determining a first error between the third depth image data and the second depth image data, and adjusting parameters of the depth perception model based on the first error.
8. The method of claim 7, further comprising:
obtaining fourth depth image data and fifth depth image data; the fifth depth image data is real depth image data corresponding to the fourth depth image data;
obtaining sixth depth image data based on the fourth depth image data and the depth perception model, determining a second error between the sixth depth image data and the fifth depth image data;
obtaining a first error curve based on the first errors corresponding to a plurality of first depth image data, and obtaining a second error curve based on the second errors corresponding to a plurality of fourth depth image data;
determining a degree of similarity of the first error curve and the second error curve;
and under the condition that the similarity degree of the first error curve and the second error curve does not meet the preset requirement, adjusting the parameters of the depth perception model.
9. An image processing apparatus, characterized in that the apparatus comprises:
the first acquisition module is used for acquiring depth image data;
the first processing module is used for carrying out depth perception feature extraction on the depth image data through a depth perception model which is trained in advance to obtain feature image data, and carrying out feature fusion on the feature image data to obtain target depth image data; the target depth image data is depth image data obtained by performing depth correction on the depth image data.
10. A model training apparatus, the apparatus comprising:
the second acquisition module is used for acquiring first depth image data and second depth image data, wherein the second depth image data is real image data corresponding to the first depth image data;
the second processing module is used for carrying out depth perception feature extraction on the first depth image data through a depth perception model to obtain feature image data, and carrying out feature fusion on the feature image data to obtain third depth image data;
and the adjustment module is used for determining a first error between the third depth image data and the second depth image data, and adjusting parameters of the depth perception model based on the first error.
11. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 6 or 7 to 8.
12. An electronic device, comprising: a processor and a memory for storing a computer program capable of running on the processor,
wherein the processor is adapted to perform the steps of the method of any one of claims 1 to 6 or any one of claims 7 to 8 when running the computer program.
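As an illustration of the feature extraction and feature fusion flow recited in claims 4 to 6, the following PyTorch sketch wires a first feature extraction structure, two rearranged feature extraction structures, a downsampling feature extraction structure, and an upsampling fusion part into one model. All layer types, channel counts, the use of PixelUnshuffle for the pixel rearrangement, and channel concatenation for fusion are assumptions chosen for readability; the claims do not prescribe them.

import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(in_ch, out_ch):
    # a plain 3x3 convolution + ReLU stands in for one "feature extraction structure"
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True))

class RearrangedExtraction(nn.Module):
    # Rearrange pixel point data from different starting pixels (PixelUnshuffle keeps
    # all pixels) and extract features from the recombined, smaller feature image.
    def __init__(self, in_ch, out_ch, stride=2):
        super().__init__()
        self.rearrange = nn.PixelUnshuffle(stride)
        self.extract = conv_block(in_ch * stride * stride, out_ch)
    def forward(self, x):
        return self.extract(self.rearrange(x))

class DepthPerceptionSketch(nn.Module):
    # Hypothetical encoder-decoder mirroring claims 4 to 6; all sizes are illustrative.
    def __init__(self):
        super().__init__()
        self.extract1 = conv_block(1, 16)                    # first feature extraction structure
        self.rearr1 = RearrangedExtraction(16, 32)           # 1/2 resolution
        self.rearr2 = RearrangedExtraction(32, 64)           # 1/4 resolution
        self.down = nn.Sequential(nn.MaxPool2d(2), conv_block(64, 128))  # 1/8 resolution
        self.fuse3 = conv_block(128 + 64, 64)                # fuse with 1/4-resolution features
        self.fuse2 = conv_block(64 + 32, 32)                 # fuse with 1/2-resolution features
        self.fuse1 = conv_block(32 + 16, 16)                 # fuse with full-resolution features
        self.head = nn.Conv2d(16, 1, 3, padding=1)           # corrected depth map

    def forward(self, depth):
        f1 = self.extract1(depth)        # first feature image data
        f3 = self.rearr1(f1)             # third feature image data (claim 4)
        f5 = self.rearr2(f3)             # fifth feature image data (claim 4)
        f6 = self.down(f5)               # sixth feature image data (claim 5)
        x = F.interpolate(f6, scale_factor=2, mode="nearest")
        x = self.fuse3(torch.cat([x, f5], dim=1))
        x = F.interpolate(x, scale_factor=2, mode="nearest")
        x = self.fuse2(torch.cat([x, f3], dim=1))
        x = F.interpolate(x, scale_factor=2, mode="nearest")
        x = self.fuse1(torch.cat([x, f1], dim=1))
        return self.head(x)              # target depth image data

model = DepthPerceptionSketch()
out = model(torch.randn(1, 1, 64, 64))
print(out.shape)  # torch.Size([1, 1, 64, 64])

Running the sketch on a 1x1x64x64 depth map returns a map of the same size, consistent with the target depth image data of claim 1 being a depth-corrected version of the input depth image data.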
CN202110985348.0A 2021-08-26 2021-08-26 Image processing method, model training method, related device and electronic equipment Active CN113436245B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110985348.0A CN113436245B (en) 2021-08-26 2021-08-26 Image processing method, model training method, related device and electronic equipment

Publications (2)

Publication Number Publication Date
CN113436245A true CN113436245A (en) 2021-09-24
CN113436245B CN113436245B (en) 2021-12-03

Family

ID=77797990

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110985348.0A Active CN113436245B (en) 2021-08-26 2021-08-26 Image processing method, model training method, related device and electronic equipment

Country Status (1)

Country Link
CN (1) CN113436245B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9232912B2 (en) * 2010-08-26 2016-01-12 The Regents Of The University Of California System for evaluating infant movement using gesture recognition
CN106462771A (en) * 2016-08-05 2017-02-22 深圳大学 3D image significance detection method
US20180350110A1 (en) * 2017-05-31 2018-12-06 Samsung Electronics Co., Ltd. Method and device for processing multi-channel feature map images
CN107563409A (en) * 2017-08-04 2018-01-09 汕头大学 A kind of description method based on area image feature concern network with arest neighbors sequence
CN108537226A (en) * 2018-03-06 2018-09-14 北京邮电大学 A kind of licence plate recognition method and device
CN112116531A (en) * 2019-06-21 2020-12-22 天津工业大学 Partial convolution based image occlusion recovery reconstruction method by utilizing shift depth characteristic rearrangement
CN111160111A (en) * 2019-12-09 2020-05-15 电子科技大学 Human body key point detection method based on deep learning
CN113009508A (en) * 2019-12-20 2021-06-22 舜宇光学(浙江)研究院有限公司 Multipath interference correction method for TOF module, system and electronic equipment thereof
CN113077397A (en) * 2021-03-29 2021-07-06 Oppo广东移动通信有限公司 Image beautifying processing method and device, storage medium and electronic equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WEIFENG CHEN et al.: "Single-Image Depth Perception in the Wild", HTTPS://ARXIV.ORG/PDF/1604.03901.PDF *
CHEN Suting et al.: "Indoor Scene Understanding Based on Depth Perception Feature Extraction", Computer Engineering *

Also Published As

Publication number Publication date
CN113436245B (en) 2021-12-03

Similar Documents

Publication Publication Date Title
CN110428366B (en) Image processing method and device, electronic equipment and computer readable storage medium
CN111311629B (en) Image processing method, image processing device and equipment
US20220188999A1 (en) Image enhancement method and apparatus
US9390475B2 (en) Backlight detection method and device
CN109447907B (en) Single image enhancement method based on full convolution neural network
WO2021022983A1 (en) Image processing method and apparatus, electronic device and computer-readable storage medium
CN110276767A (en) Image processing method and device, electronic equipment, computer readable storage medium
CN108764039B (en) Neural network, building extraction method of remote sensing image, medium and computing equipment
WO2021238420A1 (en) Image defogging method, terminal, and computer storage medium
US20220207651A1 (en) Method and apparatus for image processing
WO2021063119A1 (en) Method and apparatus for image processing, terminal
CN116681636B (en) Light infrared and visible light image fusion method based on convolutional neural network
CN113239825B (en) High-precision tobacco beetle detection method in complex scene
CN111461211B (en) Feature extraction method for lightweight target detection and corresponding detection method
CN115115540A (en) Unsupervised low-light image enhancement method and unsupervised low-light image enhancement device based on illumination information guidance
CN116757986A (en) Infrared and visible light image fusion method and device
US11783454B2 (en) Saliency map generation method and image processing system using the same
Chambe et al. HDR-LFNet: inverse tone mapping using fusion network
CN113436245B (en) Image processing method, model training method, related device and electronic equipment
CN112102208B (en) Underwater image processing system, method, apparatus, and medium with edge preservation
CN114863224A (en) Training method, image quality detection method, device and medium
CN114331927A (en) Image processing method, storage medium and terminal equipment
CN110475044A (en) Image transfer method and device, electronic equipment, computer readable storage medium
CN111091593B (en) Image processing method, device, electronic equipment and storage medium
CN109996056B (en) Method and device for converting 2D video into 3D video and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant