CN110782412B - Image processing method and device, processor, electronic device and storage medium - Google Patents


Info

Publication number
CN110782412B
CN110782412B
Authority
CN
China
Prior art keywords
image
processed
feature
processing
characteristic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911033254.2A
Other languages
Chinese (zh)
Other versions
CN110782412A (en)
Inventor
柳一村 (Liu Yicun)
张佳维 (Zhang Jiawei)
任思捷 (Ren Sijie)
刘建博 (Liu Jianbo)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Sensetime Technology Co Ltd
Original Assignee
Shenzhen Sensetime Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Sensetime Technology Co Ltd filed Critical Shenzhen Sensetime Technology Co Ltd
Priority to CN201911033254.2A
Publication of CN110782412A
Application granted
Publication of CN110782412B
Legal status: Active

Classifications

    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
                • G06T5/00 Image enhancement or restoration
                    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
                    • G06T5/77 Retouching; Inpainting; Scratch removal
                • G06T7/00 Image analysis
                    • G06T7/0002 Inspection of images, e.g. flaw detection
                    • G06T7/10 Segmentation; Edge detection
                        • G06T7/13 Edge detection
                • G06T2207/00 Indexing scheme for image analysis or image enhancement
                    • G06T2207/20 Special algorithmic details
                        • G06T2207/20212 Image combination
                            • G06T2207/20221 Image fusion; Image merging
                    • G06T2207/30 Subject of image; Context of image processing
                        • G06T2207/30168 Image quality inspection

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Quality & Reliability (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The application discloses an image processing method and device, a processor, an electronic device and a storage medium. The method comprises the following steps: acquiring a binocular image, wherein the binocular image comprises a first image to be processed and a second image to be processed, and the image quality of the first image to be processed is higher than that of the second image to be processed; obtaining a repaired second image to be processed according to the first image to be processed and the second image to be processed, wherein the image visual angle of the repaired second image to be processed is the same as the image visual angle of the second image to be processed, and the image quality of the repaired second image to be processed is higher than that of the second image to be processed; and obtaining a first parallax image between the first image to be processed and the repaired second image to be processed according to the first image to be processed and the repaired second image to be processed. A corresponding apparatus is also disclosed, so that a first parallax image can be obtained based on the first image to be processed and the second image to be processed.

Description

Image processing method and device, processor, electronic device and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image processing method and apparatus, a processor, an electronic device, and a storage medium.
Background
Binocular stereo vision, an important form of machine vision, is a method of acquiring three-dimensional geometric information of an object based on the principle of parallax: two images of the object to be measured (hereinafter referred to as binocular images) are acquired from different positions using imaging devices, and the positional deviation between homonymous points of the images is calculated. At present, binocular stereo vision is widely applied in the fields of smartphones, unmanned aerial vehicles and robots.
However, when the imaging devices are inconsistent (e.g., the hardware configurations of the two imaging devices differ), the image quality (including image resolution and image signal-to-noise ratio) of the two images of the obtained binocular image is inconsistent. With traditional binocular stereo vision methods, when the image quality of the two images in the binocular image is inconsistent, the precision of the obtained three-dimensional geometric information of the object is low, and the precision of the obtained parallax image is accordingly low.
Disclosure of Invention
The application provides an image processing method and device, a processor, an electronic device and a storage medium, so as to obtain a repaired second image to be processed based on a first image to be processed and a second image to be processed.
In a first aspect, an image processing method is provided, the method comprising:
acquiring a binocular image, wherein the binocular image comprises a first image to be processed and a second image to be processed, and the image quality of the first image to be processed is higher than that of the second image to be processed;
obtaining a repaired second image to be processed according to the first image to be processed and the second image to be processed, wherein the image visual angle of the repaired second image to be processed is the same as the image visual angle of the second image to be processed, and the image quality of the repaired second image to be processed is higher than that of the second image to be processed;
and obtaining a first parallax image between the first image to be processed and the repaired second image to be processed according to the first image to be processed and the repaired second image to be processed.
In this aspect, the repaired second image to be processed is obtained from the first image to be processed and the second image to be processed, so that the difference between the image quality of the first image to be processed and the image quality of the repaired second image to be processed is smaller than the difference between the image quality of the first image to be processed and the image quality of the second image to be processed. The precision of the first parallax image obtained according to the first image to be processed and the repaired second image to be processed is therefore higher than that of a parallax image obtained according to the first image to be processed and the second image to be processed.
In a possible implementation manner, the obtaining a repaired second image to be processed according to the first image to be processed and the second image to be processed includes: performing first feature extraction processing on the first image to be processed and the second image to be processed to obtain a horizontal parallax displacement feature image, wherein the horizontal parallax displacement feature image comprises first horizontal parallax displacement between a first pixel point in the first image to be processed and a second pixel point in the second image to be processed, and the first pixel point and the second pixel point are homonymous points;
and performing convolution processing on the first image to be processed by taking the horizontal parallax displacement characteristic image as a convolution kernel to obtain the repaired second image to be processed.
In this possible implementation manner, a horizontal parallax displacement feature image containing the horizontal parallax displacement between homonymous points in the first image to be processed and the second image to be processed is obtained by performing feature extraction processing on the first image to be processed and the second image to be processed. A convolution kernel is determined for each pixel point in the first image to be processed according to the horizontal parallax displacement information in the horizontal parallax displacement feature image, and convolution processing is performed on the pixel point with that kernel to adjust its horizontal position, so that the difference between the repaired second image to be processed obtained after the adjustment and the second image to be processed can be reduced.
In one possible implementation, after the acquiring the binocular images, the method further includes:
performing second feature extraction processing on the first image to be processed and the second image to be processed to obtain a vertical parallax displacement feature image, wherein the vertical parallax displacement feature image comprises vertical parallax displacement between the first pixel point and the second pixel point;
the performing convolution processing on the first image to be processed by using the horizontal parallax displacement feature image as a convolution kernel to obtain the repaired second image to be processed under the visual angle of the second image to be processed includes:
and performing convolution processing on the first image to be processed by respectively taking the horizontal parallax displacement characteristic image and the vertical parallax displacement characteristic image as convolution kernels to obtain the repaired second image to be processed.
In this possible implementation manner, by performing feature extraction processing on the first image to be processed and the second image to be processed, a vertical parallax displacement feature image containing vertical parallax displacement information between homologous points in the first image to be processed and the second image to be processed can be obtained. The vertical parallax displacement characteristic image is used for carrying out convolution processing on the first image to be processed, the vertical position of a pixel point in the first image to be processed can be adjusted, and the vertical parallax displacement between the homonymous point in the first image to be processed and the homonymous point in the second image to be processed is reduced.
Combined with the technical scheme of the preceding possible implementation manner, convolution processing is performed on the first image to be processed by using the horizontal parallax displacement feature image and the vertical parallax displacement feature image respectively as convolution kernels, so that the horizontal positions and the vertical positions of the pixel points in the first image to be processed can be adjusted at the same time. The difference between the positions of the pixel points in the repaired second image to be processed and the positions of the pixel points in the second image to be processed is thus smaller.
In another possible implementation manner, the convolving the first image to be processed with the horizontal parallax displacement feature image as a convolution kernel to obtain a repaired second image to be processed includes:
obtaining a horizontal parallax convolution kernel according to the first horizontal parallax displacement;
and performing convolution processing on the first pixel point by using the horizontal parallax convolution kernel to obtain the repaired second image to be processed.
In this possible implementation manner, the horizontal parallax displacement feature image includes horizontal parallax displacement between the first pixel point and the second pixel point, and when the horizontal parallax displacement feature image is used to perform convolution processing on the first image to be processed, the horizontal position of the first pixel point can be accurately adjusted by using the horizontal parallax displacement between the first pixel point and the second pixel point, so that the accuracy of the obtained repaired second image to be processed is improved.
In another possible implementation manner, the performing a first feature extraction process on the first image to be processed and the second image to be processed to obtain a horizontal parallax displacement feature image includes:
splicing the first image to be processed and the second image to be processed to obtain a third image to be processed; performing n-level coding processing on the third image to be processed to obtain a first intermediate characteristic image, wherein n is a positive integer;
and performing m-level first decoding processing on the first intermediate characteristic image to obtain the horizontal parallax displacement characteristic image, wherein m is a positive integer.
In this possible implementation manner, the feature extraction processing on the first to-be-processed image and the second to-be-processed image is completed by performing n-level encoding processing and m-level first decoding processing on the third to-be-processed image, and a horizontal parallax displacement feature image is obtained. Therefore, the semantic information of the neighborhood of each pixel point in the third image to be processed and the semantic information of the whole third image to be processed can be more accurately extracted.
In another possible implementation manner, the performing m-level first decoding processing on the first intermediate feature image to obtain the horizontal parallax displacement feature image includes:
and fusing the feature image output by the i-th-level coding processing in the n-level coding processing and the feature image output by the j-th-level first decoding processing in the m-level first decoding processing to obtain input data of the (j+1)-th-level first decoding processing in the m-level first decoding processing, wherein i is a positive integer less than or equal to n, and j is a positive integer less than or equal to m-1.
In this possible implementation manner, by fusing the feature image obtained by the encoding process and the feature image obtained by the decoding process, the edge information and the texture information in the feature image obtained by the decoding process can be enriched, and further the edge information and the texture information in the horizontal parallax displacement feature image can be enriched.
In another possible implementation manner, the performing m-level first decoding processing on the first intermediate feature image to obtain the horizontal parallax displacement feature image includes:
performing the m-level first decoding processing on the first intermediate feature image to obtain a second intermediate feature image;
and filtering the second intermediate characteristic image by taking the first image to be processed as a guide image, so that the position of the edge in the second intermediate characteristic image is the same as the position of the edge in the first image to be processed, and obtaining the horizontal parallax displacement characteristic image.
In this possible implementation manner, filtering processing is performed on the second intermediate feature image so that the position of the edge in the horizontal parallax displacement feature image and/or the vertical parallax displacement feature image is the same as the position of the edge in the first image to be processed. This improves the accuracy with which objects are distinguished in the first image to be processed, and in turn improves the accuracy of the information contained in the horizontal parallax convolution kernel obtained from the horizontal parallax displacement feature image and/or the accuracy of the information contained in the vertical parallax convolution kernel obtained from the vertical parallax displacement feature image.
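As an illustrative aside (not part of the original disclosure), the edge-aligning filtering described here, with the first image to be processed as the guide image, matches the behaviour of the classic guided filter. The sketch below assumes OpenCV's contrib module (opencv-contrib-python); the radius and eps values are arbitrary assumptions of this sketch, not values prescribed by the patent.

```python
# Hedged sketch: filter a feature image so that its edges follow the edges
# of the first to-be-processed image. Requires opencv-contrib-python.
import cv2
import numpy as np

def edge_align(feature_map: np.ndarray, guide_img: np.ndarray,
               radius: int = 8, eps: float = 1e-3) -> np.ndarray:
    """Filter feature_map using guide_img as the guide image."""
    # radius and eps are illustrative; the patent does not prescribe values.
    return cv2.ximgproc.guidedFilter(guide=guide_img.astype(np.float32),
                                     src=feature_map.astype(np.float32),
                                     radius=radius, eps=eps)
```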
In another possible implementation manner, the obtaining a first parallax image between the first image to be processed and the repaired second image to be processed according to the first image to be processed and the repaired second image to be processed includes:
respectively performing feature extraction processing on the first image to be processed and the repaired second image to be processed to obtain a first feature image of the first image to be processed and a second feature image of the repaired second image to be processed;
determining a second horizontal parallax displacement between the first pixel point and the homonymous point of the first pixel point in the repaired second image to be processed according to the correlation between the first feature image and the second feature image;
and obtaining the first parallax image according to the second horizontal parallax displacement.
In this possible implementation manner, by determining the correlation between the first feature image of the first to-be-processed image and the second feature image of the repaired second to-be-processed image, the second horizontal parallax displacement may be obtained, and further, the first parallax image may be obtained according to the second horizontal parallax displacement.
In yet another possible implementation manner, before the obtaining the first parallax image according to the correlation between the first feature image and the second feature image, the method further includes:
taking the pixel point neighborhood in the first feature image as a convolution kernel to perform convolution processing on the pixel point neighborhood in the second feature image, and determining the correlation between the first feature image and the second feature image; or
and taking the pixel point neighborhood in the second characteristic image as a convolution kernel to perform convolution processing on the pixel point neighborhood in the first characteristic image, so as to obtain the correlation between the first characteristic image and the second characteristic image.
In this possible implementation manner, the correlation between the first feature image and the second feature image may be determined by convolving the pixel point neighborhood in the first feature image with the pixel point neighborhood in the second feature image.
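For illustration only (not part of the original text), this correlation resembles the correlation layer used in stereo matching networks. The PyTorch sketch below simplifies the pixel point neighborhood to a per-pixel feature dot product evaluated at each candidate horizontal shift, producing a cost volume; that simplification and the disparity range are assumptions of this sketch.

```python
# Hedged sketch: correlation between two feature images over a range of
# horizontal shifts (a cost volume), a simplified stand-in for the
# neighborhood convolution described above.
import torch
import torch.nn.functional as F

def correlation_volume(f1: torch.Tensor, f2: torch.Tensor,
                       max_disp: int = 4) -> torch.Tensor:
    """f1, f2: (B, C, H, W) feature images. Returns (B, max_disp + 1, H, W)."""
    scores = []
    for d in range(max_disp + 1):
        # Shift f2 right by d pixels so f1[x] is compared with f2[x - d].
        shifted = F.pad(f2, (d, 0))[..., : f2.shape[-1]]
        scores.append((f1 * shifted).mean(dim=1))  # per-pixel dot product
    return torch.stack(scores, dim=1)
```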
In yet another possible implementation manner, the method further includes:
encoding the first parallax image and the first characteristic image to obtain a third characteristic image;
and decoding the third characteristic image to obtain a second parallax image between the first to-be-processed image and the repaired second to-be-processed image, wherein the resolution of the second parallax image is greater than that of the first parallax image.
In this possible implementation manner, the third feature image is obtained by encoding the first parallax image and the first feature image, and the second parallax image is obtained by decoding the third feature image to increase the resolution of the first parallax image.
In another possible implementation manner, the encoding the first parallax image and the first feature image to obtain a third feature image includes:
splicing the first parallax image and the first feature image to obtain a fourth image to be processed;
and coding the fourth image to be processed to obtain the third characteristic image.
In this possible implementation manner, the encoding processing of the first parallax image and the first feature image is implemented by performing encoding processing on the fourth image to be processed, and a third feature image is obtained.
In another possible implementation manner, before the stitching processing is performed on the first parallax image and the first feature image to obtain a fourth image to be processed, the method further includes:
performing feature extraction processing on the first feature image to obtain a fourth feature image of the first feature image;
the splicing processing is performed on the first parallax image and the first characteristic image to obtain a fourth image to be processed, and the method includes:
and splicing the fourth characteristic image and the first parallax image to obtain a fourth image to be processed.
In this possible implementation manner, the feature extraction processing is performed on the first feature image to extract the feature of the first feature image, and the size of the first feature image is reduced to obtain the fourth feature image. And then, the fourth characteristic image and the first parallax image are spliced to obtain a fourth image to be processed, so that the data processing amount can be reduced and the processing speed can be improved when the fourth image to be processed is processed subsequently.
In a second aspect, there is provided an image processing apparatus, the apparatus comprising:
the binocular image processing device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring a binocular image, the binocular image comprises a first image to be processed and a second image to be processed, and the image quality of the first image to be processed is higher than that of the second image to be processed;
the first processing unit is used for obtaining a repaired second image to be processed according to the first image to be processed and the second image to be processed, wherein the image visual angle of the repaired second image to be processed is the same as that of the second image to be processed, and the image quality of the repaired second image to be processed is higher than that of the second image to be processed;
and the second processing unit is used for obtaining a first parallax image between the first image to be processed and the repaired second image to be processed according to the first image to be processed and the repaired second image to be processed.
In one possible implementation manner, the first processing unit is configured to: performing first feature extraction processing on the first image to be processed and the second image to be processed to obtain a horizontal parallax displacement feature image, wherein the horizontal parallax displacement feature image comprises first horizontal parallax displacement between a first pixel point in the first image to be processed and a second pixel point in the second image to be processed, and the first pixel point and the second pixel point are homonymous points;
and performing convolution processing on the first image to be processed by taking the horizontal parallax displacement characteristic image as a convolution kernel to obtain the repaired second image to be processed.
In a possible implementation manner, the first processing unit is further configured to, after the binocular image is obtained, perform second feature extraction processing on the first to-be-processed image and the second to-be-processed image to obtain a vertical parallax displacement feature image, where the vertical parallax displacement feature image includes vertical parallax displacement between the first pixel point and the second pixel point;
the first processing unit is further configured to perform convolution processing on the first image to be processed by using the horizontal parallax displacement characteristic image and the vertical parallax displacement characteristic image as convolution kernels respectively, so as to obtain the repaired second image to be processed.
In another possible implementation manner, the first processing unit is specifically configured to:
obtaining a horizontal parallax convolution kernel according to the first horizontal parallax displacement;
and performing convolution processing on the first pixel point by using the horizontal parallax convolution kernel to obtain the repaired second image to be processed.
In yet another possible implementation manner, the first processing unit is configured to:
splicing the first image to be processed and the second image to be processed to obtain a third image to be processed;
performing n-level coding processing on the third image to be processed to obtain a first intermediate characteristic image, wherein n is a positive integer;
and performing m-level first decoding processing on the first intermediate characteristic image to obtain the horizontal parallax displacement characteristic image, wherein m is a positive integer.
In yet another possible implementation manner, the first processing unit is configured to:
and fusing the feature image output by the i-th-level coding processing in the n-level coding processing and the feature image output by the j-th-level first decoding processing in the m-level first decoding processing to obtain input data of the (j+1)-th-level first decoding processing in the m-level first decoding processing, wherein i is a positive integer less than or equal to n, and j is a positive integer less than or equal to m-1.
In yet another possible implementation manner, the first processing unit is configured to:
performing the m-level first decoding processing on the first intermediate feature image to obtain a second intermediate feature image;
and filtering the second intermediate characteristic image by taking the first image to be processed as a guide image, so that the position of the edge in the second intermediate characteristic image is the same as the position of the edge in the first image to be processed, and obtaining the horizontal parallax displacement characteristic image.
In another possible implementation manner, the second processing unit is configured to:
respectively performing feature extraction processing on the first image to be processed and the repaired second image to be processed to obtain a first feature image of the first image to be processed and a second feature image of the repaired second image to be processed;
determining a second horizontal parallax displacement between the first pixel point and the homonymous point of the first pixel point in the repaired second image to be processed according to the correlation between the first feature image and the second feature image;
and obtaining the first parallax image according to the second horizontal parallax displacement.
In yet another possible implementation manner, the apparatus further includes: a convolution processing unit, configured to, before the first parallax image is obtained according to the correlation between the first feature image and the second feature image, perform convolution processing on a pixel point neighborhood in the second feature image by using a pixel point neighborhood in the first feature image as a convolution kernel, and determine the correlation between the first feature image and the second feature image; or
and taking the pixel point neighborhood in the second characteristic image as a convolution kernel to perform convolution processing on the pixel point neighborhood in the first characteristic image, so as to obtain the correlation between the first characteristic image and the second characteristic image.
In yet another possible implementation manner, the apparatus further includes:
the encoding processing unit is used for encoding the first parallax image and the first characteristic image to obtain a third characteristic image;
and the decoding processing unit is used for performing decoding processing on the third characteristic image to obtain a second parallax image between the first to-be-processed image and the repaired second to-be-processed image, wherein the resolution of the second parallax image is greater than that of the first parallax image.
In yet another possible implementation manner, the encoding processing unit is configured to:
splicing the first parallax image and the first feature image to obtain a fourth image to be processed;
and coding the fourth image to be processed to obtain the third characteristic image.
In another possible implementation manner, the apparatus further includes a feature extraction processing unit, configured to, before the stitching processing is performed on the first parallax image and the first feature image to obtain a fourth image to be processed, perform feature extraction processing on the first feature image to obtain a fourth feature image of the first feature image;
the encoding processing unit is specifically configured to: and splicing the fourth characteristic image and the first parallax image to obtain a fourth image to be processed.
In a third aspect, a processor is provided, which is configured to perform the method according to the first aspect and any one of the possible implementations thereof.
In a fourth aspect, an electronic device is provided, comprising: a processor, transmitting means, input means, output means, and a memory for storing computer program code comprising computer instructions which, when executed by the processor, cause the electronic device to perform the method of the first aspect and any one of its possible implementations.
In a fifth aspect, there is provided a computer readable storage medium having stored therein a computer program comprising program instructions which, when executed by a processor of an electronic device, cause the processor to perform the method of the first aspect and any one of its possible implementations.
In a sixth aspect, a computer program product is provided, comprising instructions which, when run on a computer, cause the computer to perform the method of the first aspect and any of its possible implementations.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments or the background art of the present application, the drawings required to be used in the embodiments or the background art of the present application will be described below.
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a schematic diagram of a homonymy point in a binocular image according to an embodiment of the present disclosure;
fig. 2 is a schematic flowchart of an image processing method according to an embodiment of the present application;
fig. 3 is a schematic flowchart of another image processing method according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an encoding layer and a decoding layer according to an embodiment of the present application;
fig. 5a is a schematic diagram of a pixel neighborhood provided in the embodiment of the present application;
FIG. 5b is a schematic diagram of two 1-dimensional convolution kernels according to an embodiment of the present application;
fig. 5c is a schematic diagram of a pixel neighborhood after the position of a pixel is moved according to an embodiment of the present disclosure;
fig. 6 is a schematic flowchart of another image processing method according to an embodiment of the present application;
fig. 7 is a schematic flowchart of another image processing method according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a first decoding process and a second decoding process provided in an embodiment of the present application;
FIG. 9 is a schematic illustration of a co-located element provided by embodiments of the present disclosure;
fig. 10 is a schematic flowchart of another image processing method according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of an image inpainting subnetwork according to an embodiment of the present application;
fig. 12 is a schematic structural diagram of a parallax image generation sub-network according to an embodiment of the present application;
fig. 13 is a schematic structural diagram of an encoding layer according to an embodiment of the present application;
fig. 14 is a schematic structural diagram of a decoding layer according to an embodiment of the present application;
fig. 15 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
fig. 16 is a schematic diagram of a hardware structure of an image processing apparatus according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," and the like in the description and claims of the present application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
The term "and/or" herein is merely an association describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In the embodiments of the present application, binocular images refer to two images obtained by photographing the same object at the same time from different positions by two different imaging devices (hereinafter referred to as binocular imaging devices). Pixel points in different images of the binocular image that correspond to the same physical point are called homonymous points. Fig. 1 shows the two images of a binocular image, in which pixel point A and pixel point C are homonymous points, and pixel point B and pixel point D are homonymous points.
The depth information and the three-dimensional position information of any point of the object in the image can be determined according to the displacement between homonymous points in the binocular image, that is, according to the horizontal parallax displacement between homonymous points (namely, the position deviation of two homonymous pixel points in the horizontal parallax direction), the focal lengths of the two imaging devices, and the distance between the optical centers of the two imaging devices. Obtaining an accurate horizontal parallax displacement between homonymous points is therefore the key to determining the depth information and three-dimensional position information of any point of the object in the image, and the key to obtaining that displacement is determining the homonymous points in the binocular image: the higher the accuracy of the homonymous points determined in the binocular image, the more accurate the obtained horizontal parallax displacement between them. The horizontal parallax direction refers to the positive direction of the x-axis of the image coordinate system of the binocular image, and the vertical parallax direction appearing hereinafter refers to the positive direction of the y-axis of that image coordinate system.
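As a purely illustrative note (not part of the patent text), the relation just described between depth, horizontal parallax displacement, focal length and optical-center distance is the standard rectified-stereo formula. A minimal sketch, with illustrative variable names:

```python
# Hedged sketch of the standard rectified-stereo relation: depth equals the
# focal length times the optical-center distance (baseline), divided by the
# horizontal parallax displacement (disparity).
def depth_from_disparity(disparity_px: float, focal_px: float,
                         baseline_m: float) -> float:
    """Return the depth (in metres) of a point from its disparity."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px

# Example: focal length 1000 px, baseline 0.12 m, disparity 24 px -> 5.0 m
print(depth_from_disparity(24.0, 1000.0, 0.12))
```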
In real applications, due to differences between different imaging devices, differences exist between the two obtained binocular images, which in turn lowers the accuracy of the homonymous points identified in the binocular images, and further lowers the accuracy of the horizontal parallax displacement between the identified homonymous points (hereinafter referred to as the parallax between homonymous points). The differences between imaging devices include differences between their hardware configurations. For example, if the hardware configuration of imaging device A is higher than that of imaging device B, the resolution of the image acquired by A is higher than that of the image acquired by B; likewise, the signal-to-noise ratio of the image acquired by A may be higher than that of the image acquired by B. Such differences result in low accuracy of the parallax image or depth image obtained based on the binocular image. With the technical scheme of the present application, the difference between the obtained binocular images can be reduced when a difference exists between the two imaging devices, improving the accuracy of the parallax between homonymous points in the binocular images.
The embodiments of the present application will be described below with reference to the drawings.
Referring to fig. 2, fig. 2 is a schematic flowchart illustrating an image processing method according to the first embodiment of the present application.
201. Acquire a binocular image, wherein the binocular image comprises a first image to be processed and a second image to be processed, and the image quality of the first image to be processed is higher than that of the second image to be processed.
The technical scheme provided by the embodiment of the application can be applied to the first terminal, wherein the first terminal comprises a mobile phone, a computer, a tablet computer, a server and the like.
The image quality includes one or more of the resolution of the image, the signal-to-noise ratio of the image and the definition of the image. The resolution, the signal-to-noise ratio and the definition of an image are each in direct proportion to its image quality.
The first image to be processed and the second image to be processed are images obtained by photographing the same object or scene at the same time from different positions by two different imaging devices. The imaging device may be a still camera or a video camera: for example, the two cameras on a mobile phone, the two cameras mounted on a smart car, or the two cameras on a drone.
As described above, the technical solution provided by the embodiment of the present application can be used to reduce the difference between the image qualities of two images in the binocular images when the image qualities of the two images are not consistent. Therefore, the image quality of the first image to be processed in the embodiment of the present application is higher than that of the second image to be processed.
It is to be understood that the embodiments of the present application illustrate how to reduce the difference between the image qualities of two images in a binocular image, taking two different imaging apparatuses as examples. In practical application, a plurality of images can be obtained by shooting the same object or scene from different positions at the same time through three or more than three imaging devices, the difference between the image qualities of the plurality of images can be reduced through the technical scheme provided by the embodiment of the application, and the number of the imaging devices is not limited by the application.
The binocular image may be acquired by receiving a binocular image input by a user through an input component, wherein the input component includes a keyboard, a mouse, a touch screen, a touch pad, an audio input device, and the like. The binocular image may also be received from a second terminal, wherein the second terminal includes a mobile phone, a computer, a tablet computer, a server, and the like; the manner of obtaining the binocular image is not limited in the present application.
Optionally, after the first terminal acquires the binocular images, the image quality scores of the two images in the binocular images may be determined according to a preset image quality evaluation index. Wherein the image quality evaluation index includes at least one of: resolution of the image, signal-to-noise ratio of the image, sharpness of the image. After determining the image quality scores of the two images in the binocular image, the first image to be processed and the second image to be processed can be further determined.
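As an illustrative aside (not prescribed by the patent), one simple way to realise such an image quality score is to combine resolution with a sharpness proxy. The metric choice (variance of the Laplacian) and the weighting below are assumptions of this sketch.

```python
# Hedged sketch: score the two images of a binocular pair so that the
# higher-quality one becomes the first to-be-processed image.
import cv2
import numpy as np

def quality_score(img: np.ndarray) -> float:
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()  # sharpness proxy
    resolution = float(gray.shape[0] * gray.shape[1])
    return 0.5 * np.log(resolution) + 0.5 * np.log(sharpness + 1e-6)

def order_binocular(img_a: np.ndarray, img_b: np.ndarray):
    """Return (first, second) to-be-processed images, higher quality first."""
    return (img_a, img_b) if quality_score(img_a) >= quality_score(img_b) \
        else (img_b, img_a)
```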
202. Obtain a repaired second image to be processed according to the first image to be processed and the second image to be processed, wherein the image visual angle of the repaired second image to be processed is the same as the image visual angle of the second image to be processed, and the image quality of the repaired second image to be processed is higher than that of the second image to be processed.
In this embodiment, the image angle of view includes a shooting angle of view at which the imaging device shoots the object.
In this embodiment, obtaining the repaired second image to be processed according to the first image to be processed and the second image to be processed may be implemented as follows: and performing feature extraction processing on the first image to be processed and the second image to be processed to obtain a feature image of the first image to be processed and a feature image of the second image to be processed. According to the feature image of the first image to be processed and the feature image of the second image to be processed, horizontal parallax displacement between the homonymous points in the first image to be processed and the homonymous points in the second image to be processed is determined, and then the positions of the pixel points in the first image to be processed can be adjusted according to the horizontal parallax displacement, so that an image (hereinafter, referred to as a repaired second image to be processed) with the same image visual angle as that of the second image to be processed is obtained.
The image quality of the repaired second image to be processed obtained in the above manner is the same as the image quality of the first image to be processed, and the image perspective of the repaired second image to be processed is the same as the image perspective of the second image to be processed.
In another possible implementation manner of obtaining the repaired second to-be-processed image according to the first to-be-processed image and the second to-be-processed image, the image quality of the second to-be-processed image is improved to be the same as that of the first to-be-processed image by performing deblurring processing and/or denoising processing and/or image resolution improvement processing on the second to-be-processed image, so that the repaired second to-be-processed image is obtained.
203. Obtain a first parallax image between the first image to be processed and the repaired second image to be processed according to the first image to be processed and the repaired second image to be processed.
Since the image visual angle of the repaired second image to be processed is the same as the image visual angle of the second image to be processed, the first image to be processed and the repaired second image to be processed can be regarded as a set of binocular images. Therefore, a first parallax image between the first image to be processed and the repaired second image to be processed can be obtained according to them. The first parallax image contains the horizontal parallax displacement between homonymous points in the first image to be processed and the repaired second image to be processed.
In an implementation manner of obtaining a first parallax image between a first to-be-processed image and a repaired second to-be-processed image according to the first to-be-processed image and the repaired second to-be-processed image, a feature image of the first to-be-processed image and a feature image of the repaired second to-be-processed image can be obtained by performing feature extraction processing on the first to-be-processed image and the repaired second to-be-processed image. And determining the homonymous points in the characteristic image of the first image to be processed and the characteristic image of the repaired second image to be processed by performing characteristic matching processing on the characteristic image of the first image to be processed and the characteristic image of the repaired second image to be processed. And obtaining the first parallax image according to the horizontal parallax displacement between the homonymous points in the characteristic image of the first image to be processed and the characteristic image of the repaired second image to be processed.
In another implementation manner of obtaining a first parallax image between a first to-be-processed image and a repaired second to-be-processed image according to the first to-be-processed image and the repaired second to-be-processed image, a homonymous point in the first to-be-processed image and a homonymous point in the repaired second to-be-processed image can be determined by performing feature matching processing on the first to-be-processed image and the repaired second to-be-processed image. And obtaining the first parallax image according to the horizontal parallax displacement between the homonymous points in the first image to be processed and the repaired second image to be processed.
The feature matching processing may be implemented by any one of a brute-force matching algorithm, a k-nearest neighbors algorithm (KNN), or a fast approximate nearest neighbor search algorithm (FLANN), which is not limited in the present application.
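For illustration only, a minimal OpenCV sketch of the KNN variant named above, using ORB descriptors and a ratio test. The detector choice and the 0.75 threshold are assumptions of this sketch, not requirements of the patent.

```python
# Hedged sketch: KNN feature matching between the two to-be-processed
# images to find candidate homonymous point pairs.
import cv2

def match_homonymous_points(img1, img2, ratio: float = 0.75):
    orb = cv2.ORB_create(nfeatures=2000)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    knn = matcher.knnMatch(des1, des2, k=2)
    good = [p[0] for p in knn
            if len(p) == 2 and p[0].distance < ratio * p[1].distance]
    # Each surviving match pairs a point in img1 with its homonymous point.
    return [(kp1[m.queryIdx].pt, kp2[m.trainIdx].pt) for m in good]
```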
This embodiment obtains the repaired second image to be processed according to the first image to be processed and the second image to be processed, so that the difference between the image quality of the first image to be processed and the image quality of the repaired second image to be processed is smaller than the difference between the image quality of the first image to be processed and the image quality of the second image to be processed. The precision of the first parallax image obtained according to the first image to be processed and the repaired second image to be processed is therefore higher than that of a parallax image obtained according to the first image to be processed and the second image to be processed.
Referring to fig. 3, fig. 3 is a flowchart illustrating a possible implementation manner of step 202 provided in the second embodiment of the present application.
301. Perform first feature extraction processing on the first image to be processed and the second image to be processed to obtain a horizontal parallax displacement feature image, wherein the horizontal parallax displacement feature image comprises a first horizontal parallax displacement between a first pixel point in the first image to be processed and a second pixel point in the second image to be processed, and the first pixel point and the second pixel point are homonymous points.
In the embodiments of the present application, the first feature extraction processing may be an encoding processing, or a combination of an encoding processing and a decoding processing. The encoding processing may be convolution processing or pooling processing, and the decoding processing may be bilinear interpolation processing, nearest-neighbor interpolation processing or deconvolution processing.
In a possible implementation manner, the first image to be processed and the second image to be processed are sequentially encoded step by step through at least two encoding layers, and the feature image obtained after the encoding processing is decoded step by step through at least two decoding layers to obtain the horizontal parallax displacement feature image.
After the processing of the encoding layer, the sizes of the first to-be-processed image and the second to-be-processed image become smaller, and after the processing of the decoding layer, the size of the feature image becomes larger. For example, as shown in fig. 4, in the above possible implementation manner, the number of the encoding layers and the number of the decoding layers may be set to be the same, and the size of the feature image output by the first layer encoding layer is the same as the size of the feature image output by the third layer decoding layer, the size of the feature image output by the second layer encoding layer is the same as the size of the feature image output by the second layer decoding layer, and the size of the feature image output by the third layer encoding layer is the same as the size of the feature image output by the first layer decoding layer.
Since some relatively minor feature information is discarded when the first image to be processed and the second image to be processed are encoded, but this minor feature information is retained in the data before the encoding processing, the texture information and edge information in the feature image can be enriched by fusing the feature image output by the encoding layer with the feature image output by the decoding layer during the decoding processing. Optionally, as shown in fig. 4, the feature image output by an encoding layer may be fused with the feature image of the same size output by a decoding layer. The relatively minor feature information refers to global image feature information, which describes surface properties of objects in the image and cannot fully reflect the attributes of those objects, such as color feature information, texture feature information and edge feature information. Illustratively, the fusion may be addition.
It should be understood that the number of coding layers and the number of decoding layers in fig. 4 are only an example provided by the present embodiment, and should not limit the present application.
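As a purely illustrative sketch (not the patent's prescribed network), the three-level encoding/decoding structure with same-size fusion of fig. 4 can be expressed in PyTorch as follows; the channel widths, kernel sizes and the single-channel output are assumptions of this sketch.

```python
# Hedged sketch: three encoding layers that halve the feature size, three
# decoding layers that double it, with additive fusion of same-size
# encoder and decoder outputs (cf. fig. 4).
import torch
import torch.nn as nn

class EncoderDecoder(nn.Module):
    def __init__(self, in_ch: int = 6, ch: int = 32):  # 6 = two stacked RGB images
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(in_ch, ch, 3, 2, 1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(ch, ch * 2, 3, 2, 1), nn.ReLU())
        self.enc3 = nn.Sequential(nn.Conv2d(ch * 2, ch * 4, 3, 2, 1), nn.ReLU())
        self.dec1 = nn.Sequential(nn.ConvTranspose2d(ch * 4, ch * 2, 4, 2, 1), nn.ReLU())
        self.dec2 = nn.Sequential(nn.ConvTranspose2d(ch * 2, ch, 4, 2, 1), nn.ReLU())
        self.dec3 = nn.ConvTranspose2d(ch, 1, 4, 2, 1)  # 1-channel displacement map

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        e1 = self.enc1(x)
        e2 = self.enc2(e1)
        e3 = self.enc3(e2)
        d1 = self.dec1(e3) + e2  # fuse same-size encoder/decoder features
        d2 = self.dec2(d1) + e1  # the fused result feeds the next decoding level
        return self.dec3(d2)

out = EncoderDecoder()(torch.randn(1, 6, 64, 128))  # shape (1, 1, 64, 128)
```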
Optionally, before performing the first feature extraction processing on the first image to be processed and the second image to be processed, the first image to be processed and the second image to be processed may be subjected to stitching processing, so as to obtain a stitched image to be processed (i.e., a third image to be processed). The first feature extraction processing on the first image to be processed and the second image to be processed may be realized by performing the first feature extraction processing on the third image to be processed.
The performing of the first feature extraction processing on the third image to be processed includes performing the first feature extraction processing on each pixel point in the third image to be processed. In this way, the feature information of each pixel point in the third image to be processed can be extracted, and the horizontal parallax displacement of each pixel point can be determined according to its feature information, thereby obtaining a horizontal parallax displacement feature image containing the horizontal parallax displacement information of each pixel point. The horizontal parallax displacement of each pixel point comprises the horizontal parallax displacement between the homonymous points in the first image to be processed and the second image to be processed.
For example, a first pixel point in the first to-be-processed image and a second pixel point in the second to-be-processed image are homonymous points, and by performing the first feature extraction processing on the first to-be-processed image and the second to-be-processed image, the first horizontal parallax displacement between the first pixel point and the second pixel point can be determined.
302. Perform convolution processing on the first image to be processed by taking the horizontal parallax displacement feature image as a convolution kernel to obtain the repaired second image to be processed.
Convolution processing is performed on the first image to be processed with the horizontal parallax displacement feature image as a convolution kernel: the horizontal parallax displacement information contained in the horizontal parallax displacement feature image is used to move the pixel points in the first image to be processed so that the horizontal position of each moved pixel point is the same as the horizontal position of its homonymous point in the second image to be processed. Because the image quality of the first image to be processed is higher than that of the second image to be processed, making the horizontal positions of the pixel points in the first image to be processed the same as those in the second image to be processed by moving them is equivalent to obtaining the second image to be processed with improved image quality, namely the repaired second image to be processed.
The horizontal parallax displacement characteristic image obtained in step 301 includes horizontal parallax displacement information of all the corresponding points in the first image to be processed and the second image to be processed, so that when the horizontal parallax displacement characteristic image is used as a convolution kernel to perform convolution processing on the first image to be processed, a convolution kernel of the corresponding point in the first image to be processed can be determined according to the horizontal parallax displacement information of each pixel point in the horizontal parallax displacement characteristic image, and the convolution kernel is used to perform convolution processing on the corresponding point in the first image to be processed. And after the convolution processing of all the pixel points in the first image to be processed is finished, the repaired second image to be processed can be obtained.
Optionally, accurately determining the homonymous points in the first image to be processed and the second image to be processed is difficult, while pixel points within the same image are correlated: for example, if pixel point A in the first image to be processed is white, the probability that the pixel points around pixel point A are also white is high, that is, the pixel points in a neighborhood centered on pixel point A are likely to be white. Therefore, after the convolution kernel of each pixel point in the first image to be processed is obtained, the convolution kernel can be used to perform convolution processing on the neighborhood of that pixel point, so as to improve the effect of adjusting the pixel point's horizontal position. For example, assume the convolution kernel determined for pixel point A in the first image to be processed is kernel A; kernel A is then used to perform convolution processing on the pixel neighborhood b constructed with pixel point A as its center. When the convolution processing is carried out on neighborhood b, the horizontal position of pixel point A can be adjusted by utilizing both the correlation between pixel point A and the other pixel points in neighborhood b and the horizontal position information of those other pixel points, further improving the effect of adjusting the horizontal position of pixel point A. The size of the pixel neighborhood in this embodiment can be adjusted according to the actual use effect, which is not limited in this application.
In a possible implementation manner, a convolution kernel of a pixel point in the first image to be processed may be obtained by performing weighted summation on the horizontal parallax displacements between the pixel point's homonymous point in the second image to be processed and each pixel point in the neighborhood of the pixel point in the first image to be processed. For example, pixel point a in the first image to be processed and pixel point b in the second image to be processed are homonymous points, and the pixel neighborhood constructed with pixel point a as its center in the first image to be processed includes pixel point c and pixel point d. The horizontal parallax displacement of pixel point a and pixel point b is d_1, the horizontal parallax displacement of pixel point b and pixel point c is d_2, and the horizontal parallax displacement of pixel point b and pixel point d is d_3. If the weight of pixel point b is 0.4, the weight of pixel point c is 0.3, and the weight of pixel point d is 0.3, then the horizontal parallax displacement of pixel point a contained in the horizontal parallax displacement feature image obtained by performing the feature extraction processing on the first feature image and the second feature image is 0.4·d_1 + 0.3·d_2 + 0.3·d_3. The convolution kernel of pixel point a is determined according to this horizontal parallax displacement in the horizontal parallax displacement feature image, and the convolution kernel is used to perform convolution processing on pixel point a so as to move it.
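As an illustration, the weighted summation above can be written out directly. The following minimal NumPy sketch uses the example weights, with hypothetical displacement values; all names and numbers are illustrative, not fixed by this disclosure.

```python
import numpy as np

# d1, d2, d3: horizontal parallax displacements between pixel point b in the
# second image and pixel points a, c, d in the neighborhood of pixel point a
# in the first image (sample values are illustrative).
d1, d2, d3 = 4.0, 5.0, 3.0
weights = np.array([0.4, 0.3, 0.3])

# Weighted horizontal parallax displacement stored for pixel point a in the
# horizontal parallax displacement feature image.
displacement_a = weights @ np.array([d1, d2, d3])  # 0.4*d1 + 0.3*d2 + 0.3*d3
print(displacement_a)  # 4.0 with the sample values above
```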
Optionally, the process of determining the convolution kernel of the pixel point in the first image to be processed according to the horizontal parallax displacement feature image may be implemented by a neural network. The pixel neighborhood corresponding to the pixel point in the second image to be processed in the first image to be processed (the pixel neighborhood corresponding to the pixel point b in the above example includes the pixel point a, the pixel point c and the pixel point d) and the weights corresponding to different pixel points in the pixel neighborhood can be determined by the neural network. The ability of the neural network to determine the pixel point neighborhood corresponding to the pixel point in the second image to be processed from the first image to be processed and the weight of different pixel points in the pixel point neighborhood can be obtained through training of the neural network, for example, in the process of training the neural network, the specific training mode of the neural network is not limited by the application, and the label information containing the pixel point neighborhood is used as the supervision information to supervise the neural network.
For example, pixel point A in the first image to be processed and pixel point B in the second image to be processed are homonymous points, and pixel point C in the first image to be processed and pixel point D in the second image to be processed are homonymous points; the horizontal parallax displacement between pixel point A and pixel point B is D_1, the horizontal parallax displacement between pixel point C and pixel point D is D_2, and D_1 and D_2 are not equal. The technical scheme provided by the embodiment of the application can determine a convolution kernel for each pixel point in the first image to be processed according to the horizontal parallax displacement information in the horizontal parallax displacement feature image, and perform convolution processing on that pixel point through the determined convolution kernel so as to adjust its horizontal position. In this way, by determining different convolution kernels for different pixel points, adjusting the horizontal positions of the pixel points in the first image to be processed can make the difference between their horizontal positions and the horizontal positions of the homonymous points in the second image to be processed smaller, and can make the difference between the obtained repaired second image to be processed and the first image to be processed smaller.
The present embodiment obtains a horizontal parallax displacement feature image containing the horizontal parallax displacement between homonymous points in the first image to be processed and the second image to be processed by performing feature extraction processing on the two images. A convolution kernel is then determined for each pixel point in the first image to be processed according to the horizontal parallax displacement information in the horizontal parallax displacement feature image, and convolution processing is performed on the pixel point with that kernel to adjust its horizontal position, so that the difference between the repaired second image to be processed obtained after adjustment and the first image to be processed can be reduced.
Based on the technical solutions provided in the first and second embodiments, the horizontal positions of the pixel points in the first image to be processed can be adjusted, but in practical application, if any one of the binocular imaging devices has a calibration error, a vertical parallax displacement will also exist between the same-name points in the binocular image acquired by the binocular imaging devices. For example, due to a calibration error of the imaging device a in the vertical direction, the imaging device a and the imaging device B are not in the same horizontal plane, so that a vertical parallax displacement exists between the homonymous point in the first to-be-processed image acquired by the imaging device a and the second to-be-processed image acquired by the imaging device B.
Obviously, when there is vertical parallax displacement between the homonymous points in the first image to be processed and the second image to be processed, in the process of obtaining the repaired second image to be processed by moving the pixel points in the first image to be processed, not only the horizontal positions of the pixel points in the first image to be processed but also the vertical positions of the pixel points in the first image to be processed need to be adjusted. Thus, the difference between the repaired second image to be processed and the first image to be processed, which are obtained after the pixel points in the first image to be processed are moved, can be reduced.
Based on the idea of adjusting the horizontal positions of the pixel points in the first image to be processed provided in embodiment (two), a 2-dimensional convolution kernel can be determined for each pixel point in the first image to be processed according to the horizontal parallax displacement and the vertical parallax displacement between the homonymous points in the first image to be processed and the second image to be processed, where the 2-dimensional convolution kernel contains both the horizontal parallax displacement information and the vertical parallax displacement information between those homonymous points. Performing convolution processing on the corresponding pixel point in the first image to be processed with this 2-dimensional convolution kernel can adjust the horizontal position and the vertical position of the pixel point simultaneously.
Because the first image to be processed contains a large number of pixel points, adjusting the horizontal and vertical positions of all of them by convolving each pixel point with its own 2-dimensional convolution kernel would bring a huge data processing amount. In order to reduce the data processing amount required for adjusting the horizontal and vertical positions of the pixel points in the first image to be processed, in the embodiment of the present application two 1-dimensional convolution kernels are determined for each pixel point in the first image to be processed: one (hereinafter referred to as the horizontal parallax convolution kernel) is used to perform convolution processing on the pixel point so as to adjust its horizontal position, and the other (hereinafter referred to as the vertical parallax convolution kernel) is used to perform convolution processing on the pixel point so as to adjust its vertical position. By performing convolution processing on the pixel points in the first image to be processed with these two 1-dimensional convolution kernels in turn, both the horizontal and the vertical positions of the pixel points can be adjusted.
For example, fig. 5a shows a pixel neighborhood constructed for pixel point A in the first image to be processed, with size H × W. If a 2-dimensional convolution kernel of size L_h × L_v is used to perform convolution processing on this pixel neighborhood, pixel point A in fig. 5a can be moved to the position shown in fig. 5c, and the number of parameters to be processed is H × W × L_h × L_v. If instead two 1-dimensional convolution kernels of sizes L_h × 1 and 1 × L_v respectively perform convolution processing on the pixel neighborhood shown in fig. 5a (equivalent to performing convolution processing on that neighborhood with the two 1-dimensional convolution kernels shown in fig. 5b), pixel point A in fig. 5a can also be moved to the position shown in fig. 5c, but the number of parameters to be processed is H × W × (L_h + L_v). Obviously, the number of parameters to be processed when convolving the pixel neighborhood shown in fig. 5a with two 1-dimensional convolution kernels is smaller than the number of parameters to be processed when convolving it with one 2-dimensional convolution kernel.
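The parameter-count comparison above can be checked with a few lines of arithmetic; the sizes below are illustrative assumptions.

```python
# Convolving an H x W neighborhood with one 2-D kernel of size L_h x L_v
# versus two 1-D kernels of sizes L_h x 1 and 1 x L_v.
H, W, Lh, Lv = 64, 64, 5, 5

params_2d = H * W * Lh * Lv    # one 2-D kernel per pixel
params_1d = H * W * (Lh + Lv)  # two separable 1-D kernels per pixel

print(params_2d, params_1d)    # 102400 vs 40960 for these sizes
```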
How to obtain the horizontal parallax convolution kernel and the vertical parallax convolution kernel based on the first image to be processed and the second image to be processed, and how to perform convolution processing on pixel points in the first image to be processed by using the horizontal parallax convolution kernel and the vertical parallax convolution kernel to obtain the repaired second image to be processed will be described in detail below.
Referring to fig. 6, fig. 6 is a schematic flowchart of an image processing method according to the third embodiment of the present application.
601. And acquiring a binocular image, wherein the binocular image comprises a first image to be processed and a second image to be processed, and the image quality of the first image to be processed is higher than that of the second image to be processed.
The implementation process of this step can refer to step 201, and will not be described herein again.
602. The method comprises the steps of carrying out first feature extraction processing on a first image to be processed and a second image to be processed to obtain a horizontal parallax displacement feature image, carrying out second feature extraction processing on the first image to be processed and the second image to be processed to obtain a vertical parallax displacement feature image, wherein the horizontal parallax displacement feature image comprises horizontal parallax displacement between a first pixel point in the first image to be processed and a second pixel point in the second image to be processed, the vertical parallax displacement feature image comprises vertical parallax displacement between the first pixel point and the second pixel point, and the first pixel point and the second pixel point are homonymous points.
The first feature extraction processing in this step can refer to step 301. Like the first feature extraction processing, the second feature extraction processing may be an encoding process, or a combination of an encoding process and a decoding process. The encoding process may be convolution processing or pooling processing, and the decoding process may be bilinear interpolation processing, nearest-neighbor interpolation processing or deconvolution processing.
It is to be understood that, although the first feature extraction process and the second feature extraction process may involve the same process, the first feature extraction process and the second feature extraction process may extract feature images containing different information from the first image to be processed and the second image to be processed. For example, the convolutional neural network a and the convolutional neural network B are convolutional neural networks with the same structure and different parameters, the convolutional neural network a is used for performing feature extraction processing on the first image to be processed and the second image to be processed, a horizontal parallax displacement feature image containing horizontal parallax displacement between corresponding points in the first image to be processed and the second image to be processed can be obtained, the convolutional neural network B is used for performing feature extraction processing on the first image to be processed and the second image to be processed, and a vertical parallax displacement feature image containing vertical parallax displacement between corresponding points in the first image to be processed and the second image to be processed can be obtained.
603. And performing convolution processing on the first image to be processed by respectively taking the horizontal parallax displacement characteristic image and the vertical parallax displacement characteristic image as convolution kernels to obtain the repaired second image to be processed.
After the horizontal parallax displacement feature image and the vertical parallax displacement feature image are obtained, they can each be used as a convolution kernel to perform convolution processing on the first image to be processed, obtaining the repaired second image to be processed.
In one possible implementation manner, the horizontal parallax displacement feature image is used as a convolution kernel to perform convolution processing on the first image to be processed, and a fifth image to be processed can be obtained. And performing convolution processing on the fifth to-be-processed image by taking the vertical parallax displacement characteristic image as a convolution kernel, so as to obtain a repaired second to-be-processed image. In another possible implementation manner, the vertical parallax displacement feature image is used as a convolution kernel to perform convolution processing on the first image to be processed, so that a sixth image to be processed can be obtained. And performing convolution processing on the sixth image to be processed by taking the horizontal parallax displacement characteristic image as a convolution kernel, so as to obtain a repaired second image to be processed.
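The following NumPy sketch illustrates the first implementation manner above: a horizontal pass with per-pixel 1-dimensional kernels produces the fifth image to be processed, and a vertical pass then produces the repaired second image to be processed. The uniform kernel contents here are placeholders; in the method they would be derived from the parallax displacement feature images.

```python
import numpy as np

def apply_per_pixel_1d(image, kernels, horizontal=True):
    """Convolve each pixel of a single-channel image with its own 1-D kernel.

    image:   (H, W) array (the first image to be processed).
    kernels: (H, W, L) array, one 1-D kernel per pixel, which in the method
             would be derived from a parallax displacement feature image.
    """
    H, W = image.shape
    L = kernels.shape[-1]
    r = L // 2
    padded = np.pad(image, r, mode="edge")
    out = np.empty((H, W), dtype=np.float64)
    for y in range(H):
        for x in range(W):
            if horizontal:
                patch = padded[y + r, x:x + L]   # 1 x L row neighborhood
            else:
                patch = padded[y:y + L, x + r]   # L x 1 column neighborhood
            out[y, x] = patch @ kernels[y, x]
    return out

img = np.random.rand(8, 8)                  # first image to be processed
kh = np.full((8, 8, 3), 1.0 / 3.0)          # horizontal parallax kernels
kv = np.full((8, 8, 3), 1.0 / 3.0)          # vertical parallax kernels
fifth = apply_per_pixel_1d(img, kh, horizontal=True)        # fifth image
repaired = apply_per_pixel_1d(fifth, kv, horizontal=False)  # repaired image
```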
Optionally, as described above, when the horizontal parallax displacement feature image is used to perform convolution processing on the first image to be processed, a convolution kernel may be determined for the pixel neighborhood of each pixel point in the first image to be processed according to the horizontal parallax displacement information in the horizontal parallax displacement feature image, and that convolution kernel is used to perform convolution processing on the corresponding pixel neighborhood, so as to improve the processing effect. In this step, a horizontal parallax convolution kernel and a vertical parallax convolution kernel are respectively determined for each pixel neighborhood in the first image to be processed according to the horizontal parallax displacement feature image and the vertical parallax displacement feature image, and the corresponding pixel neighborhood is then convolved with the horizontal parallax convolution kernel and the vertical parallax convolution kernel respectively, so as to obtain the repaired second image to be processed. In this case, if the size of the horizontal parallax convolution kernel is L_h × 1 and the size of the vertical parallax convolution kernel is 1 × L_v, the size of the pixel neighborhood is L_h × L_v.
In subsequent processing, the binocular image is used to obtain a disparity map of an object (the "object" may be a thing or a person) in the binocular image and the depth information of the object, so the technical scheme provided by the application focuses on adjusting the position of the object in the first image to be processed, which can reduce the data processing amount and improve the processing speed. Since the first image to be processed contains both the object and content other than the object (hereinafter referred to as the background), and the background may have complex texture, distinguishing the object from the background can be very difficult. If the object is distinguished from the background poorly (for example, the distinguished object contains a large amount of background content, or part of the object is judged to be background so that the distinguished object is incomplete), the accuracy of the information contained in the horizontal parallax convolution kernel obtained from the horizontal parallax displacement feature image will be low, and/or the accuracy of the information contained in the vertical parallax convolution kernel obtained from the vertical parallax displacement feature image will be low. The adjustment of the positions of the pixel points in the first image to be processed will then be poor, that is, the difference between the repaired second image to be processed and the first image to be processed will be large.
Optionally, in the embodiment of the application, the horizontal parallax displacement feature image and/or the vertical parallax displacement feature image are subjected to filtering processing so that the position of the edge in the horizontal parallax displacement feature image matches the position of the edge of the first image to be processed in that feature image, and/or the position of the edge in the vertical parallax displacement feature image matches the position of the edge of the first image to be processed in that feature image. This improves the accuracy of distinguishing the object in the first image to be processed, which in turn improves the accuracy of the information contained in the horizontal parallax convolution kernel obtained from the horizontal parallax displacement feature image and/or the accuracy of the information contained in the vertical parallax convolution kernel obtained from the vertical parallax displacement feature image.
Optionally, the filtering processing may be guided filtering, that is, the first image to be processed is used as the guide image to filter the horizontal parallax displacement feature image and/or the vertical parallax displacement feature image, so that the position of the edge in the horizontal parallax displacement feature image is the same as the position of the edge of the first image to be processed in that feature image, and/or the position of the edge in the vertical parallax displacement feature image is the same as the position of the edge of the first image to be processed in that feature image.
In the embodiments of the present application, the image is a digital image, and an edge in a digital image is a pixel region with a large gradient change, that is, the boundary between an object in the digital image and the background, i.e., the outline of the object. After the contour of the object in the first image to be processed is determined, the area covered by the object can be accurately extracted from the first image to be processed. Because the area covered by the object is a connected region, when the positions of the pixel points in that area are adjusted in subsequent processing, they can be adjusted by adjusting the positions of the pixel points on the contour of the object, which reduces the data processing amount and improves the processing speed. Therefore, during the guided filtering processing, the gradient of non-contour regions can be reduced so that they become smooth, which has the effect of sharpening the object contour and facilitates the subsequent adjustment of the positions of the pixel points in the area covered by the object in the first image to be processed.
Since non-contour regions also have gradients, but these are generally smaller than the gradient of the object contour region, the object contour regions and non-contour regions in the horizontal parallax displacement feature image and/or the vertical parallax displacement feature image may optionally be determined by a gradient threshold. Specifically, a region whose gradient is greater than or equal to the gradient threshold in the horizontal parallax displacement feature image, and/or a region whose gradient is greater than or equal to the gradient threshold in the vertical parallax displacement feature image, is determined to be an object contour region.
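A sketch of the guided filtering and gradient-threshold steps described above, assuming the opencv-contrib package (which provides cv2.ximgproc.guidedFilter) is available; the radius, eps and gradient threshold values are illustrative assumptions, and the arrays stand in for real image data.

```python
import cv2
import numpy as np

guide = (np.random.rand(64, 64) * 255).astype(np.uint8)  # first image (guide)
disp_feat = np.random.rand(64, 64).astype(np.float32)    # parallax feature image

# Guided filtering with the first image to be processed as the guide image,
# aligning edges in the feature image with edges of the guide.
filtered = cv2.ximgproc.guidedFilter(guide, disp_feat, radius=8, eps=1e-2)

# Distinguish object-contour regions from non-contour regions by comparing
# the gradient magnitude against a gradient threshold.
gy, gx = np.gradient(filtered)
contour_mask = np.hypot(gx, gy) >= 0.1  # threshold value is illustrative
```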
In this embodiment, a vertical parallax displacement feature image containing the vertical parallax displacement between homonymous points in the first image to be processed and the second image to be processed is obtained by performing the second feature extraction processing on the two images. Convolution processing is then performed on the pixel points in the first image to be processed using the vertical parallax displacement feature image, adjusting the vertical positions of those pixel points so as to reduce the difference between the repaired second image to be processed obtained by the convolution processing and the first image to be processed.
As noted in step 602, the first feature extraction processing and the second feature extraction processing may involve the same operations, for example by implementing them with two convolutional neural networks of the same structure but different parameters; they may also involve different operations, for example by implementing them with two convolutional neural networks of different structures. In order to reduce the data processing amount of extracting the horizontal parallax displacement feature image and the vertical parallax displacement feature image from the first image to be processed and the second image to be processed through the first and second feature extraction processing, in the embodiment of the application a first intermediate feature image is first extracted from the two images by at least two levels of encoding processing and at least two levels of decoding processing, and feature extraction processing is then performed on the first intermediate feature image through two different decoding branches to obtain the horizontal parallax displacement feature image and the vertical parallax displacement feature image respectively. Here, a "decoding branch" includes convolution processing and decoding processing.
Referring to fig. 7, fig. 7 is a flowchart of a method, provided in embodiment (four) of the present application, for obtaining a horizontal parallax displacement feature image by performing the first feature extraction processing on the first image to be processed and the second image to be processed, and obtaining a vertical parallax displacement feature image by performing the second feature extraction processing on the two images.
701. And splicing the first image to be processed and the second image to be processed to obtain a third image to be processed.
The stitching processing in the embodiment of the present application may be concatenation in the channel dimension. For example, if the number of channels of the first image to be processed is 3 and the number of channels of the second image to be processed is 2, the number of channels of the third image to be processed obtained by stitching them is 5.
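A minimal sketch of the channel-dimension stitching in the example above; PyTorch is assumed here purely for illustration, and the channel counts follow the example.

```python
import torch

first = torch.randn(1, 3, 256, 256)   # first image to be processed, 3 channels
second = torch.randn(1, 2, 256, 256)  # second image to be processed, 2 channels
third = torch.cat([first, second], dim=1)  # third image to be processed
print(third.shape)  # torch.Size([1, 5, 256, 256]) -> 5 channels
```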
702. And performing n-level coding processing on the third image to be processed to obtain a first intermediate characteristic image, wherein n is a positive integer.
In this embodiment, the feature extraction processing on the third image to be processed is implemented by encoding the third image to be processed stage by stage through n coding layers. Each coding layer performs one level of encoding processing, and the n coding layers are connected in series in sequence: the output data of the 1st coding layer is the input data of the 2nd coding layer, the output data of the 2nd coding layer is the input data of the 3rd coding layer, ..., and the output data of the (n-1)-th coding layer is the input data of the n-th coding layer; the output data of the n-th coding layer is the first intermediate feature image. Here n is a positive integer; illustratively, n is 4.
The coding layer may be implemented in various ways, such as convolution processing, pooling processing, and the like, which is not limited in this embodiment.
In one possible implementation, the coding layer includes a pooling layer and at least two serially connected convolutional layers, where the data output by the pooling layer is the input data of the serially connected convolutional layers. Taking the first coding layer as an example, the third image to be processed is the input data of the pooling layer; performing pooling processing on the third image to be processed reduces its resolution and the number of sampling points, so that the feature image subsequently extracted from the third image to be processed is smaller, which reduces the calculation amount of subsequent processing. The pooling processing may be average pooling or max pooling. The feature image output by the pooling processing is the input data of the at least two convolutional layers, and each of these convolutional layers extracts different feature information and semantic information. Specifically, the feature information in the feature image output by the pooling layer is abstracted step by step through the convolution processing of the at least two convolutional layers, and relatively minor feature information is removed step by step, so that as the extracted feature images become smaller, the feature information and semantic information in them become more concentrated. Convolving the feature image output by the pooling layer step by step through the at least two convolutional layers thus obtains its feature information while reducing its size, which reduces the calculation amount of the system and improves its operation speed. Illustratively, the convolution kernels in each of the at least two convolutional layers have a size of 3 × 3, the number of convolutional layers in a coding layer is 3, and the step size of the convolution processing is 2.
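A PyTorch sketch of one such coding layer under the exemplary settings above (pooling followed by three 3 × 3 convolutional layers). The channel counts, activation functions and the placement of the exemplary stride of 2 are assumptions, since the text does not fix them.

```python
import torch
import torch.nn as nn

class CodingLayer(nn.Module):
    """One coding layer: a pooling layer followed by three serially connected
    3 x 3 convolutional layers (a sketch, not the disclosed network)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.pool = nn.AvgPool2d(2)  # average pooling (max pooling also possible)
        self.convs = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.convs(self.pool(x))

x = torch.randn(1, 5, 256, 256)     # stitched third image to be processed
print(CodingLayer(5, 32)(x).shape)  # torch.Size([1, 32, 64, 64])
```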
703. And performing m-level first decoding processing on the first intermediate feature image to obtain a second intermediate feature image, and performing m-level second decoding processing on the first intermediate feature image to obtain a third intermediate feature image.
In this embodiment, both the first decoding process and the second decoding process may be implemented by m decoding layers, where m is a positive integer. When m is a positive integer greater than or equal to 2, the first m-1 decoding layer of the m decoding layers of the first decoding process is the same as the first m-1 decoding layer of the m decoding layers of the second decoding process. As shown in fig. 8, the first decoding process includes a first layer decoding layer, a second layer decoding layer, a third layer decoding layer, a fourth layer decoding layer, and a fifth layer decoding layer, and the second decoding process includes a first layer decoding layer, a second layer decoding layer, a third layer decoding layer, a fourth layer decoding layer, and a sixth layer decoding layer.
The decoding processing in this step is the reverse of the encoding processing in step 702: the decoded feature image is obtained by decoding the first intermediate feature image stage by stage through the shared m-1 decoding layers, enlarging its size step by step. Each decoding layer performs one level of decoding processing, the m-1 decoding layers are connected in series in sequence, and the input data of each decoding level can be determined based on the output data of the previous level: the input data of the 2nd decoding layer can be determined based on the output data of the 1st decoding layer, the input data of the 3rd decoding layer based on the output data of the 2nd decoding layer, ..., and the input data of the (m-1)-th decoding layer based on the output data of the (m-2)-th decoding layer. Illustratively, when m is a positive integer greater than or equal to 2, m-1 equals n.
The decoding layer may include a convolutional layer and an upsampling layer, wherein the upsampling layer may include any one of the following processes: bilinear interpolation processing, nearest neighbor interpolation processing, high-order interpolation and deconvolution processing, and the specific implementation mode of the upsampling layer is not limited in the application.
As shown in step 702, while the feature information is gradually extracted from the third image to be processed by the encoding layer to perform encoding processing on the third image to be processed, some relatively minor feature information in the third image to be processed is discarded, where the relatively minor feature information includes texture feature information, edge feature information, and the like. Therefore, the feature image output by the coding layer and the feature image output by the decoding layer can be fused to enrich texture information and edge information in the semantic feature image.
In one possible implementation, m is a positive integer greater than or equal to 2, and m-1 equals n. By fusing the feature image output by the i-th level encoding processing of the n-level encoding processing with the feature image output by the j-th level first decoding processing of the m-level first decoding processing, the input data of the (j+1)-th level first decoding processing is obtained, which enriches the texture information and edge information in the feature image output by each decoding layer. Optionally, the size of the feature image output by the i-th level encoding processing is the same as the size of the feature image output by the j-th level decoding processing.
For example, the above fusion may be addition, that is, adding the elements at the same position in the two feature images to be fused. The meaning of same-position elements is illustrated in fig. 9: elements a through i of image A occupy, respectively, the same positions as elements j through r of image B (a and j, b and k, c and l, d and m, e and n, f and o, g and p, h and q, i and r).
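A sketch of one decoding level with this fusion, in which the feature image output by a coding layer of matching size is added element-wise to the upsampled decoder output; channel counts and activation functions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DecodingLayer(nn.Module):
    """One decoding level with skip fusion: convolution, upsampling, then
    element-wise addition of the same-size coding-layer output (a sketch)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)

    def forward(self, x, skip):
        x = self.up(self.conv(x))
        return x + skip  # fuse same-position elements by addition

dec = DecodingLayer(64, 32)
x = torch.randn(1, 64, 32, 32)     # output of the previous decoding level
skip = torch.randn(1, 32, 64, 64)  # i-th level encoding output, matching size
print(dec(x, skip).shape)          # torch.Size([1, 32, 64, 64])
```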
When m is a positive integer greater than or equal to 2, the m-th layer decoding layer in the first feature extraction processing is different from the m-th layer decoding layer in the second feature extraction processing. Such as the layer 5 decoding layer and the layer 6 decoding layer shown in fig. 8. Optionally, if the structure of the m-th decoding layer in the first feature extraction process is the same as that of the m-th decoding layer in the second feature extraction process (e.g., both include 3 convolutional layers and 1 upsampling layer, and the stacking manner of the 3 convolutional layers and the 1 upsampling layer is the same), different parameters may be respectively given to the m-th decoding layer in the first feature extraction process and the m-th decoding layer in the second feature extraction process, so as to respectively extract the second intermediate feature image and the third intermediate feature image from the feature image output by the m-1-th decoding layer.
As can be seen from this step, the first feature extraction process and the second feature extraction process can both decode the first intermediate feature image through the layer 1 decoding layer, the layer 2 decoding layer, …, and the layer m-1 decoding layer, which can reduce the data processing amount and increase the processing speed.
704. The second intermediate feature image is subjected to filter processing using the first to-be-processed image as a guide image so that the position of the edge in the second intermediate feature image is the same as the position of the edge in the first to-be-processed image, thereby obtaining the horizontal parallax displacement feature image, and the third intermediate feature image is subjected to filter processing using the first to-be-processed image as a guide image so that the position of the edge in the third intermediate feature image is the same as the position of the edge in the first to-be-processed image, thereby obtaining the vertical parallax displacement feature image.
The implementation process of this step can refer to step 603, and will not be described herein again.
With the technical scheme provided by this embodiment, the first feature extraction processing and the second feature extraction processing share the n coding layers and the first m-1 decoding layers, so performing both on the first image to be processed and the second image to be processed requires a smaller data processing amount and achieves a higher processing speed.
The technical solutions provided by embodiments (two) to (four) can obtain the repaired second image to be processed based on the first image to be processed. Further, a parallax image containing the horizontal parallax displacement information between the homonymous points of the first image to be processed and the repaired second image to be processed may be obtained from those two images. In one possible implementation, a second horizontal parallax displacement between the first pixel point and its homonymous point in the repaired second image to be processed is determined, and a first parallax image between the first image to be processed and the repaired second image to be processed can be obtained according to the second horizontal parallax displacement.
How to obtain the first parallax image between the first to-be-processed image and the repaired second to-be-processed image from the first to-be-processed image and the repaired second to-be-processed image will be described in detail below. Referring to fig. 10, fig. 10 is a flowchart of a possible implementation manner of step 203 provided in the embodiment (five) of the present application.
1001. And respectively carrying out feature extraction processing on the first image to be processed and the repaired second image to be processed to obtain a first feature image of the first image to be processed and a second feature image of the repaired second image to be processed.
As with the first feature extraction processing and the second feature extraction processing described above, the feature extraction processing performed on the first image to be processed and on the repaired second image to be processed in this embodiment may be an encoding process, or a combination of an encoding process and a decoding process. The encoding process may be convolution processing or pooling processing, and the decoding process may be bilinear interpolation processing, nearest-neighbor interpolation processing or deconvolution processing. The feature extraction processing performed on the first image to be processed and that performed on the repaired second image to be processed may be the same or different.
In one possible implementation manner, both the feature extraction processing performed on the first to-be-processed image and the feature extraction processing performed on the repaired second to-be-processed image can be implemented by at least two convolutional layers. The method comprises the steps of inputting a first image to be processed to the at least two layers of convolution layers to realize feature extraction processing of the first image to be processed, and inputting a repaired second image to be processed to the at least two layers of convolution layers to realize feature extraction processing of the repaired second image to be processed. Optionally, the number of convolutional layers in the at least two convolutional layers is 3.
By respectively performing feature extraction processing on the first image to be processed and the repaired second image to be processed, the first feature image can be extracted from the first image to be processed while reducing the sizes of the first image to be processed and the repaired second image to be processed, and the second feature image can be extracted from the repaired second image to be processed. Thus, the data processing amount of the subsequent processing can be reduced, and the processing speed can be improved.
1002. And determining a second horizontal parallax displacement between the first pixel point and the same-name point of the first pixel point in the repaired second image to be processed according to the correlation between the first characteristic image and the second characteristic image.
The correlation between the first feature image and the second feature image includes the degree of matching between the features in the first feature image and the features in the second feature image. According to the similarity between those features, the homonymous points in the first feature image and the second feature image can be determined, and the first parallax image can then be obtained according to the second horizontal parallax displacement between the homonymous points in the two feature images.
In a possible implementation manner, the position of the third pixel point in the first feature image is the same as the position of the first pixel point in the first image to be processed, the first similarity between the feature of the third pixel point and the feature of the pixel point in the second feature image is determined, and the pixel point with the maximum first similarity in the second feature image is selected as the homonymous point (hereinafter, referred to as a fourth pixel point) of the third pixel point. And determining the horizontal parallax displacement between the third pixel point and the fourth pixel point to obtain a first parallax image.
In the possible implementation manner, the horizontal parallax displacement between the third pixel point and the fourth pixel point is a second horizontal parallax displacement between the first pixel point and a same-name point in the repaired second to-be-processed image of the first pixel point.
Optionally, as shown in step 302, there is correlation between a plurality of pixel points in the same image, and therefore the correlation between the first feature image and the second feature image further includes feature similarity between a pixel point neighborhood in the first feature image and a pixel point neighborhood in the second feature image. And determining the first parallax image according to the feature similarity between the pixel point neighborhood in the first feature image and the pixel point neighborhood in the second feature image, and improving the accuracy of the obtained first parallax image by utilizing the correlation between the pixel points contained in the pixel point neighborhood. The size of the pixel neighborhood in this embodiment can be adjusted according to the actual use effect, which is not limited in this application. The size of the pixel neighborhood in the first characteristic image and the size of the pixel neighborhood in the second characteristic image can be the same or different.
In one implementation of determining the feature similarity between a pixel neighborhood in the first feature image and a pixel neighborhood in the second feature image, the pixel neighborhood in the first feature image is used as a convolution kernel to perform convolution processing on the pixel neighborhood in the second feature image, obtaining the first feature similarity between the two neighborhoods; the feature similarity between the pixel neighborhood in the first feature image and the pixel neighborhood in the second feature image is then determined according to the maximum value of the first feature similarity. In another implementation, the pixel neighborhood in the second feature image is used as a convolution kernel to perform convolution processing on the pixel neighborhood in the first feature image, obtaining the second feature similarity between the two neighborhoods; the feature similarity between the pixel neighborhood in the first feature image and the pixel neighborhood in the second feature image is then determined according to the maximum value of the second feature similarity. Optionally, the size of the pixel neighborhood in the first feature image is equal to the size of the pixel neighborhood in the second feature image.
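The following simplified NumPy sketch illustrates correlation-based matching along a row, using per-pixel feature vectors and a dot product as the similarity measure rather than the full pixel-neighborhood convolution described above; function and variable names are illustrative.

```python
import numpy as np

def horizontal_disparity(feat1, feat2, max_disp=16):
    """Pick, for each pixel of feat1, the candidate in feat2 along the same
    row (within max_disp to the left) with the highest feature similarity,
    and return the resulting horizontal parallax displacement map.
    feat1, feat2: (C, H, W) feature images."""
    C, H, W = feat1.shape
    disp = np.zeros((H, W), dtype=np.int64)
    for y in range(H):
        for x in range(W):
            f = feat1[:, y, x]
            lo = max(0, x - max_disp)
            sims = feat2[:, y, lo:x + 1].T @ f  # dot-product similarities
            disp[y, x] = x - (lo + int(np.argmax(sims)))
    return disp

f1 = np.random.rand(8, 4, 4)
f2 = np.random.rand(8, 4, 4)
print(horizontal_disparity(f1, f2))  # (4, 4) displacement map
```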
1003. And obtaining the first parallax image according to the second horizontal parallax displacement.
After the second horizontal parallax displacement is obtained in step 1002, the first parallax image can be obtained according to the second horizontal parallax displacement.
According to the correlation between the first feature image of the first image to be processed and the second feature image of the repaired second image to be processed, the parallax image between the two feature images, that is, the first parallax image between the first image to be processed and the repaired second image to be processed, can be determined. Since the repaired second image to be processed can be obtained through the technical schemes provided in the preceding embodiments, combining the technical scheme provided by this embodiment with those schemes can improve the accuracy of the parallax image obtained from the first image to be processed and the second image to be processed; that is, the technical scheme of the embodiments of the application can improve the accuracy of the parallax image obtained from a binocular image when the image qualities of the two images in the binocular image are inconsistent.
The feature extraction processing performed on the first image to be processed and the repaired second image to be processed in step 1001 may be convolution processing or pooling processing, and either reduces the resolution of both images. The resolution of the first parallax image obtained by embodiment (five) is therefore smaller than the resolution of the first image to be processed (or of the repaired second image to be processed). Based on this, the embodiments of the present application further provide a technical solution by which the resolution of the first parallax image can be raised to match the resolution of the first image to be processed (or the resolution of the repaired second image to be processed).
In one possible implementation manner of increasing the resolution of the first parallax image, the third feature image may be obtained by performing encoding processing on the first parallax image and the first feature image. And decoding the third characteristic image to obtain a second parallax image between the first to-be-processed image with the resolution greater than that of the first parallax image and the repaired second to-be-processed image.
The implementation of the above encoding and decoding processes can refer to steps 702 and 703, and will not be described here again. Optionally, the encoding and decoding processes may be implemented by a resolution-raising convolutional neural network, which can acquire the ability to raise the resolution of its input image through supervised training.
In the foregoing possible implementation of raising the resolution of the first parallax image, performing encoding processing on the first parallax image and the first feature image to obtain the third feature image may include: stitching the first parallax image and the first feature image to obtain a fourth image to be processed, and encoding the fourth image to be processed to obtain the third feature image.
Optionally, before the first parallax image and the first feature image are stitched to obtain the fourth image to be processed, feature extraction processing may be performed on the first feature image, extracting its features while reducing its size to that of the first parallax image, so as to obtain a fourth feature image of the first feature image. The fourth feature image is then stitched with the first parallax image to obtain the fourth image to be processed. This reduces the data processing amount required by the subsequent processing of the fourth image to be processed and improves the processing speed.
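A minimal PyTorch sketch of this optional step, shrinking the first feature image to the size of the first parallax image while extracting features, then stitching the two; the channel counts and the single stride-2 convolution are assumptions.

```python
import torch
import torch.nn as nn

feat1 = torch.randn(1, 32, 128, 128)  # first feature image
disp1 = torch.randn(1, 1, 64, 64)     # first parallax image (half resolution)

shrink = nn.Conv2d(32, 16, 3, stride=2, padding=1)
feat4 = shrink(feat1)                      # fourth feature image, 64 x 64
fourth = torch.cat([feat4, disp1], dim=1)  # fourth image to be processed
print(fourth.shape)                        # torch.Size([1, 17, 64, 64])
```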
The embodiment of the application also provides a binocular image processing network which can be used for realizing the methods in the embodiments (one) to (five). The binocular image processing network comprises an image inpainting sub-network and a parallax image generating sub-network. Referring to fig. 11 and 12, fig. 11 is a schematic structural diagram of an image repairing sub-network provided in (six) of the present disclosure, and fig. 12 is a schematic structural diagram of a parallax image generating sub-network provided in (six) of the present disclosure. It should be understood that the network structures shown in fig. 11 and fig. 12 are only exemplary, and may be adjusted according to specific requirements in practical applications, which are not limited in the present application.
As shown in fig. 11, the input of the image inpainting sub-network is the image obtained by stitching (concatenating) the first image to be processed and the second image to be processed, that is, the third image to be processed. After the third image to be processed is input into the image inpainting sub-network, it is sequentially processed by 3 convolutional layers and then by 4 coding layers to obtain the first intermediate feature image. For the processing performed by the 4 coding layers on the data output by the convolutional layers, refer to step 702; it will not be described here again. As shown in fig. 13, the data input to a coding layer is processed sequentially by a down-sampling layer and 3 convolutional layers to obtain the coding layer's output data.
The first intermediate feature image is then processed sequentially by 3 decoding layers and 1 upsampling layer to obtain a fourth intermediate feature image. For the processing performed by the 3 decoding layers on the first intermediate feature image, refer to step 703; it will not be described here again. As shown in fig. 14, the data input into a decoding layer is processed sequentially by an upsampling layer and 3 convolutional layers to obtain the decoding layer's output data. In order to enrich the texture information and edge information in the feature images obtained by the decoding layers, while the 3 decoding layers decode the input data and the upsampling layer after them upsamples the input data, the feature image output by a coding layer can be fused with the feature image output by a decoding layer (specifically, with the feature image output by the upsampling layer within the decoding layer).
The fourth intermediate feature image is input into two different branches (the "decoding branches" mentioned earlier), which have the same structure, each consisting of 4 convolutional layers and one deep guided filtering layer, but whose parameters differ. Processing the fourth intermediate feature image through one branch yields the horizontal parallax displacement feature image, and processing it through the other branch yields the vertical parallax displacement feature image. For the processing performed by the deep guided filtering layer on its input data, refer to the implementation of the filtering processing of the horizontal parallax displacement feature image and/or the vertical parallax displacement feature image described after step 603; it will not be repeated here.
After the horizontal parallax displacement feature image and the vertical parallax displacement feature image are respectively obtained through the two branches, they can be cross-multiplied, and the result of the cross multiplication can be dot-multiplied with the first image to be processed. This corresponds to the process, in step 603, of performing convolution processing on the first image to be processed with the horizontal parallax displacement feature image and the vertical parallax displacement feature image as convolution kernels. Dot-multiplying the result of the cross multiplication with the first image to be processed yields the repaired second image to be processed.
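One plausible reading of the cross multiplication and dot multiplication above, for a single pixel: the horizontal and vertical 1-dimensional kernels are combined by outer product into a 2-dimensional kernel, which is then applied to the pixel's neighborhood in the first image to be processed. This is a sketch of that interpretation; all values are illustrative.

```python
import numpy as np

kh = np.array([0.2, 0.6, 0.2])  # horizontal parallax convolution kernel (L_h = 3)
kv = np.array([0.1, 0.8, 0.1])  # vertical parallax convolution kernel (L_v = 3)
k2d = np.outer(kv, kh)          # cross multiplication: k2d[i, j] = kv[i] * kh[j]

patch = np.random.rand(3, 3)    # neighborhood centered on the pixel
value = float(np.sum(k2d * patch))  # dot multiplication with the image patch
```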
After the repaired second image to be processed is obtained, it and the first image to be processed may be input into the parallax image generating sub-network. As shown in fig. 12, after the two images are input into the parallax image generating sub-network, each is convolved by 4 convolutional layers, performing feature extraction processing on the first image to be processed and the repaired second image to be processed respectively and obtaining the first feature image of the first image to be processed and the second feature image of the repaired second image to be processed.
By processing the first feature image and the second feature image via a correlation layer, the correlation between the first feature image and the second feature image can be determined. For the processing of the first feature image and the second feature image by the correlation layer, reference may be made to the implementation of determining the correlation between the first feature image and the second feature image in step 902, which will not be repeated here. The first parallax image may then be obtained by processing the data output by the correlation layer with 4 convolution layers.
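As an illustration of what such a correlation layer computes, the sketch below correlates the two feature images over a range of horizontal displacements; the maximum displacement and the mean over channels are assumptions, not disclosed parameters:

```python
import torch
import torch.nn.functional as F

def correlation(f1, f2, max_disp=4):
    """Correlation between two feature images along horizontal displacements.

    f1, f2: (B, C, H, W) feature images. Returns (B, 2*max_disp+1, H, W),
    one correlation map per candidate horizontal displacement.
    """
    B, C, H, W = f1.shape
    f2p = F.pad(f2, (max_disp, max_disp))        # pad only along the width axis
    maps = []
    for d in range(2 * max_disp + 1):
        shifted = f2p[:, :, :, d:d + W]          # f2 displaced by d - max_disp
        maps.append((f1 * shifted).mean(dim=1))  # inner product over channels
    return torch.stack(maps, dim=1)
```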
In order to raise the resolution of the first parallax image to match the resolution of the first image to be processed (or the resolution of the repaired second image to be processed), the first feature image is processed by 4 convolution layers to obtain a fourth feature image, and the fourth feature image is then spliced with the first parallax image to obtain a fourth image to be processed. The fourth image to be processed is processed by one or more coding layers (fig. 13 shows 2 coding layers) to obtain a third feature image, and the third feature image is processed by one or more decoding layers to obtain a second parallax image.
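A compact sketch of this refinement stage under assumed layer counts and channel widths (the scale factors are chosen so that the output resolution exceeds the input disparity resolution; nothing here is the patent's actual configuration):

```python
import torch
import torch.nn as nn

class DisparityRefiner(nn.Module):
    """Splice a feature image with the low-resolution disparity, then
    encode-decode to a higher-resolution second parallax image."""
    def __init__(self, feat_ch, mid_ch=64):
        super().__init__()
        self.encode = nn.Sequential(  # 2 coding layers, as in fig. 13
            nn.Conv2d(feat_ch + 1, mid_ch, 3, stride=2, padding=1), nn.ReLU(True),
            nn.Conv2d(mid_ch, mid_ch, 3, stride=2, padding=1), nn.ReLU(True),
        )
        self.decode = nn.Sequential(
            # encoding halves resolution twice; upsampling x8 yields a net
            # 2x gain over the input disparity (an illustrative choice)
            nn.Upsample(scale_factor=8, mode="bilinear", align_corners=False),
            nn.Conv2d(mid_ch, 1, 3, padding=1),  # second parallax image
        )

    def forward(self, feat, disparity):
        x = torch.cat([feat, disparity], dim=1)  # the "fourth image to be processed"
        return self.decode(self.encode(x))
```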
It should be understood that the numbers of the different network layers in the binocular image processing network described above are merely examples and may be adjusted appropriately in actual use; for example, a decoding layer may contain 4 convolution layers. The embodiments of the application do not limit the number of network layers in the binocular image processing network.
With the image inpainting sub-network provided by this embodiment, the repaired second image to be processed can be obtained based on the first image to be processed and the second image to be processed, so that the difference between the two images in the binocular image can be reduced when their image qualities are inconsistent. The ability of the image inpainting sub-network to obtain the repaired second image to be processed based on the first image to be processed and the second image to be processed is acquired by training the image inpainting sub-network.
As can be seen from embodiments (one) to (five), the repaired second image to be processed can be obtained by performing convolution processing on the first image to be processed using the horizontal parallax displacement feature image and/or the vertical parallax displacement feature image. Whether the image inpainting sub-network can obtain the horizontal parallax displacement feature image and the vertical parallax displacement feature image based on the first image to be processed and the second image to be processed therefore determines whether it can obtain the repaired second image to be processed based on those two images.
The horizontal parallax displacement feature image contains the horizontal parallax displacement information between the homonymous points in the first image to be processed and the second image to be processed, and the vertical parallax displacement feature image contains the vertical parallax displacement information between those homonymous points. Ideally, the horizontal parallax displacement information in the horizontal parallax displacement feature image should equal the true horizontal parallax displacement of the homonymous points in the first image to be processed and the second image to be processed, and the vertical parallax displacement information in the vertical parallax displacement feature image should equal the true vertical parallax displacement. However, because obtaining the true horizontal parallax displacement and the true vertical parallax displacement between the first image to be processed and the second image to be processed is very difficult, the embodiments of the application instead measure the repairing effect of the image inpainting sub-network through the difference between the second image to be processed and the repaired second image to be processed.
Because the shooting visual angle of the repaired second image to be processed is the same as the visual angle of the second image to be processed, the position of the contour of an object in the repaired second image to be processed should be the same as the position of the contour of the object in the second image to be processed; and the smaller the difference between these two contour positions, the smaller the difference between the pixel points at the same position in the second image to be processed and the repaired second image to be processed. Therefore, this embodiment determines the first loss according to the difference between the pixel points at the same position in the second image to be processed and the repaired second image to be processed. The training of the image inpainting sub-network can then be supervised with the first loss to adjust the parameters of the image inpainting sub-network. In one possible implementation manner, the difference between the pixel points at the same position in the second image to be processed and the repaired second image to be processed can be determined by measuring the 1-norm of the difference between the second image to be processed and the repaired second image to be processed.
Further, the smaller the difference between the position of the contour of the object in the second image to be processed and the position of the contour of the object in the repaired second image to be processed, the smaller the difference in human vision between the repaired second image to be processed and the second image to be processed. Optionally, the second loss may be determined according to the first loss and the difference in human vision between the repaired second image to be processed and the second image to be processed; the second loss can further supervise the training of the image inpainting sub-network and adjust its parameters. In one possible implementation, the second loss may be determined by the following formula (1):
$$\mathcal{L}_{2} = \alpha\left(1 - \operatorname{MS\text{-}SSIM}\!\left(\hat{I}_{R},\, I_{R}\right)\right) + (1-\alpha)\left\|\hat{I}_{R} - I_{R}\right\|_{1} \tag{1}$$

wherein $I_{R}$ is the second image to be processed and $\hat{I}_{R}$ is the repaired second image to be processed; $\alpha$ is a real number greater than 0 and less than 1, optionally $\alpha = 0.84$; $\|\hat{I}_{R} - I_{R}\|_{1}$ is the 1-norm of $\hat{I}_{R} - I_{R}$; and $\operatorname{MS\text{-}SSIM}(\hat{I}_{R}, I_{R})$ is the multi-scale structural similarity index (MS-SSIM) between $\hat{I}_{R}$ and $I_{R}$.
The multi-scale structural similarity between $\hat{I}_{R}$ and $I_{R}$ can be used to measure the difference in human vision between the repaired second image to be processed and the second image to be processed. The multi-scale structural similarity between $\hat{I}_{R}$ and $I_{R}$ covers the differences between $\hat{I}_{R}$ and $I_{R}$ at different resolutions. For example, suppose the resolutions of $\hat{I}_{R}$ and $I_{R}$ are both $a$; measuring the difference between $\hat{I}_{R}$ and $I_{R}$ gives a first difference. The resolutions of $\hat{I}_{R}$ and $I_{R}$ are then adjusted so that both are $b$, and measuring the difference between the adjusted $\hat{I}_{R}$ and the adjusted $I_{R}$ gives a second difference. The multi-scale structural similarity between $\hat{I}_{R}$ and $I_{R}$ is determined from the first difference and the second difference, for example by taking the sum of the first difference and the second difference as the multi-scale structural similarity between $\hat{I}_{R}$ and $I_{R}$.
The smaller the difference between the position of the contour of the object in the second image to be processed and the position of the contour of the object in the repaired second image to be processed, the smaller the difference between the image semantics of the second image to be processed and the image semantics of the repaired second image to be processed. Optionally, a third loss may be determined by measuring the difference between the image semantics of the second image to be processed and the image semantics of the repaired second image to be processed; the total loss of the image inpainting sub-network may then be determined according to the second loss and the third loss, and the training of the image inpainting sub-network may be supervised with the total loss to adjust its parameters. In one possible implementation, the difference between the image semantics of the second image to be processed and the image semantics of the repaired second image to be processed can be determined by measuring the difference between the feature image of the second image to be processed and the feature image of the repaired second image to be processed, as shown in the following formula (2):
$$\mathcal{L}_{3} = \frac{1}{c_{j} h_{j} w_{j}} \left\|\varphi_{j}\!\left(\hat{I}_{R}\right) - \varphi_{j}\!\left(I_{R}\right)\right\|_{2} \tag{2}$$

wherein $\varphi_{j}(\hat{I}_{R})$ is the feature image output by the $j$-th layer network in VGG-19 when the repaired second image to be processed is input, and $\varphi_{j}(I_{R})$ is the feature image output by the $j$-th layer network in VGG-19 when the second image to be processed is input; $\|\varphi_{j}(\hat{I}_{R}) - \varphi_{j}(I_{R})\|_{2}$ is the 2-norm of $\varphi_{j}(\hat{I}_{R}) - \varphi_{j}(I_{R})$; $c_{j}$ is the number of channels, $h_{j}$ the height, and $w_{j}$ the width of the feature image output by the $j$-th layer network in VGG-19. Optionally, the $j$-th layer network in VGG-19 is relu4_4.
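A sketch of the third loss using torchvision's pretrained VGG-19 follows; the slice index for relu4_4 reflects our reading of torchvision's layer layout and should be double-checked, and ImageNet normalization of the inputs is omitted for brevity:

```python
import torch
import torchvision

# Feature extractor up to relu4_4 (features[:27] under our assumed indexing).
_vgg = torchvision.models.vgg19(
    weights=torchvision.models.VGG19_Weights.IMAGENET1K_V1
).features[:27].eval()
for p in _vgg.parameters():
    p.requires_grad_(False)  # VGG-19 is a fixed feature extractor here

def third_loss(repaired, target):
    """Third loss of formula (2): 2-norm between relu4_4 features of the
    repaired second image to be processed and of the second image to be
    processed, normalized by channels * height * width."""
    f_rep, f_tgt = _vgg(repaired), _vgg(target)
    c, h, w = f_rep.shape[1:]
    return torch.norm(f_rep - f_tgt, p=2) / (c * h * w)
```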
After determining the second loss and the third loss, a total loss for the image inpainting subnetwork may be determined according to the following equation:
$$\mathcal{L}_{\mathrm{total}} = \mathcal{L}_{2} + \beta\, \mathcal{L}_{3} \tag{3}$$

wherein $\beta$ is a real number greater than 0 and less than 1, optionally $\beta = 0.5$.
After the total loss of the image inpainting sub-network is obtained through formula (3), gradient back-propagation can be performed on the image inpainting sub-network based on the total loss, and the parameters of the image inpainting sub-network are updated to complete its training.
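A schematic training step is sketched below; `inpaint_net`, the optimizer, and the reuse of the `second_loss` and `third_loss` sketches above are all assumptions, not the patent's actual training code:

```python
import torch

def train_step(inpaint_net, optimizer, first_img, second_img, beta=0.5):
    """One training step for the image inpainting sub-network: compute the
    total loss of formula (3) and back-propagate; beta = 0.5 as stated."""
    repaired = inpaint_net(first_img, second_img)
    loss2 = second_loss(repaired, second_img)  # sketch defined earlier
    loss3 = third_loss(repaired, second_img)   # sketch defined earlier
    total = loss2 + beta * loss3
    optimizer.zero_grad()
    total.backward()                           # gradient back-propagation
    optimizer.step()                           # update sub-network parameters
    return total.item()
```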
The present embodiment also provides a training method for the parallax image generation sub-network, in which the loss of the parallax image generation sub-network (hereinafter referred to as the fourth loss) can be determined by measuring the difference between the second parallax image obtained by the parallax image generation sub-network and the real parallax image. The real parallax image is the real parallax image between the first image to be processed and the second image to be processed. In one possible implementation, the fourth loss may be obtained by the following formula (4):
$$\mathcal{L}_{4} = \sum_{n=1}^{N} \left\|\hat{d}_{n} - d_{n}\right\|_{1} \tag{4}$$

wherein $\hat{d}_{n}$ is the image of the second parallax image at the $n$-th scale, $d_{n}$ is the image of the real parallax image at the $n$-th scale, and $N$ is the number of scales, $N$ being a positive integer.
The above-mentioned scale includes resolution: the image of the second parallax image at the nth scale is an image obtained by adjusting the resolution of the second parallax image to the nth resolution, and similarly, the image of the real parallax image at the nth scale is an image obtained by adjusting the resolution of the real parallax image to the nth resolution.
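A sketch of the fourth loss under these definitions, assuming the per-scale difference is a mean absolute difference and the scales are produced by average-pooling (both assumptions):

```python
import torch
import torch.nn.functional as F

def fourth_loss(pred_disp, true_disp, num_scales=4):
    """Fourth loss of formula (4): sum over scales of the difference between
    the second parallax image and the real parallax image."""
    loss = 0.0
    for n in range(num_scales):
        s = 2 ** n  # scale n halves the resolution n times
        p = F.avg_pool2d(pred_disp, s) if s > 1 else pred_disp
        t = F.avg_pool2d(true_disp, s) if s > 1 else true_disp
        loss = loss + (p - t).abs().mean()
    return loss
```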
After obtaining the fourth loss, parameters of the parallax image generation sub-network may be adjusted based on the fourth loss.
The present embodiment provides a binocular image processing network by which a restored second image to be processed can be obtained based on a first image to be processed and a second image to be processed, and a second parallax image can be obtained based on the restored second image to be processed and the first image to be processed. The embodiment also provides a training method for the binocular image processing network, which can improve the performance of the binocular image processing network by training the binocular image processing network, further reduce the difference between the obtained repaired second image to be processed and the second image to be processed, and improve the precision of the second parallax image.
Based on the image processing methods provided in embodiments (a) to (six), embodiment (seventh) of the present disclosure provides an application scenario that may be implemented.
With the rapid development of smart phone configurations, the number of cameras mounted on smart phones keeps increasing, for example a wide-angle camera (wide) and a telephoto camera (tele). Taking the wide-angle camera and the telephoto camera as an example, a binocular image can be obtained by shooting the same scene or object with both cameras simultaneously, and a disparity map and a depth map can be obtained based on the binocular image. However, because the hardware configurations of the wide-angle camera and the telephoto camera differ, the image qualities of the two images in the obtained binocular image differ, which leads to low accuracy of the obtained disparity map or depth map.
By adopting the technical solutions provided by the embodiments of the application, the binocular image collected by the wide-angle camera and the telephoto camera can be repaired, thereby reducing the difference in image quality between the two images in the binocular image and improving the accuracy of the obtained disparity map or depth map.
It will be understood by those skilled in the art that in the method of the present invention, the order of writing the steps does not imply a strict order of execution and any limitations on the implementation, and the specific order of execution of the steps should be determined by their function and possible inherent logic.
The method of the embodiments of the present application is set forth above in detail and the apparatus of the embodiments of the present application is provided below.
Referring to fig. 15, fig. 15 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application, where the apparatus 1 includes: an acquisition unit 11, a first processing unit 12, a second processing unit 13, a convolution processing unit 14, an encoding processing unit 15, and a decoding processing unit 16, wherein:
the acquisition unit 11 is configured to acquire a binocular image, wherein the binocular image comprises a first image to be processed and a second image to be processed, and the image quality of the first image to be processed is higher than the image quality of the second image to be processed;
the first processing unit 12 is configured to obtain a repaired second image to be processed according to the first image to be processed and the second image to be processed, where the image viewing angle of the repaired second image to be processed is the same as the image viewing angle of the second image to be processed, and the image quality of the repaired second image to be processed is higher than the image quality of the second image to be processed;
the second processing unit 13 is configured to obtain a first parallax image between the first to-be-processed image and the repaired second to-be-processed image according to the first to-be-processed image and the repaired second to-be-processed image.
In one possible implementation, the first processing unit 12 is configured to: performing first feature extraction processing on the first image to be processed and the second image to be processed to obtain a horizontal parallax displacement feature image, wherein the horizontal parallax displacement feature image comprises first horizontal parallax displacement between a first pixel point in the first image to be processed and a second pixel point in the second image to be processed, and the first pixel point and the second pixel point are homonymous points;
and performing convolution processing on the first image to be processed by taking the horizontal parallax displacement characteristic image as a convolution kernel to obtain the repaired second image to be processed.
In a possible implementation manner, the first processing unit 12 is further configured to, after the binocular image is obtained, perform second feature extraction processing on the first to-be-processed image and the second to-be-processed image to obtain a vertical parallax displacement feature image, where the vertical parallax displacement feature image includes vertical parallax displacement between the first pixel point and the second pixel point;
the first processing unit is further configured to perform convolution processing on the first image to be processed by using the horizontal parallax displacement characteristic image and the vertical parallax displacement characteristic image as convolution kernels respectively, so as to obtain the repaired second image to be processed.
In another possible implementation manner, the first processing unit 12 is specifically configured to:
obtaining a horizontal parallax convolution kernel according to the first horizontal parallax displacement;
and performing convolution processing on the first pixel point by using the horizontal parallax convolution kernel to obtain the repaired second image to be processed.
In yet another possible implementation manner, the first processing unit 12 is configured to:
splicing the first image to be processed and the second image to be processed to obtain a third image to be processed; performing n-level coding processing on the third image to be processed to obtain a first intermediate characteristic image, wherein n is a positive integer;
and performing m-level first decoding processing on the first intermediate characteristic image to obtain the horizontal parallax displacement characteristic image, wherein m is a positive integer.
In another possible implementation manner, the first processing unit 12 is specifically configured to:
and fusing the characteristic image output by the ith-level coding processing in the n-level coding processing and the characteristic image output by the jth-level first decoding processing in the m-level first decoding processing to obtain input data of the jth + 1-level first decoding processing in the m-level first decoding processing, wherein i is a positive integer less than or equal to n, and j is a positive integer less than or equal to m-1.
In another possible implementation manner, the first processing unit 12 is specifically configured to:
performing the m-level first decoding processing on the first intermediate characteristic image to obtain a second intermediate characteristic image;
and filtering the second intermediate characteristic image by taking the first image to be processed as a guide image, so that the position of the edge in the second intermediate characteristic image is the same as the position of the edge in the first image to be processed, and obtaining the horizontal parallax displacement characteristic image.
In yet another possible implementation manner, the second processing unit 13 is configured to:
respectively performing feature extraction processing on the first image to be processed and the repaired second image to be processed to obtain a first feature image of the first image to be processed and a second feature image of the repaired second image to be processed;
determining a second horizontal parallax displacement between the first pixel point and the same-name point of the first pixel point in the repaired second image to be processed according to the correlation between the first characteristic image and the second characteristic image;
and obtaining the first parallax image according to the second horizontal parallax displacement.
In yet another possible implementation manner, the apparatus further includes: a convolution processing unit 14, configured to, before the first parallax image is obtained according to the correlation between the first feature image and the second feature image, perform convolution processing on a pixel point neighborhood in the second feature image by using a pixel point neighborhood in the first feature image as a convolution kernel, to determine the correlation between the first feature image and the second feature image; or,
and taking the pixel point neighborhood in the second characteristic image as a convolution kernel to perform convolution processing on the pixel point neighborhood in the first characteristic image, so as to obtain the correlation between the first characteristic image and the second characteristic image.
In yet another possible implementation manner, the apparatus 1 further includes:
an encoding processing unit 15, configured to perform encoding processing on the first parallax image and the first feature image to obtain a third feature image;
a decoding processing unit 16, configured to perform decoding processing on the third feature image to obtain a second parallax image between the first to-be-processed image and the repaired second to-be-processed image, where a resolution of the second parallax image is greater than a resolution of the first parallax image.
In yet another possible implementation manner, the encoding processing unit 15 is configured to:
splicing the first characteristic image and the first image to be processed to obtain a fourth image to be processed;
and coding the fourth image to be processed to obtain the third characteristic image.
In yet another possible implementation manner, the first processing unit 12 is further configured to, before the first parallax image and the first feature image are stitched to obtain a fourth image to be processed, perform feature extraction processing on the first feature image to obtain a fourth feature image of the first feature image;
the encoding processing unit 15 is specifically configured to: and splicing the fourth characteristic image and the first parallax image to obtain a fourth image to be processed.
This embodiment obtains the repaired second image to be processed according to the first image to be processed and the second image to be processed, so that the difference between the image quality of the first image to be processed and the image quality of the repaired second image to be processed is smaller than the difference between the image quality of the first image to be processed and the image quality of the second image to be processed. The precision of the first parallax image obtained according to the first image to be processed and the repaired second image to be processed is accordingly higher than that of a parallax image obtained according to the first image to be processed and the second image to be processed.
In some embodiments, functions of or modules included in the apparatus provided in the embodiments of the present disclosure may be used to execute the method described in the above method embodiments, and specific implementation thereof may refer to the description of the above method embodiments, and for brevity, will not be described again here.
Fig. 16 is a schematic diagram of a hardware structure of an image processing apparatus according to an embodiment of the present application. The image processing apparatus 2 includes a processor 21, a memory 22, an input device 23, and an output device 24. The processor 21, the memory 22, the input device 23 and the output device 24 are coupled by a connector, which includes various interfaces, transmission lines or buses, etc., and the embodiment of the present application is not limited thereto. It should be appreciated that in various embodiments of the present application, coupled refers to being interconnected in a particular manner, including being directly connected or indirectly connected through other devices, such as through various interfaces, transmission lines, buses, and the like.
The processor 21 may be one or more graphics processing units (GPUs); in the case that the processor 21 is one GPU, the GPU may be a single-core GPU or a multi-core GPU. Alternatively, the processor 21 may be a processor group composed of a plurality of GPUs coupled to each other through one or more buses. Alternatively, the processor may be another type of processor, which is not limited in the embodiments of the present application.
The memory 22 may be used to store computer program instructions and various types of computer program code for executing the solutions of the present application. Optionally, the memory includes, but is not limited to, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), or a compact disc read-only memory (CD-ROM), and the memory is used for storing the related instructions and data.
The input device 23 is used for inputting data and/or signals, and the output device 24 is used for outputting data and/or signals. The input device 23 and the output device 24 may be separate devices or an integrated device.
It is understood that, in the embodiment of the present application, the memory 22 may be used to store not only the relevant instructions, but also the relevant images, for example, the memory 22 may be used to store the first to-be-processed image and the second to-be-processed image acquired through the input device 23, or the memory 22 may be used to store the repaired second to-be-processed image and the first parallax image acquired through the processor 21, and the like, and the embodiment of the present application is not limited to the data specifically stored in the memory.
It will be appreciated that fig. 16 shows only a simplified design of the image processing apparatus. In practical applications, the image processing apparatus may further include other necessary components, including but not limited to any number of input/output devices, processors and memories, and all image processing apparatuses that can implement the embodiments of the present application fall within the scope of the present application.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. It is also clear to those skilled in the art that the descriptions of the various embodiments of the present application have different emphasis, and for convenience and brevity of description, the same or similar parts may not be repeated in different embodiments, so that the parts that are not described or not described in detail in a certain embodiment may refer to the descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, cause the processes or functions described in accordance with the embodiments of the application to occur, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in or transmitted over a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)), or wirelessly (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., Digital Versatile Disk (DVD)), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
One of ordinary skill in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by hardware related to instructions of a computer program, which may be stored in a computer-readable storage medium, and when executed, may include the processes of the above method embodiments. And the aforementioned storage medium includes: various media that can store program codes, such as a read-only memory (ROM) or a Random Access Memory (RAM), a magnetic disk, or an optical disk.

Claims (27)

1. An image processing method, characterized in that the method comprises:
acquiring a binocular image, wherein the binocular image comprises a first image to be processed and a second image to be processed, and the image quality of the first image to be processed is higher than that of the second image to be processed;
obtaining a repaired second image to be processed according to the first image to be processed and the second image to be processed, wherein the image visual angle of the repaired second image to be processed is the same as the image visual angle of the second image to be processed, and the image quality of the repaired second image to be processed is higher than that of the second image to be processed;
and obtaining a first parallax image between the first image to be processed and the repaired second image to be processed according to the first image to be processed and the repaired second image to be processed.
2. The method according to claim 1, wherein obtaining the repaired second image to be processed based on the first image to be processed and the second image to be processed comprises:
performing first feature extraction processing on the first image to be processed and the second image to be processed to obtain a horizontal parallax displacement feature image, wherein the horizontal parallax displacement feature image comprises first horizontal parallax displacement between a first pixel point in the first image to be processed and a second pixel point in the second image to be processed, and the first pixel point and the second pixel point are homonymous points;
and performing convolution processing on the first image to be processed by taking the horizontal parallax displacement characteristic image as a convolution kernel to obtain the repaired second image to be processed.
3. The method of claim 2, wherein after the acquiring binocular images, the method further comprises:
performing second feature extraction processing on the first image to be processed and the second image to be processed to obtain a vertical parallax displacement feature image, wherein the vertical parallax displacement feature image comprises vertical parallax displacement between the first pixel point and the second pixel point;
performing convolution processing on the first image to be processed by using the horizontal parallax displacement characteristic image as a convolution kernel to obtain a repaired second image to be processed under the visual angle of the second image to be processed, wherein the convolution processing includes:
and performing convolution processing on the first image to be processed by respectively taking the horizontal parallax displacement characteristic image and the vertical parallax displacement characteristic image as convolution kernels to obtain the repaired second image to be processed.
4. The method according to claim 2, wherein performing convolution processing on the first image to be processed by using the horizontal parallax displacement characteristic image as a convolution kernel to obtain a repaired second image to be processed comprises:
obtaining a horizontal parallax convolution kernel according to the first horizontal parallax displacement;
and performing convolution processing on the first pixel point by using the horizontal parallax convolution kernel to obtain the repaired second image to be processed.
5. The method according to any one of claims 2 to 4, wherein the performing a first feature extraction process on the first image to be processed and the second image to be processed to obtain a horizontal parallax displacement feature image comprises:
splicing the first image to be processed and the second image to be processed to obtain a third image to be processed;
performing n-level coding processing on the third image to be processed to obtain a first intermediate characteristic image, wherein n is a positive integer;
and performing m-level first decoding processing on the first intermediate characteristic image to obtain the horizontal parallax displacement characteristic image, wherein m is a positive integer.
6. The method according to claim 5, wherein performing m-level first decoding processing on the first intermediate feature image to obtain the horizontal parallax displacement feature image comprises:
and fusing the characteristic image output by the ith-level coding processing in the n-level coding processing and the characteristic image output by the jth-level first decoding processing in the m-level first decoding processing to obtain input data of the jth + 1-level first decoding processing in the m-level first decoding processing, wherein i is a positive integer less than or equal to n, and j is a positive integer less than or equal to m-1.
7. The method according to claim 5, wherein performing m-level first decoding processing on the first intermediate feature image to obtain the horizontal parallax displacement feature image comprises:
performing the m-level first decoding processing on the first intermediate characteristic image to obtain a second intermediate characteristic image;
and filtering the second intermediate characteristic image by taking the first image to be processed as a guide image, so that the position of the edge in the second intermediate characteristic image is the same as the position of the edge in the first image to be processed, and obtaining the horizontal parallax displacement characteristic image.
8. The method according to claim 2, wherein obtaining a first parallax image between the first image to be processed and the repaired second image to be processed according to the first image to be processed and the repaired second image to be processed comprises:
respectively performing feature extraction processing on the first image to be processed and the repaired second image to be processed to obtain a first feature image of the first image to be processed and a second feature image of the repaired second image to be processed;
determining a second horizontal parallax displacement between the first pixel point and the same-name point of the first pixel point in the repaired second image to be processed according to the correlation between the first characteristic image and the second characteristic image;
and obtaining the first parallax image according to the second horizontal parallax displacement.
9. The method according to claim 8, wherein before the obtaining the first parallax image based on the correlation between the first feature image and the second feature image, the method further comprises:
taking the pixel point neighborhood in the first characteristic image as a convolution kernel to perform convolution processing on the pixel point neighborhood in the second characteristic image, and determining the correlation between the first characteristic image and the second characteristic image; or,
and taking the pixel point neighborhood in the second characteristic image as a convolution kernel to perform convolution processing on the pixel point neighborhood in the first characteristic image, so as to obtain the correlation between the first characteristic image and the second characteristic image.
10. The method of claim 8, further comprising:
encoding the first parallax image and the first characteristic image to obtain a third characteristic image;
and decoding the third characteristic image to obtain a second parallax image between the first to-be-processed image and the repaired second to-be-processed image, wherein the resolution of the second parallax image is greater than that of the first parallax image.
11. The method according to claim 10, wherein the encoding the first parallax image and the first feature image to obtain a third feature image comprises:
splicing the first characteristic image and the first image to be processed to obtain a fourth image to be processed;
and coding the fourth image to be processed to obtain the third characteristic image.
12. The method according to claim 11, wherein before the stitching processing is performed on the first parallax image and the first feature image to obtain a fourth image to be processed, the method further comprises:
performing feature extraction processing on the first feature image to obtain a fourth feature image of the first feature image;
the splicing processing is performed on the first parallax image and the first characteristic image to obtain a fourth image to be processed, and the method includes:
and splicing the fourth characteristic image and the first parallax image to obtain a fourth image to be processed.
13. An image processing apparatus, characterized in that the apparatus comprises:
an acquisition unit, configured to acquire a binocular image, wherein the binocular image comprises a first image to be processed and a second image to be processed, and the image quality of the first image to be processed is higher than the image quality of the second image to be processed;
the first processing unit is used for obtaining a repaired second image to be processed according to the first image to be processed and the second image to be processed, wherein the image visual angle of the repaired second image to be processed is the same as that of the second image to be processed, and the image quality of the repaired second image to be processed is higher than that of the second image to be processed;
and the second processing unit is used for obtaining a first parallax image between the first image to be processed and the repaired second image to be processed according to the first image to be processed and the repaired second image to be processed.
14. The apparatus of claim 13, wherein the first processing unit is configured to:
performing first feature extraction processing on the first image to be processed and the second image to be processed to obtain a horizontal parallax displacement feature image, wherein the horizontal parallax displacement feature image comprises first horizontal parallax displacement between a first pixel point in the first image to be processed and a second pixel point in the second image to be processed, and the first pixel point and the second pixel point are homonymous points;
and performing convolution processing on the first image to be processed by taking the horizontal parallax displacement characteristic image as a convolution kernel to obtain the repaired second image to be processed.
15. The apparatus according to claim 14, wherein the first processing unit is further configured to, after the binocular image is obtained, perform a second feature extraction process on the first to-be-processed image and the second to-be-processed image to obtain a vertical parallax displacement feature image, where the vertical parallax displacement feature image includes vertical parallax displacement between the first pixel point and the second pixel point;
the first processing unit is further configured to perform convolution processing on the first image to be processed by using the horizontal parallax displacement characteristic image and the vertical parallax displacement characteristic image as convolution kernels respectively, so as to obtain the repaired second image to be processed.
16. The apparatus of claim 14, wherein the first processing unit is configured to:
obtaining a horizontal parallax convolution kernel according to the first horizontal parallax displacement;
and performing convolution processing on the first pixel point by using the horizontal parallax convolution kernel to obtain the repaired second image to be processed.
17. The apparatus according to any of claims 14 to 16, wherein the first processing unit is configured to:
splicing the first image to be processed and the second image to be processed to obtain a third image to be processed;
performing n-level coding processing on the third image to be processed to obtain a first intermediate characteristic image, wherein n is a positive integer;
and performing m-level first decoding processing on the first intermediate characteristic image to obtain the horizontal parallax displacement characteristic image, wherein m is a positive integer.
18. The apparatus of claim 17, wherein the first processing unit is configured to:
and fusing the characteristic image output by the ith-level coding processing in the n-level coding processing and the characteristic image output by the jth-level first decoding processing in the m-level first decoding processing to obtain input data of the jth + 1-level first decoding processing in the m-level first decoding processing, wherein i is a positive integer less than or equal to n, and j is a positive integer less than or equal to m-1.
19. The apparatus of claim 17, wherein the first processing unit is configured to:
performing the m-level first decoding processing on the first intermediate characteristic image to obtain a second intermediate characteristic image;
and filtering the second intermediate characteristic image by taking the first image to be processed as a guide image, so that the position of the edge in the second intermediate characteristic image is the same as the position of the edge in the first image to be processed, and obtaining the horizontal parallax displacement characteristic image.
20. The apparatus of claim 14, wherein the second processing unit is configured to:
respectively performing feature extraction processing on the first image to be processed and the repaired second image to be processed to obtain a first feature image of the first image to be processed and a second feature image of the repaired second image to be processed;
determining a second horizontal parallax displacement between the first pixel point and the same-name point of the first pixel point in the repaired second image to be processed according to the correlation between the first characteristic image and the second characteristic image;
and obtaining the first parallax image according to the second horizontal parallax displacement.
21. The apparatus of claim 20, further comprising:
a convolution processing unit, configured to, before the first parallax image is obtained according to the correlation between the first characteristic image and the second characteristic image, perform convolution processing on a pixel point neighborhood in the second characteristic image by using a pixel point neighborhood in the first characteristic image as a convolution kernel, and determine the correlation between the first characteristic image and the second characteristic image; or,
and taking the pixel point neighborhood in the second characteristic image as a convolution kernel to perform convolution processing on the pixel point neighborhood in the first characteristic image, so as to obtain the correlation between the first characteristic image and the second characteristic image.
22. The apparatus of claim 20, further comprising:
the encoding processing unit is used for encoding the first parallax image and the first characteristic image to obtain a third characteristic image;
and the decoding processing unit is used for performing decoding processing on the third characteristic image to obtain a second parallax image between the first to-be-processed image and the repaired second to-be-processed image, wherein the resolution of the second parallax image is greater than that of the first parallax image.
23. The apparatus of claim 22, wherein the encoding processing unit is configured to:
splicing the first characteristic image and the first image to be processed to obtain a fourth image to be processed;
and coding the fourth image to be processed to obtain the third characteristic image.
24. The apparatus according to claim 23, wherein the first processing unit is further configured to, before the stitching processing is performed on the first parallax image and the first feature image to obtain a fourth image to be processed, perform feature extraction processing on the first feature image to obtain a fourth feature image of the first feature image;
the encoding processing unit is specifically configured to:
and splicing the fourth characteristic image and the first parallax image to obtain a fourth image to be processed.
25. A processor configured to perform the method of any one of claims 1 to 12.
26. An electronic device, comprising: a processor, transmitting means, input means, output means and a memory for storing computer program code comprising computer instructions which, when executed by the processor, cause the electronic device to perform the method of any of claims 1 to 12.
27. A computer-readable storage medium, in which a computer program is stored, the computer program comprising program instructions which, when executed by a processor of an electronic device, cause the processor to carry out the method of any one of claims 1 to 12.
CN201911033254.2A 2019-10-28 2019-10-28 Image processing method and device, processor, electronic device and storage medium Active CN110782412B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911033254.2A CN110782412B (en) 2019-10-28 2019-10-28 Image processing method and device, processor, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911033254.2A CN110782412B (en) 2019-10-28 2019-10-28 Image processing method and device, processor, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN110782412A CN110782412A (en) 2020-02-11
CN110782412B true CN110782412B (en) 2022-01-28

Family

ID=69387134

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911033254.2A Active CN110782412B (en) 2019-10-28 2019-10-28 Image processing method and device, processor, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN110782412B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111368688A (en) * 2020-02-28 2020-07-03 深圳市商汤科技有限公司 Pedestrian monitoring method and related product
CN113573038A (en) * 2020-04-29 2021-10-29 思特威(上海)电子科技股份有限公司 Binocular system and depth map acquisition method
CN111724404A (en) * 2020-06-28 2020-09-29 深圳市慧鲤科技有限公司 Edge detection method and device, electronic equipment and storage medium
CN112001365A (en) * 2020-09-22 2020-11-27 四川大学 High-precision crop disease and insect pest identification method

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103024375A (en) * 2012-11-15 2013-04-03 宁波大学 Three-dimensional image semi-fragile watermarking method based on binocular just-perceived distortion
CN103955890A (en) * 2014-05-29 2014-07-30 浙江工商大学 Stereoscopic image restoration method
CN104010178A (en) * 2014-06-06 2014-08-27 深圳市墨克瑞光电子研究院 Binocular image parallax adjusting method and device and binocular camera
CN104568003A (en) * 2014-12-29 2015-04-29 国家电网公司 Remote monitoring system and method for ice coating process of power transmission lines
CN106340036A (en) * 2016-08-08 2017-01-18 东南大学 Binocular stereoscopic vision-based stereo matching method
CN106600583A (en) * 2016-12-07 2017-04-26 西安电子科技大学 Disparity map acquiring method based on end-to-end neural network
CN107945220A (en) * 2017-11-30 2018-04-20 华中科技大学 A kind of method for reconstructing based on binocular vision
CN109640066A (en) * 2018-12-12 2019-04-16 深圳先进技术研究院 The generation method and device of high-precision dense depth image
CN110070489A (en) * 2019-04-30 2019-07-30 中国人民解放军国防科技大学 Binocular image super-resolution method based on parallax attention mechanism

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106355570B (en) * 2016-10-21 2019-03-19 昆明理工大学 A kind of binocular stereo vision matching method of combination depth characteristic
CN108230235B (en) * 2017-07-28 2021-07-02 北京市商汤科技开发有限公司 Disparity map generation system, method and storage medium
CN110335228B (en) * 2018-03-30 2021-06-25 杭州海康威视数字技术股份有限公司 Method, device and system for determining image parallax
CN108961327B (en) * 2018-05-22 2021-03-30 深圳市商汤科技有限公司 Monocular depth estimation method and device, equipment and storage medium thereof
CN110009675B (en) * 2019-04-03 2021-05-18 北京市商汤科技开发有限公司 Method, apparatus, medium, and device for generating disparity map

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103024375A (en) * 2012-11-15 2013-04-03 宁波大学 Three-dimensional image semi-fragile watermarking method based on binocular just-perceived distortion
CN103955890A (en) * 2014-05-29 2014-07-30 浙江工商大学 Stereoscopic image restoration method
CN104010178A (en) * 2014-06-06 2014-08-27 深圳市墨克瑞光电子研究院 Binocular image parallax adjusting method and device and binocular camera
CN104568003A (en) * 2014-12-29 2015-04-29 国家电网公司 Remote monitoring system and method for ice coating process of power transmission lines
CN106340036A (en) * 2016-08-08 2017-01-18 东南大学 Binocular stereoscopic vision-based stereo matching method
CN106600583A (en) * 2016-12-07 2017-04-26 西安电子科技大学 Disparity map acquiring method based on end-to-end neural network
CN107945220A (en) * 2017-11-30 2018-04-20 华中科技大学 A kind of method for reconstructing based on binocular vision
CN109640066A (en) * 2018-12-12 2019-04-16 深圳先进技术研究院 The generation method and device of high-precision dense depth image
CN110070489A (en) * 2019-04-30 2019-07-30 中国人民解放军国防科技大学 Binocular image super-resolution method based on parallax attention mechanism

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Fast End-to-End Trainable Guided Filter;Huikai Wu等;《2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition》;20181217;摘要、第3-4节、图1-3 *
Single View Stereo Matching;Yue Luo等;《arXiv:1803.02612v2》;20180309;摘要、第1-5节、图1-2 *
Video Frame Interpolation via Adaptive Separable Convolution;Simon Niklaus等;《arXiv:1708.01692v1》;20170805;摘要、第1-5节、图2 *
Three-dimensional reconstruction method based on binocular stereo vision; Zhang Ruru et al.; Journal of Yangzhou University; 20180831; Vol. 21 (No. 3); 5-10 *

Also Published As

Publication number Publication date
CN110782412A (en) 2020-02-11

Similar Documents

Publication Publication Date Title
CN110782412B (en) Image processing method and device, processor, electronic device and storage medium
CN107194965B (en) Method and apparatus for processing light field data
EP3099056B1 (en) Method and apparatus for displaying a light field based image on a user's device, and corresponding computer program product
CN110473137B (en) Image processing method and device
US9338437B2 (en) Apparatus and method for reconstructing high density three-dimensional image
CN109003297B (en) Monocular depth estimation method, device, terminal and storage medium
CN109640066B (en) Method and device for generating high-precision dense depth image
CN111402170B (en) Image enhancement method, device, terminal and computer readable storage medium
CN111160178A (en) Image processing method and device, processor, electronic device and storage medium
WO2021169404A1 (en) Depth image generation method and apparatus, and storage medium
CN111444744A (en) Living body detection method, living body detection device, and storage medium
EP2757789A1 (en) Image processing system, image processing method, and image processing program
CN112927271A (en) Image processing method, image processing apparatus, storage medium, and electronic device
JP2022515517A (en) Image depth estimation methods and devices, electronic devices, and storage media
CN113034666B (en) Stereo matching method based on pyramid parallax optimization cost calculation
CN110335228B (en) Method, device and system for determining image parallax
CN114677350A (en) Connection point extraction method and device, computer equipment and storage medium
CN115908992B (en) Binocular stereo matching method, device, equipment and storage medium
US10489933B2 (en) Method for modelling an image device, corresponding computer program product and computer-readable carrier medium
CN114842066A (en) Image depth recognition model training method, image depth recognition method and device
CN112711984B (en) Fixation point positioning method and device and electronic equipment
CN112257653B (en) Method and device for determining space decoration effect graph, storage medium and electronic equipment
CN110533663A (en) A kind of image parallactic determines method, apparatus, equipment and system
CN109996056B (en) Method and device for converting 2D video into 3D video and electronic equipment
CN117058252B (en) Self-adaptive fusion stereo matching method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant