CN115953585A - Denoising method and device based on multi-viewpoint image fusion, electronic device and medium


Info

Publication number
CN115953585A
CN115953585A (application CN202310138477.5A)
Authority
CN
China
Prior art keywords
noise
denoising
image
type
target image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310138477.5A
Other languages
Chinese (zh)
Inventor
桑新柱
曾哲昊
颜玢玢
苑金辉
陈铎
王鹏
Current Assignee
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN202310138477.5A
Publication of CN115953585A

Landscapes

  • Image Processing (AREA)

Abstract

The invention provides a denoising method and device based on multi-viewpoint image fusion, an electronic device and a medium. The method comprises: acquiring a target image of a three-dimensional light field scene with a camera array; classifying noise in the target image according to its source, and determining the type of the noise; and denoising the target image according to the type of the noise with a pre-trained model. For the acquisition and transmission of a real-time three-dimensional light field scene, the method divides the noise into three types and denoises each type with the model trained for it. The method is fast, occupies few resources and achieves considerable quality, realizing real-time denoising of ultra-high-definition three-dimensional light field video.

Description

Denoising method and device based on multi-viewpoint image fusion, electronic device and medium
Technical Field
The invention relates to the technical field of image processing, and in particular to a denoising method and device based on multi-viewpoint image fusion, an electronic device and a medium.
Background
Three-dimensional light field display technology is a naked-eye three-dimensional display method with advantages such as full parallax and the ability to render occlusion relationships correctly. As three-dimensional light field display finds ever wider application, increasingly demanding requirements arise, such as remote real-time acquisition, higher display resolution and a better viewing experience.
When three-dimensional light field display material is captured, noise is inevitably introduced during the acquisition and transmission of the real scene, and this noise directly degrades the final display effect.
Commonly used image denoising techniques include linear filters (such as the mean and Gaussian filters), nonlinear filters (such as the median filter) and statistical methods.
Linear filters are simple and fast, but handle Gaussian noise and salt-and-pepper noise poorly. Nonlinear filters handle salt-and-pepper noise well, but are slow. Statistical denoising methods handle Gaussian noise well, but are computationally expensive.
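The trade-off described above can be checked numerically. The sketch below (an illustration, not part of the patent) corrupts a flat gray image with 5% salt-and-pepper noise and compares naive 3x3 mean and median filters; being nonlinear, the median filter removes the impulse noise almost completely, while the mean filter only smears it:

```python
import numpy as np

def mean_filter(img, k=3):
    """Naive 2-D mean filter with edge replication (illustrative only)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.empty_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = padded[i:i + k, j:j + k].mean()
    return out

def median_filter(img, k=3):
    """Naive 2-D median filter with edge replication (illustrative only)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.empty_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.median(padded[i:i + k, j:j + k])
    return out

rng = np.random.default_rng(0)
clean = np.full((32, 32), 128.0)           # flat gray image
noisy = clean.copy()
mask = rng.random(clean.shape) < 0.05      # 5% of pixels hit by impulse noise
noisy[mask] = rng.choice([0.0, 255.0], size=int(mask.sum()))

mean_err = np.abs(mean_filter(noisy) - clean).mean()
median_err = np.abs(median_filter(noisy) - clean).mean()
```

On impulse noise the median error is near zero, while each corrupted pixel leaks into every 3x3 window that contains it for the mean filter.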
Disclosure of Invention
Aiming at the problems in the prior art, the embodiment of the invention provides a denoising method and device based on multi-viewpoint image fusion, an electronic device and a medium.
The invention provides a denoising method based on multi-viewpoint image fusion, which comprises the following steps:
acquiring a target image in a three-dimensional light field scene by a camera array;
classifying the noise based on the source of the noise in the target image, and determining the type of the noise;
and denoising the target image based on the type of the noise and a pre-trained model.
In some embodiments, the types of noise include: intra-camera noise, inter-camera noise, and noise during transmission.
In some embodiments, in the case that the type of the noise is intra-camera noise, denoising the target image based on the type of the noise and a pre-trained model includes:
training a first autoencoder by taking the first image as input and the second image as the target output, to obtain a first denoising model; the first image is a noisy image, and the second image is a clean image;
and denoising the target image based on the first denoising model.
In some embodiments, in a case that the type of the noise is inter-camera noise, the denoising the target image based on the type of the noise and a pre-trained model includes:
determining a target color block based on a MacBeth color checker;
training a fully connected network by taking the target color block as a label to obtain a second denoising model;
and denoising the target image based on the second denoising model.
In some embodiments, in the case that the type of the noise is noise in a transmission process, denoising the target image based on the type of the noise and a pre-trained model includes:
training a second autoencoder by taking the third image as input and the fourth image as the target output, to obtain a third denoising model; the bit rate of the third image is below a preset value, and the bit rate of the fourth image is at or above the preset value;
and denoising the target image based on the third denoising model.
The invention also provides a denoising device based on multi-viewpoint image fusion, which comprises:
the acquisition module is used for acquiring a target image in a three-dimensional light field scene through a camera array;
the determining module is used for classifying the noise based on the source of the noise in the target image and determining the type of the noise;
and the denoising module is used for denoising the target image based on the type of the noise and a pre-trained model.
In some embodiments, the types of noise include: intra-camera noise, inter-camera noise, and noise during transmission.
The invention also provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to realize the denoising method based on the multi-viewpoint image fusion.
The present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a multi-viewpoint image fusion-based denoising method as any of the above.
The present invention also provides a computer program product comprising a computer program which, when executed by a processor, implements a multi-view image fusion-based denoising method as described in any of the above.
The denoising method, device, electronic device and medium based on multi-viewpoint image fusion provided by the invention divide the noise arising in the acquisition and transmission of a real-time three-dimensional light field scene into three types and denoise each type with the model trained for it. The approach is fast, occupies few resources and achieves considerable quality, realizing real-time denoising of ultra-high-definition three-dimensional light field video.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a schematic flowchart of a denoising method based on multi-view image fusion according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a denoising device based on multi-view image fusion according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first", "second" and the like in the description and claims of the present application distinguish similar elements and do not necessarily describe a particular sequence or chronological order. Terms so used are interchangeable under appropriate circumstances, so that the embodiments of the application can operate in sequences other than those illustrated or described herein. "First" and "second" generally refer to a class of objects and do not limit their number; for example, a first object can be one or more. In addition, "and/or" in the specification and claims denotes at least one of the connected objects, and the character "/" generally means that the objects before and after it are in an "or" relationship.
Fig. 1 is a schematic flow diagram of a denoising method based on multi-view image fusion according to an embodiment of the present invention, and as shown in fig. 1, the denoising method based on multi-view image fusion according to an embodiment of the present invention includes:
step 101, collecting a target image in a three-dimensional light field scene through a camera array;
step 102, classifying the noise based on the source of the noise in the target image, and determining the type of the noise;
and 103, denoising the target image based on the type of the noise and a pre-trained model.
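A minimal sketch of the three-step flow is given below. All function and model names are hypothetical (the patent does not prescribe an API), and the models are stand-in identity callables where the trained networks would go:

```python
# Hypothetical dispatch of noise types to pre-trained denoising models,
# mirroring steps 101-103. The three types follow the classification
# described later in the text.
NOISE_TYPES = ("intra_camera", "inter_camera", "transmission")

def classify_noise(source: str) -> str:
    """Map a noise-source description to one of the three types (assumed rule)."""
    table = {"sensor": "intra_camera",        # Gaussian / Poisson noise
             "color_shift": "inter_camera",   # per-camera color differences
             "bitrate": "transmission"}       # quality loss in transit
    return table[source]

# Placeholder models; in practice these would be the trained networks.
models = {t: (lambda img: img) for t in NOISE_TYPES}

def denoise(image, source: str):
    noise_type = classify_noise(source)       # step 102
    return models[noise_type](image)          # step 103

result = denoise([1, 2, 3], "sensor")
```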
The execution subject of the denoising method based on multi-viewpoint image fusion provided by the invention can be an electronic device, a component in an electronic device, an integrated circuit, or a chip. The electronic device may be mobile or non-mobile. By way of example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a vehicle-mounted electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook or a personal digital assistant (PDA); the non-mobile electronic device may be a server, a network attached storage (NAS), a personal computer (PC), a television (TV), a teller machine or a self-service machine. The present invention is not limited in this respect.
In step 101, a target image is acquired in a three-dimensional light field scene by a camera array.
A camera array is an array camera: several small lenses together replace one large lens, on a principle similar to that of an astronomical telescope array or an insect's compound eye. Compared with a traditional camera, an array camera has a wider field of view, produces larger photographs and is smaller in size.
Three-dimensional light field display material can be captured with the camera array. Generally speaking, the more cameras there are, the denser the viewpoints, the more information is captured, and the better the three-dimensional light field reconstruction.
The target image is a multi-viewpoint image acquired by a plurality of viewpoint cameras.
During the real-time acquisition and transmission of a three-dimensional light field scene, the camera array captures the real scene to produce a video stream, which is then transmitted via the Real-Time Messaging Protocol (RTMP). Noise is inevitably introduced in this process and directly degrades the final display effect.
In step 102, based on the source of the noise in the target image, the noise is classified, and the type of the noise is determined.
In some embodiments, the types of noise include: intra-camera noise, inter-camera noise, and noise during transmission.
According to the source of the noise, the noise introduced in the acquisition and transmission process can be classified into three categories, including: intra-camera noise, inter-camera noise, and noise during transmission.
Intra-camera noise, i.e., the noise generated inside a camera, is mainly the Gaussian noise and Poisson noise introduced during shooting.
Gaussian noise arises from the heat the image sensor generates during prolonged operation and from uneven luminance during shooting. Poisson noise, also called shot noise, arises because the number of photons that actually reach the CMOS sensor from the light source fluctuates, so the recorded gray value fluctuates as well.
Inter-camera noise is the noise between cameras. Even when cameras of the same model are used, individual differences are hard to avoid, and conditions such as uneven illumination during shooting cause the captured images to differ, mainly as color differences. This strongly affects the final three-dimensional light field display.
During transmission, multiple ultra-high-definition video streams put heavy pressure on the network bandwidth; if the hardware is insufficient, the video quality drops and the final display effect suffers as well.
In step 103, denoising the target image based on the type of the noise and a pre-trained model.
A clean image and the corresponding noisy image are fed as an image pair into a neural network for training, yielding the first denoising model, for intra-camera noise.
Taking the 24 color patches on the MacBeth color checker as shared features, the calibration result is compared with the reference patches and fed into a neural network for training, yielding the second denoising model, for inter-camera noise.
An image data set of matched high-bit-rate and low-bit-rate frames is built and a neural network is trained on it, yielding the third denoising model, for noise introduced during transmission: low-bit-rate video frames are fitted to high-bit-rate frames, so that low-bit-rate video can be transmitted to relieve network pressure without degrading the display effect.
The denoising method based on multi-viewpoint image fusion provided by the embodiment of the invention divides the noise arising in the acquisition and transmission of a real-time three-dimensional light field scene into three types and denoises each type with the model trained for it. The method is fast, occupies few resources and achieves considerable quality, realizing real-time denoising of ultra-high-definition three-dimensional light field video.
In some embodiments, in a case that the type of the noise is intra-camera noise, the denoising the target image based on the type of the noise and a pre-trained model includes:
training a first autoencoder by taking the first image as input and the second image as the target output, to obtain a first denoising model; the first image is a noisy image, and the second image is a clean image;
and denoising the target image based on the first denoising model.
Alternatively, when training the first autoencoder, the training data may be obtained in either of two ways.
In the first way, a camera photographs a real scene to obtain a set of clean images, i.e., the second images are obtained;
Gaussian noise of different variances is then artificially added to these images with OpenCV to obtain noisy images, i.e., the first images are obtained.
In the second way, a camera continuously shoots a single scene many times with a short exposure time, e.g., 0.1 s; because the exposure is short, the resulting images contain many noise points, i.e., the first images are obtained;
these first images are then summed with python and divided by the number of photos, yielding a noise-free clean image, i.e., the second image is obtained.
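Both dataset-construction routes can be sketched with NumPy (the patent uses OpenCV and python; the image sizes, noise variances and burst length here are illustrative assumptions). Route 2 relies on the fact that averaging N independent noisy frames shrinks zero-mean noise roughly by a factor of sqrt(N):

```python
import numpy as np

rng = np.random.default_rng(42)

# Route 1: start from a clean image and add Gaussian noise of several variances.
clean = rng.integers(64, 192, size=(64, 64)).astype(float)  # stand-in "second image"
noisy_set = [np.clip(clean + rng.normal(0, sigma, clean.shape), 0, 255)
             for sigma in (5, 15, 25)]                      # "first images"

# Route 2: average many short-exposure (noisy) burst frames into a clean image.
burst = [np.clip(clean + rng.normal(0, 25, clean.shape), 0, 255)
         for _ in range(64)]                                # "first images"
averaged = np.mean(burst, axis=0)                           # recovered "second image"

# Averaging 64 frames should cut the residual noise roughly eightfold.
single_err = np.abs(burst[0] - clean).mean()
avg_err = np.abs(averaged - clean).mean()
```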
The clean images and the corresponding noisy images are used as data pairs for training. The neural network is the first autoencoder: its input is the noisy image, and its output is the denoised image the network generates. The loss is obtained by subtracting the clean image of the data pair from the network's denoised image; it is back-propagated and training continues until the loss converges. The first denoising model is thus obtained.
The first denoising model removes intra-camera noise from the target image: its input is an image carrying Gaussian and/or Poisson noise, and its output is a clean image with that noise removed.
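The training loop just described can be sketched end-to-end. To stay self-contained this uses a tiny single-hidden-layer autoencoder written in plain NumPy on flattened 4x4 patches (an assumption: the patent does not specify the architecture), but the loss, i.e. the network output minus the clean image, back-propagated until convergence, follows the description:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data pairs: clean flattened 4x4 patches and their noisy versions.
clean = rng.random((256, 16))
noisy = clean + rng.normal(0, 0.1, clean.shape)

# Single-hidden-layer autoencoder: 16 -> 8 -> 16 with tanh activation.
W1 = rng.normal(0, 0.3, (16, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 0.3, (8, 16)); b2 = np.zeros(16)
lr = 0.1

def forward(x):
    h = np.tanh(x @ W1 + b1)
    return h, h @ W2 + b2

_, out0 = forward(noisy)
loss0 = ((out0 - clean) ** 2).mean()          # loss before training

for _ in range(3000):                         # full-batch gradient descent
    h, out = forward(noisy)
    err = (out - clean) / len(noisy)          # dLoss/dout (up to a constant)
    gW2 = h.T @ err;    gb2 = err.sum(0)
    dh = (err @ W2.T) * (1 - h ** 2)          # back-prop through tanh
    gW1 = noisy.T @ dh; gb1 = dh.sum(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

_, out1 = forward(noisy)
loss1 = ((out1 - clean) ** 2).mean()          # loss after training
```

In practice the patent's model would be a deeper network trained on real image pairs; the point here is only the shape of the loop: noisy input, clean target, back-propagation until the loss converges.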
The denoising method based on multi-viewpoint image fusion provided by the embodiment of the invention trains the autoencoder with a clean image and its corresponding noisy image as a data pair to obtain the first denoising model, which effectively removes intra-camera noise.
In some embodiments, in the case that the type of the noise is inter-camera noise, denoising the target image based on the type of the noise and a pre-trained model includes:
determining a target color block based on a MacBeth color checker;
training a fully connected network by taking the target color block as a label to obtain a second denoising model;
and denoising the target image based on the second denoising model.
The MacBeth color checker is a color calibration target consisting of 24 painted samples arranged on a cardboard frame. The patches of the chart have spectral reflectances intended to mimic those of natural objects such as human skin, leaves and flowers, so as to achieve a consistent color appearance under various lighting conditions, especially as detected by typical color film, and to remain stable over time.
The ColorChecker Classic chart is a rectangular card of about 11 x 8.25 inches (27.9 x 21.0 cm), or about 13 x 9 inches (33 x 23 cm) in its original version, with roughly the aspect ratio of 35 mm film. It consists of 24 color patches in a 4 x 6 grid, each slightly smaller than a 2-inch (5.1 cm) square, made of matte paint on smooth paper and surrounded by a black border. Six of the patches form a uniform scale of gray brightness levels, and another six are the primary colors typical of the chemical photographic process: red, green, blue, cyan, magenta and yellow. The remaining colors include approximations of medium and dark human skin and of blue sky, with the rest chosen arbitrarily.
Alternatively, when training the fully connected network, the training data may be obtained in either of two ways.
In the first way, the standard RGB values of the 24 color patches on the MacBeth color checker are copied, and 24 color patches, i.e., the target color patches, are produced with python. These 24 patches are used as labels.
In the second way, a camera of one viewpoint is selected from the multi-viewpoint cameras to photograph the MacBeth color checker, and the color patches of the checker in the captured image, i.e., the target color patches, are used as labels.
With the obtained target color patches as labels, the calibration result is compared with the reference points and the fully connected network is trained, yielding the second denoising model.
The second denoising model removes inter-camera noise from the target image: its input is an image with color differences shot by a viewpoint camera, and its output is the image with the color differences removed, as computed by the network.
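As a minimal stand-in for this color-correction step, the sketch below fits an affine transform from a simulated camera's 24 patch colors to the 24 reference labels by least squares. The patent trains a fully connected network; an affine map is the simplest such network, and the reference values here are random placeholders for the tabulated MacBeth values:

```python
import numpy as np

rng = np.random.default_rng(1)

# 24 reference RGB patch values (labels). Random stand-ins: the real
# MacBeth standard values are tabulated in practice.
reference = rng.uniform(0, 1, (24, 3))

# Simulated camera response: an unknown channel mix plus an offset (color cast).
true_M = np.array([[0.9, 0.05, 0.0],
                   [0.05, 1.1, 0.05],
                   [0.0, 0.1, 0.8]])
captured = reference @ true_M.T + np.array([0.03, -0.02, 0.05])

# Fit the affine correction: reference ~= [captured, 1] @ A  (least squares).
X = np.hstack([captured, np.ones((24, 1))])
A, *_ = np.linalg.lstsq(X, reference, rcond=None)

corrected = X @ A
raw_err = np.abs(captured - reference).mean()   # color cast before correction
fit_err = np.abs(corrected - reference).mean()  # residual after correction
```

Once fitted, the same transform (or, in the patent, the trained network) is applied to every pixel of that camera's images to remove its color cast.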
The denoising method based on multi-viewpoint image fusion provided by the embodiment of the invention trains the fully connected network with the target color patches determined from the MacBeth color checker as labels to obtain the second denoising model, which effectively removes inter-camera noise.
In some embodiments, in a case that the type of the noise is noise introduced during transmission, the denoising the target image based on the type of the noise and a pre-trained model includes:
training a second autoencoder by taking the third image as input and the fourth image as the target output, to obtain a third denoising model; the bit rate of the third image is below a preset value, and the bit rate of the fourth image is at or above the preset value;
and denoising the target image based on the third denoising model.
Optionally, when training the second autoencoder, the training data is a data set of matched high-bit-rate and low-bit-rate frames, which may be obtained as follows:
Step 1: connect Open Broadcaster Software (OBS) to a camera to shoot a scene;
Step 2: set the OBS video output to a higher bit rate and save the captured video locally;
Step 3: re-encode the saved video at a lower bit rate with Format Factory;
Step 4: extract the video frames at the same moments from the high-bit-rate and the low-bit-rate video as data pairs, obtaining the matched high-/low-bit-rate data set.
Understandably, the frame extracted from the high-bit-rate video at a given moment has the higher bit rate and is the fourth image, while the frame extracted from the low-bit-rate video has the lower bit rate and is the third image. The frames extracted at the same moment form a data pair, which gives the matched high-/low-bit-rate data set.
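Steps 1 to 4 can be emulated without OBS or Format Factory by simulating bit-rate reduction as coarse quantization; this is purely an illustration, since real low-bit-rate frames come from a video encoder:

```python
import numpy as np

rng = np.random.default_rng(7)

def simulate_low_bitrate(frame, levels=16):
    """Stand-in for re-encoding at a lower bit rate: coarse quantization."""
    step = 256 // levels
    return (frame // step) * step + step // 2

# "High-bit-rate video": a list of frames indexed by timestamp.
high_rate_video = [rng.integers(0, 256, (8, 8)) for _ in range(5)]
# "Low-bit-rate video": the same frames after simulated re-encoding.
low_rate_video = [simulate_low_bitrate(f) for f in high_rate_video]

# Step 4: frames captured at the same moment form a (third, fourth) image pair.
pairs = list(zip(low_rate_video, high_rate_video))

# The low-bit-rate frame carries fewer distinct values but stays close
# to the original, which is what the third model learns to invert.
third, fourth = pairs[0]
mean_gap = np.abs(third.astype(int) - fourth.astype(int)).mean()
```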
After the matched high-/low-bit-rate data set is obtained, the low-bit-rate video frame is used as the network input and the high-bit-rate video frame as the label, and the second autoencoder is trained to obtain the third denoising model.
The third denoising model removes the noise introduced during transmission: its input is a low-bit-rate video frame, and its output is a high-bit-rate video frame.
The denoising method based on multi-viewpoint image fusion provided by the embodiment of the invention builds an image data set of matched high-bit-rate and low-bit-rate frames, trains a neural network on it to obtain the denoising model, and fits low-bit-rate video frames to high-bit-rate frames, so that low-bit-rate video can be transmitted to relieve network pressure without degrading the display effect.
In addition, in the workflow of three-dimensional light field acquisition and transmission, the receiving end can be a custom-written three-dimensional light field player that pulls the real-time video stream, decapsulates and decodes it in real time, and performs real-time multi-viewpoint encoding and display. The denoising models are embedded into this player on top of those functions, so that noise is processed in real time.
The interface of the three-dimensional light field player is written with QT and includes a video rendering view (in two versions, one for playing a single video and one for playing three videos synchronously), a playback progress bar (which can seek to and decode the content of the corresponding moment in real time), a play/pause button, a full-screen button (double-clicking the video view also enters full screen), a local file playback button, a network video stream pull button, and signal slots for adjusting light field parameters. The player is written in C++ and uses libraries such as FFMPEG, OPENCV, OPENGL and CUDA, achieving real-time playback of ultra-high-definition video and multi-viewpoint encoding and providing software support for the three-dimensional light field display device.
The following describes the denoising device based on multi-view image fusion according to the present invention, and the denoising device based on multi-view image fusion described below and the denoising method based on multi-view image fusion described above may be referred to in correspondence with each other.
Fig. 2 is a schematic structural diagram of a denoising device based on multi-view image fusion according to an embodiment of the present invention, and as shown in fig. 2, the denoising device based on multi-view image fusion according to an embodiment of the present invention includes:
an acquisition module 210 for acquiring a target image in a three-dimensional light field scene by means of a camera array;
a determining module 220, configured to classify noise based on a source of the noise in the target image, and determine a type of the noise;
a denoising module 230, configured to denoise the target image based on the type of the noise and a pre-trained model.
It should be noted that the denoising device based on multi-view image fusion provided in the embodiment of the present application can implement all the method steps implemented by the denoising method based on multi-view image fusion, and can achieve the same technical effects, and detailed descriptions of the same parts and beneficial effects as those of the method embodiment in this embodiment are omitted here.
Optionally, the type of noise includes: intra-camera noise, inter-camera noise, and noise during transmission.
Optionally, in a case that the type of the noise is intra-camera noise, the denoising module 230 is specifically configured to:
training a first autoencoder by taking the first image as input and the second image as the target output, to obtain a first denoising model; the first image is a noisy image, and the second image is a clean image;
and denoising the target image based on the first denoising model.
Optionally, in a case that the type of the noise is inter-camera noise, the denoising module 230 is specifically configured to:
determining a target color block based on a MacBeth color checker;
training a fully connected network by taking the target color block as a label to obtain a second denoising model;
and denoising the target image based on the second denoising model.
Optionally, in a case that the type of the noise is noise in a transmission process, the denoising module 230 is specifically configured to:
training a second autoencoder by taking the third image as input and the fourth image as the target output, to obtain a third denoising model; the bit rate of the third image is below a preset value, and the bit rate of the fourth image is at or above the preset value;
and denoising the target image based on the third denoising model.
Fig. 3 illustrates a physical structure diagram of an electronic device, which may include, as shown in fig. 3: a processor (processor) 310, a communication Interface (communication Interface) 320, a memory (memory) 330 and a communication bus 340, wherein the processor 310, the communication Interface 320 and the memory 330 communicate with each other via the communication bus 340. The processor 310 may call logic instructions in the memory 330 to perform a de-noising method based on multi-view image fusion, the method comprising: acquiring a target image in a three-dimensional light field scene by a camera array; classifying the noise based on the source of the noise in the target image, and determining the type of the noise; and denoising the target image based on the type of the noise and a pre-trained model.
In addition, the logic instructions in the memory 330 may be implemented as software functional units and, when sold or used as an independent product, stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied as a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server or a network device) to execute all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
In another aspect, the present invention also provides a computer program product, the computer program product comprising a computer program, the computer program being stored on a non-transitory computer-readable storage medium, wherein when the computer program is executed by a processor, the computer is capable of executing the multi-view image fusion based denoising method provided by the above methods, the method comprising: acquiring a target image in a three-dimensional light field scene by a camera array; classifying the noise based on the source of the noise in the target image, and determining the type of the noise; and denoising the target image based on the type of the noise and a pre-trained model.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program, which when executed by a processor, implements a multi-view image fusion-based denoising method provided by the above methods, the method including: acquiring a target image in a three-dimensional light field scene by a camera array; classifying the noise based on the source of the noise in the target image, and determining the type of the noise; and denoising the target image based on the type of the noise and a pre-trained model.
The apparatus embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the embodiments without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, and certainly can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or in the part contributing to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute the methods described in the various embodiments or in parts of the embodiments.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and such modifications or replacements do not cause the corresponding technical solutions to depart from the spirit and scope of the embodiments of the present invention.

Claims (10)

1. A denoising method based on multi-viewpoint image fusion, characterized by comprising:
acquiring a target image in a three-dimensional light field scene by a camera array;
classifying the noise based on the source of the noise in the target image, and determining the type of the noise;
and denoising the target image based on the type of the noise and a pre-trained model.
2. The denoising method based on multi-viewpoint image fusion according to claim 1, wherein the types of the noise comprise: intra-camera noise, inter-camera noise, and noise during transmission.
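Claims 1 and 2 together describe a three-step pipeline: acquire the target image, classify its noise by source into one of three types, and denoise with the pre-trained model matched to that type. A minimal Python sketch of the routing step is given below; the patent specifies no API, so every function name and metadata key here is an illustrative assumption.

```python
# Hypothetical routing layer for the claimed pipeline: classify the noise
# by its source, then hand the target image to the matching pre-trained
# denoiser. Names (classify_noise_source, metadata keys) are assumptions.

NOISE_TYPES = ("intra_camera", "inter_camera", "transmission")

def classify_noise_source(metadata):
    """Map acquisition metadata to one of the three claimed noise types."""
    if metadata.get("stage") == "link":    # noise introduced during transmission
        return "transmission"
    if metadata.get("cross_view"):         # mismatch between cameras in the array
        return "inter_camera"
    return "intra_camera"                  # sensor noise inside a single camera

def denoise(image, metadata, denoisers):
    noise_type = classify_noise_source(metadata)
    return denoisers[noise_type](image)

# Identity functions stand in for the three pre-trained models of claims 3-5.
denoisers = {t: (lambda img: img) for t in NOISE_TYPES}
restored = denoise([[0.5, 0.5]], {"stage": "link"}, denoisers)
```

In a real system, each entry of `denoisers` would be one of the pre-trained models described in claims 3 to 5.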
3. The denoising method based on multi-viewpoint image fusion according to claim 2,
wherein in the case that the type of the noise is intra-camera noise, denoising the target image based on the type of the noise and a pre-trained model comprises:
training a first auto-encoder by taking the first image as input and the second image as output to obtain a first denoising model, wherein the first image is a noisy image and the second image is a clean image;
and denoising the target image based on the first denoising model.
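Claim 3 trains an auto-encoder on pairs of noisy and clean images. As a deliberately simplified stand-in, the sketch below "trains" a single linear layer in closed form on synthetic noisy/clean patch pairs; a real implementation would use a convolutional auto-encoder trained by gradient descent, and all sizes and data here are invented for illustration.

```python
import numpy as np

# Toy stand-in for claim 3: fit a linear map from noisy patches (the "first
# image", network input) to clean patches (the "second image", target output).
rng = np.random.default_rng(0)
latent = rng.uniform(size=(256, 3))                  # low-dimensional content
mix = rng.normal(0.0, 0.5, size=(3, 8))
clean = latent @ mix                                 # clean 8-pixel patches
noisy = clean + rng.normal(0.0, 0.1, clean.shape)    # additive sensor noise

# Closed-form least-squares "training" replaces backprop in this sketch.
W, *_ = np.linalg.lstsq(noisy, clean, rcond=None)
denoised = noisy @ W

mse_before = float(np.mean((noisy - clean) ** 2))    # about the noise variance
mse_after = float(np.mean((denoised - clean) ** 2))
```

Because the clean patches lie in a low-dimensional subspace, even this linear model suppresses the noise components outside that subspace, so `mse_after` falls below `mse_before`; an auto-encoder learns a richer version of the same prior.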
4. The denoising method based on multi-viewpoint image fusion according to claim 2,
wherein in the case that the type of the noise is inter-camera noise, denoising the target image based on the type of the noise and a pre-trained model comprises:
determining a target color block based on a Macbeth ColorChecker;
training a fully-connected network by taking the target color block as a label to obtain a second denoising model;
and denoising the target image based on the second denoising model.
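Claim 4 fits a fully-connected network whose labels are the target color blocks of a Macbeth ColorChecker, aligning the color response of the cameras in the array. The sketch below uses the simplest possible such network, a single affine layer (a 3×4 color-correction matrix) solved in closed form; the 24 reference values, the simulated color cast, and all names are illustrative assumptions.

```python
import numpy as np

# Hypothetical inter-camera color correction: measured ColorChecker patches
# are the input, reference patch values (the "target color blocks") the label.
rng = np.random.default_rng(1)
reference = rng.uniform(0.05, 0.95, size=(24, 3))   # 24 target RGB blocks
gain = np.diag([1.10, 0.90, 1.05])                  # simulated channel cast
measured = reference @ gain + 0.02                  # one camera's distorted view

X = np.hstack([measured, np.ones((24, 1))])         # affine (bias) term
M, *_ = np.linalg.lstsq(X, reference, rcond=None)   # "train" the single layer
corrected = X @ M                                   # camera mapped to reference
```

The fitted matrix `M` would then be applied to every pixel of that camera's view; a deeper fully-connected network trained on the same patch labels would be used when the inter-camera distortion is not affine.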
5. The denoising method based on multi-viewpoint image fusion according to claim 2,
wherein in the case that the type of the noise is noise during transmission, denoising the target image based on the type of the noise and a pre-trained model comprises:
training a second auto-encoder by taking the third image as input and the fourth image as output to obtain a third denoising model, wherein the bit rate of the third image is lower than a preset value and the bit rate of the fourth image is greater than or equal to the preset value;
and denoising the target image based on the third denoising model.
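Claim 5 trains a second auto-encoder to map low-bit-rate (compressed) frames to their high-bit-rate counterparts. Under the same simplifications as above, the sketch below uses coarse quantization as a stand-in for low-bit-rate coding and a closed-form linear layer as a stand-in for the auto-encoder; the patch sizes and quantization step are invented for illustration.

```python
import numpy as np

# Toy stand-in for claim 5: restore coarsely quantized patches ("low bit
# rate", the third image) toward their originals ("high bit rate", fourth).
rng = np.random.default_rng(2)
latent = rng.uniform(size=(512, 2))
mix = rng.normal(0.0, 0.5, size=(2, 8))
high = latent @ mix                        # high-bit-rate patches
step = 0.25
low = np.round(high / step) * step         # coarse quantization = compression

W, *_ = np.linalg.lstsq(low, high, rcond=None)   # closed-form "training"
restored = low @ W

mse_low = float(np.mean((low - high) ** 2))          # quantization error
mse_restored = float(np.mean((restored - high) ** 2))
```

Because neighboring pixels are strongly correlated, even this linear model undoes most of the quantization error; a real restoration auto-encoder learns the same kind of prior from paired low- and high-bit-rate encodes of the same frames.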
6. A denoising device based on multi-viewpoint image fusion, characterized by comprising:
the acquisition module is used for acquiring a target image in a three-dimensional light field scene through a camera array;
the determining module is used for classifying the noise based on the source of the noise in the target image and determining the type of the noise;
and the denoising module is used for denoising the target image based on the type of the noise and a pre-trained model.
7. The denoising device based on multi-viewpoint image fusion according to claim 6, wherein the types of the noise comprise: intra-camera noise, inter-camera noise, and noise during transmission.
8. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the denoising method based on multi-viewpoint image fusion according to any one of claims 1 to 5.
9. A non-transitory computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the denoising method based on multi-viewpoint image fusion according to any one of claims 1 to 5.
10. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the denoising method based on multi-viewpoint image fusion according to any one of claims 1 to 5.
CN202310138477.5A 2023-02-14 2023-02-14 Denoising method and device based on multi-viewpoint image fusion, electronic device and medium Pending CN115953585A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310138477.5A CN115953585A (en) 2023-02-14 2023-02-14 Denoising method and device based on multi-viewpoint image fusion, electronic device and medium

Publications (1)

Publication Number Publication Date
CN115953585A 2023-04-11

Family

ID=87297045

Country Status (1)

Country Link
CN (1) CN115953585A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination