CN114283087A - Image denoising method and related equipment - Google Patents


Info

Publication number
CN114283087A
CN114283087A
Authority
CN
China
Prior art keywords
scene
noise estimation
target
estimation value
image
Prior art date
Legal status
Pending
Application number
CN202111591688.1A
Other languages
Chinese (zh)
Inventor
洪国伟
丁嘉文
董治
姜涛
Current Assignee
Tencent Music Entertainment Technology Shenzhen Co Ltd
Original Assignee
Tencent Music Entertainment Technology Shenzhen Co Ltd
Priority date
Application filed by Tencent Music Entertainment Technology (Shenzhen) Co., Ltd.
Priority to CN202111591688.1A
Publication of CN114283087A


Landscapes

  • Image Analysis (AREA)

Abstract

The embodiment of the application discloses an image denoising method and related equipment, wherein the method comprises the following steps: identifying the target scene category of multiple frames of input images by using a pre-trained neural network, where the multiple frames of input images are obtained by extracting frames from a video in a live broadcast scene, and the neural network is trained with a training sample set that comprises at least one group of training samples, each group of training samples comprising annotation information of scene categories; obtaining a target noise estimation value interval corresponding to the target scene category; and denoising the video in the live broadcast scene according to the target noise estimation value interval to obtain a denoised video. The method and the device can avoid the poor denoising effect caused by applying uniform denoising strength to different scenes.

Description

Image denoising method and related equipment
Technical Field
The present application relates to the field of computer technologies, and in particular, to an image denoising method and related apparatus.
Background
At present, commonly used image denoising algorithms include Gaussian filtering, median filtering, and neural-network-based denoising. Gaussian filtering performs a weighted average over the whole image, in which closer points receive higher weights and farther points receive lower weights; as a result, the image obtained after denoising may exhibit pixel shift. Median filtering replaces a pixel with the median of its neighborhood; it mainly handles impulse (salt-and-pepper) noise well and cannot handle Gaussian noise well, so its performance is limited. With neural-network-based denoising, after the trained model has been applied many times, the image obtained by denoising can contain noise that did not exist in the original image.
Therefore, how to improve the denoising effect of the image is an urgent problem to be solved in the live broadcasting process.
Disclosure of Invention
The embodiment of the application provides an image denoising method and related equipment, which can effectively remove noise from an image.
On one hand, the embodiment of the application discloses an image denoising method, which comprises the following steps: identifying the target scene category of a plurality of frames of input images by using a pre-trained neural network, wherein the plurality of frames of input images are obtained by extracting frames of a video in a live broadcast scene; the neural network is obtained by training with a training sample set, wherein the training sample set comprises at least one group of training samples, and each group of training samples comprises labeling information of scene categories; obtaining a target noise estimation value interval corresponding to the target scene type; and denoising the video in the live broadcast scene according to the target noise estimation value interval to obtain a denoised video.
In an alternative embodiment, identifying the target scene category of the input image by using the pre-trained neural network includes: performing feature extraction on the input image with the pre-trained neural network to obtain a feature map; determining, from the feature map, the scene category to which the input image belongs and the probability of that scene category; and determining the target scene category of the input image according to the scene category to which the feature map belongs and the probability of that scene category.
In an optional implementation manner, obtaining the target noise estimation value interval corresponding to the target scene category includes: performing noise estimation on each frame of input image by using a noise estimation method to obtain a target noise estimation value for each frame of input image; and obtaining the target noise estimation value interval corresponding to the target scene category from the target noise estimation values of the frames of input images.
In an optional implementation manner, obtaining the target noise estimation value interval corresponding to the target scene category includes: determining, based on a preset correspondence between scene categories and noise estimation value intervals, the noise estimation value interval corresponding to the target scene category as the target noise estimation value interval.
In an optional implementation manner, denoising the video in the live broadcast scene according to the target noise estimation value interval to obtain a denoised video includes: setting the standard deviation of the color space in the gray-level similarity (range) function of a bilateral filter based on the target noise estimation value interval; and denoising the video in the live broadcast scene with the configured bilateral filter to obtain the denoised video.
In an optional implementation manner, the target scene category of the input image is one of a first scene, a second scene, a third scene and a fourth scene, wherein a noise estimation value in a target noise estimation value interval corresponding to the fourth scene is greater than a noise estimation value in a target noise estimation value interval corresponding to the third scene; the noise estimation value in the target noise estimation value interval corresponding to the third scene is larger than the noise estimation value in the target noise estimation value interval corresponding to the second scene; and the noise estimation value in the target noise estimation value interval corresponding to the second scene is larger than the noise estimation value in the target noise estimation value interval corresponding to the first scene.
In an optional embodiment, when the target scene category is the first scene, the value interval (A, B) of the standard deviation is the interval (0.05, 0.2); when the target scene category is the second scene, the value interval (A, B) of the standard deviation is the interval (0, 0.12); when the target scene category is the third scene, the value interval (A, B) of the standard deviation is the interval (0.2, 0.4); and when the target scene category is the fourth scene, the value interval (A, B) of the standard deviation is the interval (0.3, 0.6).
In one aspect, an embodiment of the present application discloses an image denoising device, including:
the identification unit is used for identifying the target scene category of a plurality of frames of input images by utilizing a pre-trained neural network, wherein the plurality of frames of input images are obtained by extracting frames from a video in a live broadcast scene; the neural network is obtained by training with a training sample set, wherein the training sample set comprises at least one group of training samples, and each group of training samples comprises labeling information of scene categories;
the processing unit is used for obtaining a target noise estimation value interval corresponding to the target scene type;
and the denoising unit is used for denoising the video in the live broadcast scene according to the target noise estimation value interval to obtain a denoised video.
In an optional implementation manner, when identifying the target scene category of the input image by using the pre-trained neural network, the identifying unit is specifically configured to: perform feature extraction on the input image with the pre-trained neural network to obtain a feature map; determine, from the feature map, the scene category to which the input image belongs and the probability of that scene category; and determine the target scene category of the input image according to the scene category to which the feature map belongs and the probability of that scene category.
In an optional implementation manner, when obtaining the target noise estimation value interval corresponding to the target scene type, the processing unit is specifically configured to: carrying out noise estimation on the input image of each frame by using a noise estimation method to obtain a target noise estimation value of the input image of each frame; and obtaining a target noise estimation value interval corresponding to the target scene type according to the target noise estimation value of each frame of the input image.
In an optional implementation manner, when obtaining the target noise estimation value interval corresponding to the target scene type, the processing unit is specifically configured to: and determining the noise estimation value interval corresponding to the target scene type as a target noise estimation value interval based on the preset corresponding relation between the scene type and the noise estimation value interval.
In an optional implementation manner, when denoising the video in the live broadcast scene according to the target noise estimation value interval to obtain a denoised video, the denoising unit is specifically configured to: set the standard deviation of the color space in the gray-level similarity (range) function of a bilateral filter based on the target noise estimation value interval; and denoise the video in the live broadcast scene with the configured bilateral filter to obtain the denoised video.
In an optional implementation manner, the target scene category of the input image is one of the first scene, the second scene, the third scene and the fourth scene, wherein the noise estimation value in the target noise estimation value interval corresponding to the fourth scene is greater than the noise estimation value in the target noise estimation value interval corresponding to the third scene; the noise estimation value in the target noise estimation value interval corresponding to the third scene is greater than the noise estimation value in the target noise estimation value interval corresponding to the second scene; and the noise estimation value in the target noise estimation value interval corresponding to the second scene is greater than the noise estimation value in the target noise estimation value interval corresponding to the first scene.
In an alternative embodiment, when the target scene category of the input image is the first scene, the denoising unit sets the value interval (A, B) of the standard deviation to the interval (0.05, 0.2); when the target scene category is the second scene, to the interval (0, 0.12); when the target scene category is the third scene, to the interval (0.2, 0.4); and when the target scene category is the fourth scene, to the interval (0.3, 0.6).
The embodiment of the present application also discloses an image processing apparatus, including:
the image denoising method comprises a memory and a processor, wherein the memory stores an image processing program, and the image processing program realizes the steps of the image denoising method when being executed by the processor.
The embodiment of the application also discloses a computer readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the image denoising method is executed.
Accordingly, the present application also discloses a computer program product or a computer program, which includes computer instructions, which are stored in a computer readable storage medium. The processor of the image processing apparatus reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the image processing apparatus executes the image denoising method described above.
Therefore, in the image denoising method provided by the application, the neural network is used to determine the scene category of the input image from among a plurality of scene categories, and the input image is denoised according to the determined scene category to obtain a denoised video. Because the method applies differentiated denoising to the input image based on the classified scene, it avoids the poor denoising effect caused by applying uniform denoising strength to different scenes and thereby improves the image denoising effect.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a block diagram of an image processing system according to an embodiment of the present disclosure;
FIG. 2 is a flowchart illustrating a denoising method according to an identified target scene category according to an embodiment of the present disclosure;
FIG. 3 shows a schematic diagram of an extracted feature map;
FIG. 4 shows an architecture diagram of a C3D network;
FIG. 5 is a diagram illustrating a result of recognizing an image using the neural network according to an embodiment of the present disclosure;
FIG. 6 shows a graph of noise estimation results for four scene classes;
FIG. 7 shows different scene diagrams when live;
FIG. 8 is a flowchart illustrating another denoising method according to the identified target scene category according to the present disclosure;
FIG. 9 is a comparison graph before and after image denoising in a diffuse reflection scene disclosed in an embodiment of the present application;
FIG. 10 is a comparison graph before and after image denoising in a foreground illumination scene disclosed in an embodiment of the present application;
FIG. 11 is a comparison graph before and after image denoising in a background illumination scene disclosed in an embodiment of the present application;
FIG. 12 is a comparison graph before and after image denoising in a non-illumination scene disclosed in an embodiment of the present application;
FIG. 13 is a schematic structural diagram of an image denoising device disclosed in an embodiment of the present application;
FIG. 14 is a schematic structural diagram of an image processing device disclosed in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
At present, there are generally three denoising methods for images. The first is image denoising based on Gaussian filtering, which performs a weighted average over the pixel values of the whole image: the value of each pixel is obtained as a weighted average of its own value and the values of the other pixels in its neighborhood. However, in this weighted average, pixels closer to the current pixel are given larger weights and pixels farther away are given smaller weights, so the pixels of the denoised image are shifted and the denoising effect is not ideal. The second is image denoising based on median filtering, which takes a rectangle around the current pixel and replaces the pixels in the rectangular region with the median produced by the median filtering function; however, because Gaussian noise follows a Gaussian (i.e., normal) distribution and usually differs little from the surrounding pixel values, median filtering handles Gaussian noise poorly. The third is neural-network-based denoising: after the trained model has been applied many times, the denoised image can contain noise that did not exist in the original image. Moreover, existing neural-network denoising methods assume the noise level is fixed, while in fact the noise level of an image is not fixed, so a plain neural-network denoising method does not perform well.
Therefore, how to improve the denoising effect of the image is an urgent problem to be solved.
The embodiment of the application provides an image denoising method which can improve the image denoising effect. The image denoising method utilizes a pre-trained neural network to identify the target scene category of a plurality of frames of input images, wherein the plurality of frames of input images are obtained by extracting frames from a video in a live broadcast scene; the neural network is obtained by training by utilizing a training sample set, wherein the training sample set comprises at least one group of training samples, and each group of training samples comprises labeling information of scene categories; obtaining a target noise estimation value interval corresponding to the target scene type; and denoising the video in the live broadcast scene according to the target noise estimation value interval to obtain a denoised video. Therefore, the image denoising method utilizes the classified scenes to perform differential denoising on the input image, so that the problem of poor denoising effect caused by uniform denoising strength applied to different scenes is avoided, and the image denoising effect is improved.
In the image denoising method provided by the embodiment of the application, the neural network can be realized based on Artificial Intelligence (AI) technology. AI refers to theories, methods, techniques and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. AI technology is a comprehensive discipline covering a wide range of fields. The neural network in the image denoising method provided by the embodiment of the present application mainly involves the Machine Learning (ML) technique within AI. Machine learning is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory and other disciplines. It specializes in studying how a computer simulates or implements human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve its performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence. Machine learning generally includes techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and learning from instruction.
In the image denoising method provided in the embodiment of the present application, the neural network may also be implemented based on Computer Vision (CV) technology within artificial intelligence. Computer vision is the science of studying how to make machines "see": using cameras and computers instead of human eyes to identify, track and measure targets, and performing further image processing so that the processed image is more suitable for human observation or for transmission to an instrument for detection. As a scientific discipline, computer vision studies theories and techniques aimed at building artificial intelligence systems that can capture information from images or multidimensional data. Computer vision technologies generally include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technologies, virtual reality, augmented reality, simultaneous localization and mapping, and other technologies, as well as common biometric technologies such as face recognition and fingerprint recognition.
For ease of understanding, the present embodiment describes a network architecture including a database 101 and an image processing device 102. The database 101 may be a local database of the image processing device 102, or a database of another device in the cloud. The image denoising method may be executed by the image processing device 102: specifically, the image processing device 102 obtains multiple frames of input images from the database 101 and uses a pre-trained neural network to identify and classify them, determining the scene category of the input images. The input images may be images extracted from a video by frame extraction, including from a live video (e.g., a live video of singing), or photos taken by a shooting device. After the neural network has identified and classified the input images and their scene category has been determined, an image denoising method is used to denoise the images corresponding to that scene category to obtain a denoised video; the image denoising method may be Gaussian filtering, median filtering, a neural-network method, bilateral filtering, or the like. In this way, the image denoising method avoids the poor denoising effect caused by applying uniform denoising strength to different scenes, and a denoising approach adopting this method can improve the denoising effect of the image.
It should be noted that the image processing device 102 may be a terminal device or a server, and the terminal device may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, a smart car, or the like, but is not limited thereto; the server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud service, a cloud database, cloud computing, a cloud function, cloud storage, network service, cloud communication, middleware service, domain name service, security service, CDN, and a big data and artificial intelligence platform.
The embodiments of the present application will be described in detail below with reference to the accompanying drawings.
Referring to fig. 2, which shows a flowchart of an image denoising method disclosed in an embodiment of the present application; the method may be executed by the image processing device shown in fig. 1 and includes, but is not limited to, the following steps:
s201, recognizing the target scene category of a plurality of frames of input images by using a pre-trained neural network, wherein the plurality of frames of input images are obtained by extracting frames from a video in a live broadcast scene; the neural network is obtained by training with a training sample set, wherein the training sample set comprises at least one group of training samples, and each group of training samples comprises labeling information of scene categories.
In the embodiment of the present application, the multiple frames of input images may be images obtained by extracting frames from a video, including from a live video (for example, a live video of singing), or may be photos taken by a shooting device. For example, a user takes a picture with the shooting device of a client and sends it to the image processing device, which uses the picture as an input image. Frame extraction from the video may take consecutive frames (for example, 16 consecutive frames) or frames at intervals (for example, every 8 frames), which is not limited in this application.
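The following is a minimal frame-extraction sketch using OpenCV, assuming interval sampling as described above; the function name and the default values (16 frames, every third frame, resize to 128 × 171) are illustrative choices drawn from this description, not a definitive implementation.

```python
import cv2
import numpy as np

def extract_frames(video_path: str, num_frames: int = 16, stride: int = 3):
    """Collect `num_frames` frames from a video, keeping every `stride`-th frame."""
    cap = cv2.VideoCapture(video_path)
    frames, index = [], 0
    while len(frames) < num_frames:
        ok, frame = cap.read()
        if not ok:                        # end of video reached
            break
        if index % stride == 0:           # interval sampling
            # cv2.resize takes (width, height): resize to 128 x 171 (H x W).
            frames.append(cv2.resize(frame, (171, 128)))
        index += 1
    cap.release()
    return np.stack(frames) if frames else None
```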
In an alternative embodiment, identifying the target scene category of the input image by using the pre-trained neural network includes: performing feature extraction on the input image with the pre-trained neural network to obtain a feature map; determining, from the feature map, the scene category to which the input image belongs and the probability of that scene category; and determining the target scene category of the input image according to the scene category to which the feature map belongs and the probability of that scene category.
Illustratively, since a three-dimensional convolutional neural network (3D ConvNets) can better model the temporal information of images, the neural network may be 3D ConvNets, and the process of extracting features from an input image taken from the scene video with the 3D ConvNets to obtain a feature map may be as shown in fig. 3. As shown in fig. 3, the input side resizes the input image to a preset size (for example, 128 × 127) and feeds it into the neural network, and a convolution kernel in the neural network performs the convolution operation on the input image in a preset manner, where H = 127 and W = 128 are the image height and width, L = 128 is the temporal depth, and k and d are the sizes of the convolution kernel, which are smaller than the size of the input image; the preset manner is the sliding pattern indicated by the arrows in the figure. The output feature map is obtained by the convolution operation performed in this manner.
Referring to fig. 4, fig. 4 is a schematic diagram of the network structure of the 3D ConvNets provided in an embodiment of the present application. As shown in fig. 4, the 3D ConvNets consist of three-dimensional convolution layers, three-dimensional max-pooling layers and fully connected layers, comprising 8 3D convolution layers, 5 3D pooling (down-sampling) layers, 2 fully connected layers and 1 softmax output layer. All 3D convolution kernels are of size 3 × 3 × 3 with stride 1 × 1 × 1; the first pooling kernel is of size 1 × 2 × 2 with stride 1 × 2 × 2, and all the other 3D pooling layers are 2 × 2 × 2 with stride 2 × 2 × 2; each fully connected layer has 4096 output units, and a softmax activation function performs the four-class recognition of scenes.
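A sketch of this C3D-style network in PyTorch follows. The layer counts, kernel and stride settings, 4096-unit fully connected layers and four output classes follow the description above; the per-layer channel widths (64, 128, 256, 512) are the standard C3D values and are an assumption here, as the patent does not state them.

```python
import torch
import torch.nn as nn

class C3DSceneClassifier(nn.Module):
    """8 x 3D conv (3x3x3), 5 x 3D max-pool, 2 x FC(4096), 4-way output."""

    def __init__(self, num_classes: int = 4):
        super().__init__()
        def conv(cin, cout):
            return nn.Sequential(
                nn.Conv3d(cin, cout, kernel_size=3, stride=1, padding=1),
                nn.ReLU(inplace=True))
        self.features = nn.Sequential(
            conv(3, 64),
            nn.MaxPool3d((1, 2, 2), stride=(1, 2, 2)),  # first pool: 1x2x2
            conv(64, 128),
            nn.MaxPool3d(2, stride=2),                  # remaining pools: 2x2x2
            conv(128, 256), conv(256, 256),
            nn.MaxPool3d(2, stride=2),
            conv(256, 512), conv(512, 512),
            nn.MaxPool3d(2, stride=2),
            conv(512, 512), conv(512, 512),
            nn.MaxPool3d(2, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(4096, num_classes),  # softmax is applied at inference
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels=3, frames, height, width)
        return self.classifier(self.features(x))
```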
For example, fig. 5 is a schematic diagram of the result of recognizing an image with the neural network provided in an embodiment of the present application. As shown in fig. 5, the image processing device imports the input image into the 3D ConvNets for training and classification; the obtained classification result is shown in the recognition-result diagram in fig. 5, where the letters at the top are the classification result for that frame and the row below them is the probability of that result. As can be seen from fig. 5, the recognition probability of the scene category to which the input image belongs is 0.7689, so the 3D ConvNets can effectively recognize and classify the input image. The training and classification of input images with the 3D ConvNets may proceed as follows: suppose 16 frames are selected and imported into the 3D ConvNets for training, and each frame is resized to 128 × 171; an SGD optimizer is then used with the batch size set to 30 and the initial learning rate set to 0.003, the learning rate is divided by 2 every 150k iterations, and the number of epochs is set to 13. Data in the test phase may receive the same processing as in the training phase.
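A hedged sketch of this training setup follows, reusing the C3DSceneClassifier above. The SGD optimizer, batch size 30, initial learning rate 0.003, halving every 150k iterations and 13 epochs come from the text; the random tensors merely stand in for real frame-extracted clips, and stepping the scheduler once per iteration is an assumption.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

model = C3DSceneClassifier(num_classes=4)
criterion = nn.CrossEntropyLoss()              # expects raw logits
optimizer = optim.SGD(model.parameters(), lr=0.003)
# Divide the learning rate by 2 every 150k iterations.
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=150_000, gamma=0.5)

# Dummy stand-in data: 60 clips of 16 frames at 128 x 171, 4 scene labels.
clips = torch.randn(60, 3, 16, 128, 171)
labels = torch.randint(0, 4, (60,))
train_loader = DataLoader(TensorDataset(clips, labels), batch_size=30)

for epoch in range(13):
    for batch_clips, batch_labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(batch_clips), batch_labels)
        loss.backward()
        optimizer.step()
        scheduler.step()                       # stepped once per iteration
```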
By effectively classifying and recognizing the images corresponding to the scene category of the input image, the scene category to which the input image belongs can be determined, after which the method proceeds to S202 to obtain the target noise estimation value interval corresponding to the target scene category.
And S202, obtaining a target noise estimation value interval corresponding to the target scene type.
In the embodiment of the application, the image processing device can use a noise estimation algorithm to perform noise estimation on the input image to obtain a target noise estimation value of the input image, and then obtain the target noise estimation value interval corresponding to the target scene category from the target noise estimation value of the input image.
For example, the noise estimation method may be a recursive averaging noise estimation algorithm or a minima-controlled recursive averaging algorithm, or the like. For example, the image processing device may estimate the image corresponding to the scene category to which the input image belongs by using a recursive averaging noise estimation algorithm to obtain the target noise estimation value interval.
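The patent does not pin down one estimator, so the sketch below uses Immerkaer's fast noise-variance estimation as a stand-in; the recursive-averaging methods named above would slot into the same place. Per-frame estimates are aggregated into a (min, max) interval for the scene, matching the per-frame procedure described here.

```python
import numpy as np
from scipy.signal import convolve2d

# Laplacian-like mask used by Immerkaer's fast noise estimation.
LAPLACIAN_MASK = np.array([[ 1, -2,  1],
                           [-2,  4, -2],
                           [ 1, -2,  1]], dtype=np.float64)

def estimate_noise_sigma(gray: np.ndarray) -> float:
    """Estimate the noise standard deviation of a grayscale frame."""
    h, w = gray.shape
    response = convolve2d(gray, LAPLACIAN_MASK, mode="valid")
    return (np.sqrt(np.pi / 2.0) / (6.0 * (w - 2) * (h - 2))) * np.abs(response).sum()

def noise_interval(frames) -> tuple:
    """Target noise estimation value interval from per-frame estimates."""
    sigmas = [estimate_noise_sigma(f) for f in frames]
    return (min(sigmas), max(sigmas))
```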
In an optional implementation manner, the image processing apparatus may further determine, based on a preset correspondence between the scene type and the noise estimation value interval, the noise estimation value interval corresponding to the target scene type as a target noise estimation value interval.
For example, assume that the target scene category of the input image is one of a first scene, a second scene, a third scene, and a fourth scene, where a noise estimation value in a target noise estimation value interval corresponding to the fourth scene may be greater than a noise estimation value in a target noise estimation value interval corresponding to the third scene; the noise estimation value in the target noise estimation value interval corresponding to the third scene may be greater than the noise estimation value in the target noise estimation value interval corresponding to the second scene; the noise estimate value in the target noise estimate value interval corresponding to the second scene may be greater than the noise estimate value in the target noise estimate value interval corresponding to the first scene.
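As a minimal sketch, the preset correspondence can be a simple lookup table. The interval values below are the ones read off the fig. 6 analysis later in this description; the dictionary keys and function name are illustrative.

```python
# Preset correspondence between scene category and noise estimation interval.
SCENE_NOISE_INTERVALS = {
    "first_scene":  (0.20, 0.31),   # diffuse-reflection lighting, least noise
    "second_scene": (0.32, 0.49),   # foreground lighting
    "third_scene":  (0.48, 0.78),   # background lighting
    "fourth_scene": (0.80, 0.94),   # no lighting, most noise
}

def target_noise_interval(scene: str) -> tuple:
    return SCENE_NOISE_INTERVALS[scene]
```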
Optionally, the image processing device estimating the image corresponding to the scene category to which the input image belongs by using a noise estimation method to obtain the target noise estimation value interval may include: the image processing device extracts frames from the live broadcast scene to obtain input images; performs noise estimation on the input images using a noise estimation method to obtain the target noise estimation value of each input image; and determines the target noise estimation value interval from the target noise estimation values of the images.
For example, please refer to fig. 6, a schematic diagram of the noise estimation values of ten groups of input images provided in an embodiment of the present application. The ten groups of input images shown in fig. 6 may be obtained by frame extraction from different live broadcast scenes. The abscissa represents the first through tenth groups of input images from left to right, where each group contains four images, labeled the first image through the fourth image; the ordinate is the noise estimation value obtained by applying a noise estimation method to each image in each group. The noise estimation values per group are:

  • Group 1: (0.22, 0.31, 0.48, 0.90)
  • Group 2: (0.21, 0.33, 0.52, 0.94)
  • Group 3: (0.25, 0.32, 0.57, 0.91)
  • Group 4: (0.23, 0.36, 0.58, 0.92)
  • Group 5: (0.24, 0.38, 0.58, 0.93)
  • Group 6: (0.24, 0.38, 0.61, 0.91)
  • Group 7: (0.24, 0.37, 0.61, 0.81)
  • Group 8: (0.25, 0.44, 0.62, 0.92)
  • Group 9: (0.25, 0.47, 0.62, 0.81)
  • Group 10: (0.30, 0.42, 0.78, 0.94)

As fig. 6 shows, the noise estimation values of all first images across the ten groups fall in the interval (0.20, 0.31), those of all second images in (0.32, 0.49), those of all third images in (0.48, 0.78), and those of all fourth images in (0.80, 0.94); these are the target noise estimation value intervals of the four kinds of input images.
Correspondingly, the image processing device determines the target noise estimation value interval of the input image according to the preset correspondence between these four noise estimation value intervals and the scene categories. For example, by analyzing the first through fourth images of the ten groups of input images against the four noise estimation value intervals obtained in fig. 6, four scene categories are obtained, namely the first through fourth scenes described above: the noise estimation value in the interval corresponding to the fourth image (the fourth scene) is greater than that in the interval corresponding to the third image (the third scene); the noise estimation value in the interval corresponding to the third image is greater than that in the interval corresponding to the second image (the second scene); and the noise estimation value in the interval corresponding to the second image may be greater than that in the interval corresponding to the first image (the first scene).
Noise estimation is performed on the image corresponding to the identified scene category to obtain a target noise estimation value interval, and then the step of performing denoising processing on the video in the live broadcast scene according to the target noise estimation value interval in S203 to obtain a denoised video can be performed.
S203, denoising the video in the live broadcast scene according to the target noise estimation value interval to obtain a denoised video.
In the embodiment of the present application, image denoising refers to the process of reducing noise in a digital image, and this process can be based on various denoising methods, for example a bilateral-filter-based method or a Gaussian-filter-based method. The model parameter settings may include: the optimizer set to stochastic gradient descent (SGD), the batch size set to 30, the initial learning rate set to 0.003, the number of epochs equal to 13, and so on. In addition, the training sample set may be obtained from the video in the live broadcast scene by a sampling operation that extracts one frame out of every three, with each frame resized to 128 × 171 and the convolution kernel size set to 3 × 3 × 3.
In an optional implementation manner, the image processing device may perform denoising processing on the input image by using a bilateral filter and a target noise estimation value interval corresponding to the scene category in S202, to obtain a denoised video.
Optionally, the bilateral filter can achieve smooth denoising of an image while preserving image edges well, because the kernel of the bilateral filter is generated by two functions: a spatial-domain kernel (also known as the domain kernel, spatial coefficient or spatial domain) and a value-domain kernel (pixel range domain). The bilateral filter thus combines the spatial proximity and the pixel similarity of the image, considering spatial-domain information and gray-level similarity at the same time, so as to achieve edge-preserving denoising.
The spatial-domain kernel is a template weight determined by the Euclidean distance between pixel positions. Using BF as the symbol for bilateral filtering, the filter is defined by formula (1):

$$BF[I]_p = \frac{1}{W_p} \sum_{q \in S} G_{\sigma_d}(\lVert p-q \rVert)\, G_{\sigma_r}(\lvert I_p - I_q \rvert)\, I_q \tag{1}$$

where $W_p = \sum_{q \in S} G_{\sigma_d}(\lVert p-q \rVert)\, G_{\sigma_r}(\lvert I_p - I_q \rvert)$ is a normalization factor, $p$ and $q$ are pixel points on the image, $S$ denotes the set of all pixel points, $I_p$ and $I_q$ are the corresponding pixel values, $\sigma_d$ is the standard deviation of the coordinate space, and $\sigma_r$ is the standard deviation of the color space. $G_{\sigma_d}$ is the spatial proximity function, $G_{\sigma_r}$ is the gray-level similarity function, and together they measure the amount of noise filtering applied to the image. $G_{\sigma_d}$ and $G_{\sigma_r}$ are expressed as formulas (2) and (3), respectively:

$$G_{\sigma_d}(\lVert p-q \rVert) = \exp\!\left(-\frac{\lVert p-q \rVert^2}{2\sigma_d^2}\right) \tag{2}$$

$$G_{\sigma_r}(\lvert I_p - I_q \rvert) = \exp\!\left(-\frac{\lvert I_p - I_q \rvert^2}{2\sigma_r^2}\right) \tag{3}$$

As can be seen from formulas (1), (2) and (3), bilateral filtering integrates the characteristics of Gaussian filtering and the α-trimmed mean filter: it considers the differences in both the spatial domain and the value domain, whereas each of those filters considers only one of the two. The Gaussian filter considers only the Euclidean distance between pixels, with template coefficients that shrink with distance from the window center; the α-trimmed mean filter considers only the differences between pixel gray values, computing the mean after removing the minimum and maximum gray values.

Here $d(p,q) = \lVert p-q \rVert$ and $\delta(I(p), I(q)) = \lvert I_p - I_q \rvert$ are, respectively, the Euclidean distance between two pixel points of the image and the gray-level difference of the pixels, so $\sigma_r$ and $\sigma_d$ determine the performance of the bilateral filter: they define how a pixel's value is weighted through the relative spatial and intensity variation around its position. If either of these two parameters is close to 0, the output appears unsmoothed; and as long as $\sigma_r$ is smaller than the local amplitude variation, edges are not affected. The variation of $\sigma_r$ affects the result more than that of $\sigma_d$. As $\sigma_r$ gradually increases, the bilateral filter gradually approaches Gaussian blur, because the range Gaussian becomes nearly flat over the intensity interval covered by the image, and increasing the spatial parameter then smooths more features; flat regions thus correspond to plain Gaussian filtering. The denoising effect of bilateral filtering therefore depends on the two parameters $\sigma_r$ and $\sigma_d$, which control the characteristics of the brightness domain and the spatial domain respectively; based on simulations of the denoising effect, the value of $\sigma_r$ has proved more important than the value of $\sigma_d$ in changing the noise level. Optionally, when the value of the color-space standard deviation $\sigma_r$ increases, the value of the corresponding gray-level similarity function $G_{\sigma_r}$ also increases, and since $G_{\sigma_r}$ measures the amount of noise filtering applied to the image, an increase in $\sigma_r$ increases the ability of the bilateral filter to filter noise.
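To make the roles of $\sigma_d$ and $\sigma_r$ concrete, the following is a direct, unoptimized NumPy rendering of formulas (1)-(3) for a grayscale image with values in [0, 1]; it is a didactic sketch, and production code would use an optimized routine such as OpenCV's bilateral filter shown further below.

```python
import numpy as np

def bilateral_filter(image: np.ndarray, radius: int,
                     sigma_d: float, sigma_r: float) -> np.ndarray:
    h, w = image.shape
    padded = np.pad(image, radius, mode="reflect")
    # Spatial proximity weights G_{sigma_d} (formula (2)), fixed per offset.
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(ys**2 + xs**2) / (2.0 * sigma_d**2))
    out = np.empty_like(image)
    for i in range(h):
        for j in range(w):
            window = padded[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            # Gray-level similarity weights G_{sigma_r} (formula (3)).
            similarity = np.exp(-(window - image[i, j])**2 / (2.0 * sigma_r**2))
            weights = spatial * similarity
            out[i, j] = (weights * window).sum() / weights.sum()  # formula (1)
    return out
```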
Optionally, the image processing device may denoise the input image with the bilateral filter by adjusting the standard deviation of the color space in the gray-level similarity function of the bilateral filter according to the determined scene category to which the input image belongs, finally obtaining a denoised video. Since different values of the color-space standard deviation $\sigma_r$ correspond to different scene categories, the value range of the color-space standard deviation in the bilateral filter can be set to the value interval (A, B) of the color-space standard deviation corresponding to the target noise estimation value interval, where A and B are each constants.
For example, assume that the first scene is a diffusely illuminated scene, the second scene is a foreground illuminated scene, the third scene is a background illuminated scene, and the fourth scene is a non-illuminated scene. The diffuse reflection illumination scene may be a scene when the light source uniformly irradiates, the foreground illumination scene may be a scene when the light source irradiates in front of the live broadcast main body, the background illumination scene may be a scene when the light source irradiates behind the live broadcast main body, and the non-illumination scene may be a scene when the light source does not irradiate.
Also for example, reference may be made to fig. 7, a schematic illustration of a plurality of images provided by an embodiment of the present application. As shown in fig. 7, the noise estimation value of image a is 0.7026 and that of image b is 0.6721, both located in the target noise estimation value interval (0.48, 0.78) corresponding to the third scene; the noise estimation value of image c is 0.9128, located in the target noise estimation value interval (0.80, 0.94) obtained in fig. 6 for the fourth scene; the noise estimation value of image d is 0.2125, located in the interval (0.20, 0.31) corresponding to the first scene; and the noise estimation values of images e and f are 0.3961 and 0.4136, both located in the interval (0.32, 0.49) corresponding to the second scene. Therefore, the scene category of images a and b is the third scene, that of images e and f is the second scene, that of image d is the first scene, and that of image c is the fourth scene.
In addition, as shown in fig. 7, in images a and b the light source is behind the live broadcast main body, so they belong to a background lighting scene; image c shows a scene with no light-source illumination and belongs to a non-illumination scene; image d shows a scene under uniform illumination and belongs to a diffuse reflection lighting scene; and images e and f show scenes with the light source in front of the live broadcast main body, belonging to foreground lighting scenes.
Optionally, when the scene categories are the four scenes above, the bilateral filter is used to denoise the image corresponding to the scene category to which the input image belongs. Specifically, as the description of the bilateral filter shows, the denoising can be tuned through the value of $\sigma_r$ in the bilateral filter. Accordingly, when the identified scene category is the fourth scene, whose images carry the most noise, $\sigma_r$ can be adjusted within the interval (0.3, 0.6) to denoise the image corresponding to the fourth scene category to which the input image belongs, finally obtaining the denoised input image for the fourth scene; when the identified scene category is the third scene, whose images carry heavy noise, $\sigma_r$ can be adjusted within the interval (0.2, 0.4) to denoise the image corresponding to the third scene category, finally obtaining the denoised input image for the third scene; when the identified scene category is the second scene, whose images carry little noise, $\sigma_r$ can be adjusted within the interval (0, 0.12) to denoise the image corresponding to the second scene category, finally obtaining the denoised input image for the second scene; and when the identified scene category is the first scene, whose images carry the least noise, $\sigma_r$ can be adjusted within the interval (0.05, 0.2) to denoise the image corresponding to the first scene category, finally obtaining the denoised input image for the first scene. It should be noted that the above numerical ranges of the color-space standard deviation are only examples, and other values may be set for specific implementations.
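A sketch of this scene-adaptive denoising with OpenCV's bilateral filter follows. The $\sigma_r$ intervals per scene are the ones given above; choosing the interval midpoint, the fixed spatial parameters, and treating frames as float32 images in [0, 1] (so that sigmaColor is on the same scale as the intervals) are assumptions of this sketch.

```python
import cv2
import numpy as np

SIGMA_R_INTERVALS = {
    "first_scene":  (0.05, 0.2),
    "second_scene": (0.0, 0.12),
    "third_scene":  (0.2, 0.4),
    "fourth_scene": (0.3, 0.6),
}

def denoise_frame(frame_bgr: np.ndarray, scene: str,
                  d: int = 9, sigma_d: float = 2.0) -> np.ndarray:
    lo, hi = SIGMA_R_INTERVALS[scene]
    sigma_r = (lo + hi) / 2.0                   # midpoint of the scene's interval
    img = frame_bgr.astype(np.float32) / 255.0  # normalize to [0, 1]
    out = cv2.bilateralFilter(img, d, sigmaColor=sigma_r, sigmaSpace=sigma_d)
    return (out * 255.0).clip(0, 255).astype(np.uint8)
```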
Referring to fig. 8, fig. 8 is a flowchart of another denoising method based on the identified target scene category disclosed in an embodiment of the present application; the method includes, but is not limited to, the following steps:
s801, the image processing equipment performs frame extraction processing on the video in the live scene to obtain an input image.
In this embodiment, the image processing device may extract the input images from video frames, including from a live video (e.g., a live video of singing), or the input images may be photos taken with a shooting device; the image processing device uses the extracted images or the taken photos as input images. Frame extraction from the video may take consecutive frames (e.g., 16 consecutive frames) or frames at intervals (e.g., every 8 frames), which is not limited in this application.
S802, the image processing equipment identifies the target scene category of the multi-frame input image by using the pre-trained neural network.
In the embodiment of the present application, the structure of the neural network may be composed of a plurality of convolutional layers (Conv), a plurality of Pooling layers (Pooling), and a plurality of fully connected layers (FC).
In an alternative embodiment, the scene category of the input image may be one of the following four scene categories: a first scene, a second scene, a third scene and a fourth scene, where the first scene may be a diffuse reflection lighting scene, the second scene a foreground lighting scene, the third scene a background lighting scene, and the fourth scene a non-illumination scene.
Optionally, the image processing device obtains images corresponding to the four scene categories by performing frame extraction on the videos of the four scene categories.
In an optional implementation manner, the image processing device may import the obtained input images of four scene categories into the neural network, and perform feature extraction on the input images by using the neural network to obtain a feature map; determining the scene category to which the input image belongs and the probability of the scene category to which the input image belongs according to the feature map; and determining the scene category of the input image from the four scene categories according to the scene category to which the feature map belongs and the probability of the scene category to which the feature map belongs.
Optionally, assuming that the scene category to which the input image belongs is identified from its feature map as the first scene category, and the probability of the first scene category meets a preset probability threshold, the scene category of the input image is determined to be the first scene category; then, in S803, the input image is correspondingly denoised according to the target noise estimation value interval corresponding to that scene category to obtain a denoised video. The preset probability threshold may be, but is not limited to, 0.88.
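A short sketch of this classification decision follows, reusing the classifier sketched earlier: the top scene class is accepted only if its softmax probability meets the 0.88 threshold mentioned above. The fallback behaviour when the threshold is not met (returning None here) is an assumption.

```python
import torch

SCENES = ["first_scene", "second_scene", "third_scene", "fourth_scene"]

def classify_scene(model, clip: torch.Tensor, threshold: float = 0.88):
    """clip: (3, frames, H, W); returns a scene name or None."""
    model.eval()
    with torch.no_grad():
        probs = torch.softmax(model(clip.unsqueeze(0)), dim=1)[0]
    prob, idx = probs.max(dim=0)
    return SCENES[idx.item()] if prob.item() >= threshold else None
```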
And S803, performing corresponding denoising processing on the input image according to the target noise estimation value interval corresponding to the scene type to obtain a denoised video.
In the embodiment of the present application, classified denoising means applying different noise-reduction processes to input images corresponding to different scene categories. The noise-reduction process may be based on various denoising methods, for example a bilateral-filter-based method or a Gaussian-filter-based method. However, Gaussian-filter-based denoising shifts the pixels of the denoised image, median-filter-based denoising handles Gaussian noise poorly, and neural-network-based denoising can introduce noise absent from the original image. The bilateral filter, by contrast, achieves smooth denoising while preserving image edges well, because its kernel is generated by two functions: a spatial-domain kernel (also known as the domain kernel, spatial coefficient or spatial domain) and a value-domain kernel (pixel range domain). The bilateral filter combines the spatial proximity and the pixel similarity of the image, considering spatial-domain information and gray-level similarity at the same time, so as to achieve edge-preserving denoising.
Optionally, when the neural network identifies that the scene to which the input image belongs is the first scene, namely a diffuse reflection lighting scene, the bilateral filter is used to denoise the image corresponding to the determined diffuse reflection lighting scene. Specifically, because the denoising effect of the bilateral filter depends on the color-space standard deviation $\sigma_r$, the image corresponding to the diffuse reflection lighting scene to which the input image belongs can be denoised by adjusting the value of $\sigma_r$ in the bilateral filter, yielding the denoised input image shown in fig. 9; as can be seen from fig. 9, the denoised image b is smoother than image a before denoising, especially around the right side of the nose.
When the neural network identifies that the scene to which the input image belongs is the second scene, namely a foreground lighting scene, the image corresponding to the foreground lighting scene is denoised by adjusting the value of $\sigma_r$, yielding the denoised input image shown in fig. 10; as can be seen from fig. 10, the denoised image b is smoother than image a before denoising, especially around the right side of the nose.
When the neural network identifies that the scene to which the input image belongs is the third scene, namely a background lighting scene, the image corresponding to the background lighting scene is denoised by adjusting the value of $\sigma_r$, yielding the denoised input image shown in fig. 11; as can be seen from fig. 11, the denoised image b is smoother than image a before denoising, especially around the right side of the nose.
When the neural network identifies that the scene to which the input image belongs is the fourth scene, namely a non-illumination scene, the image corresponding to the non-illumination scene is denoised by adjusting the value of $\sigma_r$, yielding the denoised input image shown in fig. 12; as can be seen from fig. 12, the denoised image b is smoother than image a before denoising, especially around the right side of the nose.
By denoising the images corresponding to the identified scene categories to different degrees in this way, the problem of a poor denoising effect caused by applying a uniform denoising strength to different live broadcast scenes can be avoided.
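As a minimal sketch of this scene-dependent adjustment, the snippet below maps each of the four scene categories to a σ_r value and applies OpenCV's bilateral filter; the scene labels, the midpoint rule for picking σ_r inside the interval, and the rescaling of the normalized value to the 0-255 intensity scale are illustrative assumptions, not values prescribed by the embodiment.

```python
import cv2

# Assumed labels for the four scene categories, paired with the sigma_r
# value intervals given later in this description.
SIGMA_R_INTERVALS = {
    "diffuse_reflection_illumination": (0.05, 0.2),  # first scene
    "foreground_illumination":         (0.0, 0.12),  # second scene
    "background_illumination":         (0.2, 0.4),   # third scene
    "no_illumination":                 (0.3, 0.6),   # fourth scene
}

def denoise_for_scene(frame_bgr, scene_category):
    lo, hi = SIGMA_R_INTERVALS[scene_category]
    sigma_r = (lo + hi) / 2.0  # one simple choice inside the interval
    # cv2.bilateralFilter takes sigmaColor on the 0-255 intensity scale,
    # so the normalized sigma_r is rescaled here (an assumption).
    return cv2.bilateralFilter(frame_bgr, d=9,
                               sigmaColor=sigma_r * 255.0,
                               sigmaSpace=9.0)
```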
It should be noted that the specific embodiments shown in steps S801 to S803 above may be implemented alone or in combination with any one or more of the other specific embodiments, and the present application is not limited in this respect. Likewise, the specific implementation of the image denoising method described in fig. 2 and that of the other image denoising method described in fig. 8 may be implemented separately or in combination, and the present application is not limited in this respect.
Based on the above method embodiments, an embodiment of the present application further provides an image processing apparatus. Fig. 13 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application. The image processing apparatus 1000 shown in fig. 13 comprises the following units:
the identification unit 1301 is used for identifying the target scene category of a plurality of frames of input images by using a pre-trained neural network, wherein the plurality of frames of input images are obtained by frame extraction of videos in a live broadcast scene; the neural network is obtained by training with a training sample set, wherein the training sample set comprises at least one group of training samples, and each group of training samples comprises labeling information of scene categories;
a processing unit 1302, configured to obtain a target noise estimation value interval corresponding to the target scene category;
and the denoising unit 1303 is configured to perform denoising processing on the video in the live broadcast scene according to the target noise estimation value interval, so as to obtain a denoised video.
In an optional implementation manner, the identifying unit 1301, when recognizing the target scene category of the input image by using a neural network trained in advance, is specifically configured to: carrying out feature extraction on an input image by using a neural network trained in advance to obtain a feature map; determining the scene category to which the input image belongs and the probability of the scene category to which the input image belongs according to the feature map; and determining the target scene category of the input image according to the scene category to which the feature map belongs and the probability of the scene category to which the feature map belongs.
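For illustration only, a Python sketch of this identification flow, assuming a PyTorch classifier whose head maps the extracted feature map to one logit per scene category; the model architecture, the function name, and the aggregation of per-frame probabilities by their mean are assumptions of this sketch.

```python
import torch
import torch.nn.functional as F

def identify_target_scene(model, frames):
    """frames: float tensor (N, 3, H, W) of sampled input frames.

    The model is assumed to perform feature extraction internally and
    return per-class logits; softmax gives the probability of each scene
    category, and the most probable category over all frames is returned.
    """
    model.eval()
    with torch.no_grad():
        logits = model(frames)            # (N, num_scene_categories)
        probs = F.softmax(logits, dim=1)  # per-frame category probabilities
        return int(probs.mean(dim=0).argmax())
```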
In an optional implementation manner, when obtaining the target noise estimation value interval corresponding to the target scene category, the processing unit 1302 is specifically configured to: perform noise estimation on each frame of the input images by using a noise estimation method to obtain a target noise estimation value of each frame; and obtain the target noise estimation value interval corresponding to the target scene category according to the target noise estimation value of each frame of the input images.
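The embodiment does not fix a particular noise estimation method; as one commonly used possibility, the sketch below applies Immerkær's fast noise variance estimator to each frame, after which the target interval can be taken, for example, as the minimum and maximum of the per-frame estimates.

```python
import numpy as np
from scipy.signal import convolve2d

def estimate_noise_sigma(gray):
    """Immerkaer's fast estimate of the noise standard deviation for one
    grayscale frame (a stand-in for the unspecified estimation method)."""
    h, w = gray.shape
    laplacian = np.array([[ 1, -2,  1],
                          [-2,  4, -2],
                          [ 1, -2,  1]], dtype=np.float64)
    residual = convolve2d(gray.astype(np.float64), laplacian, mode="valid")
    return (np.sqrt(np.pi / 2.0) / (6.0 * (h - 2) * (w - 2))) * np.abs(residual).sum()

def noise_interval(frames):
    """Target noise estimation value interval from per-frame estimates."""
    sigmas = [estimate_noise_sigma(f) for f in frames]
    return min(sigmas), max(sigmas)
```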
In an optional implementation manner, when obtaining the target noise estimation value interval corresponding to the target scene category, the processing unit 1302 is specifically configured to: determine, based on a preset correspondence between scene categories and noise estimation value intervals, the noise estimation value interval corresponding to the target scene category as the target noise estimation value interval.
In an optional implementation manner, when performing denoising processing on the video in the live broadcast scene according to the target noise estimation value interval to obtain a denoised video, the denoising unit 1303 is specifically configured to: set the standard deviation of the color space in the spatial proximity function of the bilateral filter based on the target noise estimation value interval; and denoise the video in the live broadcast scene by using the configured bilateral filter to obtain the denoised video.
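A sketch of this denoising unit over a whole live video, assuming OpenCV for decoding and encoding; the choice of the interval midpoint for σ_r, the fixed filter diameter, and the mp4v codec are assumptions of this illustration.

```python
import cv2

def denoise_live_video(path_in, path_out, noise_interval,
                       d=9, sigma_space=9.0):
    lo, hi = noise_interval
    sigma_r = (lo + hi) / 2.0  # set sigma from the target interval
    cap = cv2.VideoCapture(path_in)
    fps = cap.get(cv2.CAP_PROP_FPS)
    size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
            int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    out = cv2.VideoWriter(path_out, cv2.VideoWriter_fourcc(*"mp4v"),
                          fps, size)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Apply the configured bilateral filter to every decoded frame.
        out.write(cv2.bilateralFilter(frame, d, sigma_r * 255.0, sigma_space))
    cap.release()
    out.release()
```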
In an optional implementation manner, the identified scene category is one of a first scene, a second scene, a third scene, and a fourth scene, where the noise estimation value in the target noise estimation value interval corresponding to the fourth scene is greater than the noise estimation value in the target noise estimation value interval corresponding to the third scene; the noise estimation value in the target noise estimation value interval corresponding to the third scene is greater than the noise estimation value in the target noise estimation value interval corresponding to the second scene; and the noise estimation value in the target noise estimation value interval corresponding to the second scene is greater than the noise estimation value in the target noise estimation value interval corresponding to the first scene.
In an optional implementation manner, when the identifying unit 1301 determines that the target scene category of the input image is the first scene, the value interval (A, B) of the standard deviation is the interval (0.05, 0.2); when the target scene category is the second scene, the value interval (A, B) of the standard deviation is the interval (0, 0.12); when the target scene category is the third scene, the value interval (A, B) of the standard deviation is the interval (0.2, 0.4); and when the target scene category is the fourth scene, the value interval (A, B) of the standard deviation is the interval (0.3, 0.6).
According to an embodiment of the present application, the steps involved in the image denoising method shown in fig. 2 may be performed by the units in the image processing apparatus shown in fig. 13. For example, step S201 in the image denoising method shown in fig. 2 may be performed by the identification unit 1301 in the image processing apparatus shown in fig. 13, step S202 may be performed by the processing unit 1302, and step S203 may be performed by the denoising unit 1303.
According to an embodiment of the present application, the units in the image processing apparatus shown in fig. 13 may be combined, individually or entirely, into one or several other units, or one (or more) of the units may be further split into multiple functionally smaller units; either way, the same operations can be achieved without affecting the technical effects of the embodiments of the present application. The above units are divided based on logical functions; in practical applications, the function of one unit may be realized by multiple units, or the functions of multiple units may be realized by one unit. In other embodiments of the present application, the image denoising apparatus may likewise include other units, and in practical applications these functions may be realized with the assistance of other units or through the cooperation of multiple units.
According to an embodiment of the present application, the image denoising apparatus shown in fig. 13 may be constructed, and the image denoising method according to the embodiment of the present application implemented, by running a computer program (including program code) capable of executing the steps involved in the corresponding method shown in fig. 2 on a general-purpose computing device, such as a computer, comprising processing and storage elements such as a central processing unit (CPU), a random access memory (RAM), and a read-only memory (ROM). The computer program may be embodied on, for example, a computer-readable storage medium, and loaded into and executed by the above computing device via the computer-readable storage medium.
In the embodiment of the present application, the identification unit 1301 identifies the scene category of the input image, the processing unit 1302 obtains the target noise estimation value interval corresponding to the identified scene category, and the denoising unit 1303 performs the corresponding denoising processing to obtain a denoised video. This denoising approach can apply different degrees of denoising according to different live broadcast scene categories, thereby avoiding the poor denoising effect caused by applying a uniform denoising strength to different scenes.
Based on the above method and apparatus embodiments, an embodiment of the present application provides an image processing device. Referring to fig. 14, fig. 14 is a schematic structural diagram of an image processing device according to an embodiment of the present application. The image denoising apparatus 1400 shown in fig. 14 includes at least a processor 1401, an input interface 1402, an output interface 1403, a computer storage medium 1404, and a memory 1405. The processor 1401, the input interface 1402, the output interface 1403, the computer storage medium 1404, and the memory 1405 may be connected by a bus or in other ways.
The computer storage medium 1404 may be stored in the memory 1405 of the image denoising apparatus 1400; the computer storage medium 1404 stores a computer program comprising program instructions, and the processor 1401 is configured to execute the program instructions stored by the computer storage medium 1404. The processor 1401 (or CPU) is the computing core and control core of the image denoising apparatus 1400, and is adapted to implement one or more instructions, in particular to load and execute one or more computer instructions so as to realize the corresponding method flows or corresponding functions.
An embodiment of the present application also provides a computer storage medium (memory), which is a memory device in the image denoising apparatus 1400 and is used to store programs and data. It can be understood that the computer storage medium here may include a built-in storage medium of the image denoising apparatus 1400, and may also include an extended storage medium supported by the image denoising apparatus 1400. The computer storage medium provides a storage space that stores the operating system of the image denoising apparatus 1400. Also stored in this storage space are one or more instructions, which may be one or more computer programs (including program code), suitable for loading and execution by the processor 1401. The computer storage medium may be a high-speed RAM memory, or a non-volatile memory, such as at least one disk memory; optionally, it may also be at least one computer storage medium located remotely from the aforementioned processor.
In one embodiment, one or more instructions in the computer storage medium may be loaded and executed by the processor 1401 to implement the corresponding steps of the image denoising method shown in fig. 2. In a specific implementation, the one or more instructions in the computer storage medium are loaded by the processor 1401 to perform the following steps:
identifying the target scene category of a plurality of frames of input images by using a pre-trained neural network, wherein the plurality of frames of input images are obtained by extracting frames of a video in a live broadcast scene; the neural network is obtained by training with a training sample set, wherein the training sample set comprises at least one group of training samples, and each group of training samples comprises labeling information of scene categories;
obtaining a target noise estimation value interval corresponding to the target scene type;
and denoising the video in the live broadcast scene according to the target noise estimation value interval to obtain a denoised video.
In one possible implementation, the identifying, by the processor 1401, of the target scene category of the input image using the pre-trained neural network includes: performing feature extraction on the input image by using the pre-trained neural network to obtain a feature map; determining, according to the feature map, the scene category to which the input image belongs and the probability of that scene category; and determining the target scene category of the input image according to the scene category to which the feature map belongs and the probability of that scene category.
In one possible implementation, the obtaining, by the processor 1401, of the target noise estimation value interval corresponding to the target scene category includes: performing noise estimation on each frame of the input images by using a noise estimation method to obtain a target noise estimation value of each frame; and obtaining the target noise estimation value interval corresponding to the target scene category according to the target noise estimation value of each frame of the input images.
In one possible implementation, the obtaining, by the processor 1401, of the target noise estimation value interval corresponding to the target scene category includes: determining, based on a preset correspondence between scene categories and noise estimation value intervals, the noise estimation value interval corresponding to the target scene category as the target noise estimation value interval.
In a possible implementation manner, the processor 1401 performing denoising processing on the video in the live broadcast scene according to the target noise estimation value interval to obtain a denoised video includes: setting the standard deviation of the color space in the spatial proximity function of the bilateral filter based on the target noise estimation value interval; and denoising the video in the live broadcast scene by using the configured bilateral filter to obtain a denoised video.
In one possible implementation, the target scene category of the input image determined by the processor 1401 is one of a first scene, a second scene, a third scene, and a fourth scene, wherein the noise estimation value in the target noise estimation value interval corresponding to the fourth scene is greater than the noise estimation value in the target noise estimation value interval corresponding to the third scene; the noise estimation value in the target noise estimation value interval corresponding to the third scene is greater than the noise estimation value in the target noise estimation value interval corresponding to the second scene; and the noise estimation value in the target noise estimation value interval corresponding to the second scene is greater than the noise estimation value in the target noise estimation value interval corresponding to the first scene.
In a possible implementation manner, when the target scene category determined by the processor 1401 is the first scene, the value interval (A, B) of the standard deviation is the interval (0.05, 0.2); when the target scene category is the second scene, the value interval (A, B) of the standard deviation is the interval (0, 0.12); when the target scene category is the third scene, the value interval (A, B) of the standard deviation is the interval (0.2, 0.4); and when the target scene category is the fourth scene, the value interval (A, B) of the standard deviation is the interval (0.3, 0.6).
In the embodiments of the present application, the processor 1401 obtains the scene category to which the input image belongs and performs denoising processing on the image corresponding to that scene category. This denoising approach can apply different degrees of denoising according to different scene categories, thereby avoiding the poor denoising effect caused by applying a uniform denoising strength to different scenes.
According to an aspect of the present application, an embodiment of the present application also provides a computer program product or a computer program, which includes computer instructions stored in a computer-readable storage medium. The processor 1401 reads the computer instructions from the computer-readable storage medium and executes them, so that the image denoising apparatus 1400 performs the image denoising method shown in fig. 2.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the above-described modules is merely a logical division, and other divisions may be realized in practice, for example, a plurality of modules or components may be combined or integrated into another system, or some features may be omitted, or not executed.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. An image denoising method, comprising:
identifying the target scene category of a plurality of frames of input images by using a pre-trained neural network, wherein the plurality of frames of input images are obtained by extracting frames of a video in a live broadcast scene; the neural network is obtained by training with a training sample set, wherein the training sample set comprises at least one group of training samples, and each group of training samples comprises labeling information of scene categories;
obtaining a target noise estimation value interval corresponding to the target scene category;
and denoising the video in the live broadcast scene according to the target noise estimation value interval to obtain a denoised video.
2. The method of claim 1, wherein the identifying the target scene category of the input image using the pre-trained neural network comprises:
carrying out feature extraction on an input image by using a neural network trained in advance to obtain a feature map;
determining the scene category to which the input image belongs and the probability of the scene category to which the input image belongs according to the feature map;
and determining the target scene category of the input image according to the scene category to which the feature map belongs and the probability of the scene category to which the feature map belongs.
3. The method according to claim 1, wherein the obtaining a target noise estimation value interval corresponding to the target scene category comprises:
carrying out noise estimation on the input image of each frame by using a noise estimation method to obtain a target noise estimation value of the input image of each frame;
and obtaining the target noise estimation value interval corresponding to the target scene category according to the target noise estimation value of each frame of the input image.
4. The method according to claim 1, wherein the obtaining a target noise estimation value interval corresponding to the target scene category comprises:
and determining, based on a preset correspondence between scene categories and noise estimation value intervals, the noise estimation value interval corresponding to the target scene category as the target noise estimation value interval.
5. The method according to claim 1, wherein the denoising the video in the live broadcast scene according to the target noise estimation value interval to obtain a denoised video comprises:
setting the standard deviation of a color space in a spatial proximity function of a bilateral filter based on the target noise estimation value interval;
and denoising the video in the live broadcast scene by using the set bilateral filter to obtain a denoised video.
6. The method of claim 5,
the target scene category of the input image is one of a first scene, a second scene, a third scene, and a fourth scene, wherein,
the noise estimation value in the target noise estimation value interval corresponding to the fourth scene is larger than the noise estimation value in the target noise estimation value interval corresponding to the third scene; the noise estimation value in the target noise estimation value interval corresponding to the third scene is larger than the noise estimation value in the target noise estimation value interval corresponding to the second scene; and the noise estimation value in the target noise estimation value interval corresponding to the second scene is larger than the noise estimation value in the target noise estimation value interval corresponding to the first scene.
7. The method of claim 6,
when the target scene category is the first scene, the value interval (A, B) of the standard deviation is the interval (0.05, 0.2);
when the target scene category is the second scene, the value interval (A, B) of the standard deviation is the interval (0, 0.12);
when the target scene category is the third scene, the value interval (A, B) of the standard deviation is the interval (0.2, 0.4);
when the target scene category is the fourth scene, the value interval (A, B) of the standard deviation is the interval (0.3, 0.6).
8. An image processing apparatus characterized by comprising:
the identification unit is used for identifying the target scene category of a plurality of frames of input images by utilizing a pre-trained neural network, wherein the plurality of frames of input images are obtained by extracting frames from a video in a live broadcast scene; the neural network is obtained by training with a training sample set, wherein the training sample set comprises at least one group of training samples, and each group of training samples comprises labeling information of scene categories;
the processing unit is used for obtaining a target noise estimation value interval corresponding to the target scene category;
and the denoising unit is used for denoising the video in the live broadcast scene according to the target noise estimation value interval to obtain a denoised video.
9. An image processing apparatus characterized by comprising:
memory, a processor, wherein the memory has stored thereon an image processing program which, when executed by the processor, implements the steps of the image denoising method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that a computer program is stored thereon, which, when being executed by a processor, carries out the steps of the image denoising method according to any one of claims 1 to 7.
CN202111591688.1A 2021-12-23 2021-12-23 Image denoising method and related equipment Pending CN114283087A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111591688.1A CN114283087A (en) 2021-12-23 2021-12-23 Image denoising method and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111591688.1A CN114283087A (en) 2021-12-23 2021-12-23 Image denoising method and related equipment

Publications (1)

Publication Number Publication Date
CN114283087A true CN114283087A (en) 2022-04-05

Family

ID=80874714

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111591688.1A Pending CN114283087A (en) 2021-12-23 2021-12-23 Image denoising method and related equipment

Country Status (1)

Country Link
CN (1) CN114283087A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117173059A (en) * 2023-11-03 2023-12-05 奥谱天成(厦门)光电有限公司 Abnormal point and noise removing method and device for near infrared moisture meter
CN117173059B (en) * 2023-11-03 2024-01-19 奥谱天成(厦门)光电有限公司 Abnormal point and noise removing method and device for near infrared moisture meter


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination