CN114926348A - Device and method for removing low-illumination video noise - Google Patents

Device and method for removing low-illumination video noise

Info

Publication number
CN114926348A
CN114926348A (application number CN202111583678.3A; granted as CN114926348B)
Authority
CN
China
Prior art keywords
illumination
network
picture
low
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111583678.3A
Other languages
Chinese (zh)
Other versions
CN114926348B (en)
Inventor
史国杰
曹靖城
吕超
吴宇松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianyi Digital Life Technology Co Ltd
Original Assignee
Tianyi Digital Life Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianyi Digital Life Technology Co Ltd filed Critical Tianyi Digital Life Technology Co Ltd
Priority to CN202111583678.3A priority Critical patent/CN114926348B/en
Publication of CN114926348A publication Critical patent/CN114926348A/en
Application granted granted Critical
Publication of CN114926348B publication Critical patent/CN114926348B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00: Image enhancement or restoration
    • G06T5/70: Denoising; Smoothing
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20081: Training; Learning
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a device and a method for removing low-illumination video noise based on width learning (also called broad learning) and generative adversarial network technology. The method comprises: acquiring a low-illumination picture to be denoised; inputting the acquired low-illumination picture as input data into a trained width learning network for spatial feature extraction to obtain a corresponding feature vector, wherein the width learning network is trained by determining the pseudo-inverse that maps the feature nodes and enhancement nodes derived from the input data to a given target value; and inputting the obtained feature vector into a generator network of a trained generative adversarial network to generate a denoised picture, wherein the generative adversarial network is obtained by alternately training the generator network and a discriminator network.

Description

Device and method for removing low-illumination video noise
Technical Field
The invention relates to the field of computer vision, and in particular to a device and a method for removing low-illumination video noise based on width learning (broad learning) and generative adversarial network (GAN) technology.
Background
With the development of 5G and video technology, camera-based household products have grown rapidly, and the storage footprint of video files keeps increasing. Taking China Telecom's Tianyi home-camera service as an example, newly added video files require roughly 25 PB of storage every day. Such mass storage most directly drives up hardware procurement costs, and it also raises the management costs of capacity expansion, operation and maintenance, disaster recovery, and the like. Compressing video files as much as possible is therefore a technical challenge.
The problem is worst in low-light environments: the exposure time of the photosensitive element increases, its temperature rises, and prolonged exposure produces large amounts of white noise and dark current, which in turn put large amounts of random noise into the output. This noise causes three problems for production and operations:
1. Imaging quality is low and user experience is poor; the video may be covered by one or more of Gaussian noise, salt-and-pepper noise, Rayleigh noise, exponential noise, and the like;
2. Noise densifies the high-frequency signal in the video; if that signal is fed to the encoder without any processing, storage consumption grows. For example, under the H.265 standard the stream of a single camera channel compresses to about 700K at normal brightness, but storage grows by 30%-50% in low-illumination environments such as rainy nights and overcast days; and
3. When such pictures are used for AI recognition (e.g., face recognition or license-plate recognition), the success rate is low, which poses serious risks to intelligent security.
Traditional coding and decoding technology handles noise with low-pass filtering and median filtering. Low-pass filtering removes noise but blurs the picture. Median filtering is nonlinear: it protects sharp image edges by replacing polluted points with suitably chosen neighboring values, so its output is not blurred the way low-pass filtering is; by construction, however, the median filter performs well on salt-and-pepper noise and poorly on other noise such as Gaussian noise. In recent years, with the development of artificial intelligence, machine learning and deep learning algorithms have been applied to image denoising with notable results, but training such models depends on labeling and learning over large sample sets, and the inference stage depends on hardware compute, which raises hardware costs when ISP processing and AI processing are performed on the device side.
At present, one approach suppresses image noise by combining median filtering with nonlinear mapping, maintaining the visibility of the original image, suppressing low-illumination noise, and improving the quality of night-time surveillance pictures. Another existing approach enhances an image using a high-quality similar image of a similar scene, but it matches directly by histogram matching, so its accuracy is low and the histogram is strongly affected by illumination. Deep-learning-based methods perform better than these, but deep learning requires training more than a million parameters (weights, biases, and so on), model training is slow and time-consuming, and the computational cost is too high. Therefore, to improve denoising performance while reducing dependence on compute, an improved method for removing low-illumination picture noise is desirable.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
To address these deficiencies of the prior art, and in line with the practical conditions of camera services, the invention departs from the blind-denoising schemes prevailing in the industry: it introduces well-lit daytime video frames as reference frames and applies targeted, quantifiable denoising to low-illumination pictures using width learning and generative adversarial network technology. For example, the method can serve as ISP processing before H.264/H.265 encoding on a terminal, denoising I-frames and P-frames to save storage costs and improve user experience; it can also serve as a preprocessing module for deep-learning image recognition, improving recognition accuracy.
According to one aspect of the invention, a method for removing low-illumination picture noise is provided, the method comprising: acquiring a low-illumination picture to be denoised; inputting the acquired low-illumination picture as input data into a trained width learning network for spatial feature extraction to obtain a corresponding feature vector, wherein the width learning network is trained by determining the pseudo-inverse that maps the feature nodes and enhancement nodes derived from the input data to a given target value; and inputting the obtained feature vector into a generator network of a trained generative adversarial network to generate a denoised picture, wherein the generative adversarial network is obtained by alternately training the generator network and a discriminator network.
According to an embodiment of the invention, the method further comprises: performing feature extraction on the acquired low-illumination picture to obtain a corresponding low-illumination feature vector; and, when the cosine similarity between the low-illumination feature vector and one or more high-illumination reference feature vectors in a high-illumination reference feature library is greater than a predetermined threshold, returning the high-illumination reference picture corresponding to the reference feature vector, among the one or more, with the highest cosine similarity to the low-illumination feature vector, as the input data to be input into the width learning network.
According to a further embodiment of the invention, the high-illumination reference feature library is constructed by: periodically acquiring a high-illumination reference picture whose brightness is above a threshold; performing feature extraction on the high-illumination reference picture to obtain a corresponding high-illumination reference feature vector; comparing the high-illumination reference feature vector with the feature vectors already in the library; when the cosine similarity between the high-illumination reference feature vector and each feature vector already in the library is less than a predetermined threshold, adding the high-illumination reference feature vector to the library; performing grayscale processing on the corresponding high-illumination reference picture; and applying the Laplacian transform to the grayscaled picture and storing the result in serialized form.
According to a further embodiment of the invention, training of the generative adversarial network is achieved by repeating the following steps, with noise-free clear pictures and the corresponding low-illumination pictures as the training data set: feeding the feature vectors that the width learning network extracts from the low-illumination pictures in the training data set into the generator network of the generative adversarial network to generate denoised pictures; inputting the generated denoised pictures and the corresponding noise-free clear pictures into the discriminator network of the generative adversarial network, for the discriminator network to identify the real pictures; and optimizing the generator network and the discriminator network based on a loss calculation.
According to another aspect of the invention, a system for removing low-illumination picture noise is provided, the system comprising: a feature extraction module configured to acquire a low-illumination picture to be denoised and perform feature extraction on it to obtain a corresponding low-illumination feature vector; a similar-picture comparison module configured to compare the low-illumination feature vector one by one with one or more high-illumination reference feature vectors in a high-illumination reference feature library, and, when the cosine similarity between the low-illumination feature vector and the one or more high-illumination reference feature vectors is greater than a predetermined threshold, determine that a similar picture exists and return, as input data, the high-illumination reference picture corresponding to the reference feature vector with the highest cosine similarity to the low-illumination feature vector, and otherwise determine that no similar picture exists and use the acquired low-illumination picture directly as the input data; a width learning feature extraction module configured to input the input data into a trained width learning network for spatial feature extraction to obtain a corresponding feature vector, wherein the width learning network is trained by determining the pseudo-inverse that maps the feature nodes and enhancement nodes derived from the input data to a given target value; and a generative-adversarial-network denoising module configured to input the obtained feature vector into a generator network of a trained generative adversarial network to generate a denoised picture, wherein the generative adversarial network is obtained by alternately training the generator network and a discriminator network.
According to an embodiment of the invention, the high-illumination reference feature library is constructed by: periodically acquiring a high-illumination reference picture whose brightness is above a threshold; performing feature extraction on the high-illumination reference picture to obtain a corresponding high-illumination reference feature vector; comparing the high-illumination reference feature vector with the feature vectors already in the library; when the cosine similarity between the high-illumination reference feature vector and each feature vector already in the library is less than a predetermined threshold, adding the high-illumination reference feature vector to the library; performing grayscale processing on the corresponding high-illumination reference picture; and applying the Laplacian transform to the grayscaled picture and storing the result in serialized form.
According to a further embodiment of the invention, feature extraction of the low-illumination picture and the high-illumination reference picture is performed using a Scale-Invariant Feature Transform (SIFT) operator.
According to a further embodiment of the invention, training of the generative adversarial network is achieved by repeating the following steps, with noise-free clear pictures and the corresponding low-illumination pictures as the training data set: feeding the feature vectors that the width learning network extracts from the low-illumination pictures in the training data set into the generator network of the generative adversarial network to generate denoised pictures; inputting the generated denoised pictures and the corresponding noise-free clear pictures into the discriminator network of the generative adversarial network, for the discriminator network to identify the real pictures; and optimizing the generator network and the discriminator network based on a loss calculation.
According to still another aspect of the invention, a system for removing low-illumination picture noise is provided, the system comprising: a memory storing a trained width learning network, a generative adversarial network, and computer-executable instructions; and at least one processor, wherein the computer-executable instructions, when executed, cause the at least one processor to: acquire a low-illumination picture to be denoised; input the acquired low-illumination picture as input data into the width learning network for spatial feature extraction to obtain a corresponding feature vector, wherein the width learning network is trained by determining the pseudo-inverse that maps the feature nodes and enhancement nodes derived from the input data to a given target value; and input the obtained feature vector into a generator network of the generative adversarial network to generate a denoised picture, wherein the generative adversarial network is obtained by alternately training the generator network and a discriminator network.
According to one embodiment of the invention, the memory further stores a high-illumination reference feature library, and the computer-executable instructions, when executed, cause the at least one processor to further perform the following: performing feature extraction on the acquired low-illumination picture to obtain a corresponding low-illumination feature vector; and, when the cosine similarity between the low-illumination feature vector and one or more high-illumination reference feature vectors in the high-illumination reference feature library is greater than a predetermined threshold, returning the high-illumination reference picture corresponding to the reference feature vector with the highest cosine similarity to the low-illumination feature vector as the input data to be input to the width learning network.
Compared with prior-art schemes, the method and system for removing low-illumination picture noise provided by the invention have the following advantages:
(1) Good denoising: introducing width learning and a generative adversarial network yields a cleaner picture and reduces the blurring caused by low-pass-filter denoising;
(2) Low compute requirements: in contrast to the heavy nonlinear computation required by deep learning, the width-learning-based method is mostly linear computation, demands little compute, and can run on low-compute devices such as cameras, video doorbells, and mobile phones; and
(3) Flexible use: the denoising method can be implemented both as a preprocessing stage for a neural network and as an ISP processing stage for an H.264/H.265 encoder.
These and other features and advantages will become apparent from a reading of the following detailed description and a review of the associated drawings. It is to be understood that both the foregoing general description and the following detailed description are explanatory only and are not restrictive of aspects as claimed.
Drawings
So that the manner in which the above recited features of the present invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to embodiments, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only some typical aspects of this invention and are therefore not to be considered limiting of its scope, for the description may admit to other equally effective aspects.
FIG. 1 illustrates an example architecture diagram of a system for removing low-illumination picture noise according to one embodiment of the present invention;
FIG. 2 illustrates an example flow diagram of a method for building a high-illumination reference feature library according to one embodiment of this disclosure;
FIG. 3 illustrates an example flow diagram of a method for removing low-illumination picture noise according to an embodiment of the invention;
FIG. 4 illustrates an exemplary block diagram of a width learning network according to one embodiment of the invention;
FIG. 5 illustrates an exemplary block diagram of a generative adversarial network according to one embodiment of the invention;
FIG. 6 illustrates an exemplary block diagram of the discriminator network in a generative adversarial network according to one embodiment of the present invention; and
FIG. 7 illustrates an example architecture diagram of a system for removing low-illumination picture noise according to one embodiment of this disclosure.
Detailed Description
The present invention will now be described in detail with reference to the accompanying drawings, from which its features will become further apparent.
Fig. 1 illustrates an example architecture diagram of a system 100 for removing low-illumination picture noise according to one embodiment of this disclosure. As shown in fig. 1, the system 100 comprises at least: a feature extraction module 101, a similar-picture comparison module 102, a width learning feature extraction module 103, and a generative-adversarial-network (GAN) denoising module 104.
The feature extraction module 101 may be configured to acquire a low-illumination picture to be denoised and perform feature extraction on it to obtain a corresponding low-illumination feature vector. In some cases, a SIFT operator may be used to extract a corresponding 128-dimensional feature vector from the acquired low-illumination picture; SIFT is a very stable local feature extraction operator that remains invariant under rotation, scale change, brightness change, and the like.
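As an illustration, the extraction step might look like the following Python sketch. OpenCV is an assumed implementation choice (the patent names no library), and pooling the per-keypoint descriptors into a single 128-dimensional picture-level vector by averaging is a hypothetical detail, since the text only states that a 128-dimensional vector is obtained.

```python
import cv2
import numpy as np

def extract_sift_vector(image_path: str) -> np.ndarray:
    # Read the picture in grayscale and compute SIFT keypoint descriptors.
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    _, descriptors = sift.detectAndCompute(img, None)  # descriptors: N x 128
    if descriptors is None:  # no keypoints found (e.g., a nearly uniform frame)
        return np.zeros(128, dtype=np.float32)
    # Hypothetical pooling step: average per-keypoint descriptors into one vector.
    return descriptors.mean(axis=0)
```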
The similar-picture comparison module 102 may be configured to compare the low-illumination feature vector obtained via the feature extraction module 101 one by one with each high-illumination reference feature vector in the high-illumination reference feature library. When the cosine similarity between the low-illumination feature vector and one or more high-illumination reference feature vectors exceeds a predetermined threshold (e.g., 0.85), the module determines that a similar picture exists and directly returns, for further image processing, the high-illumination reference picture corresponding to the reference feature vector with the largest cosine similarity to the low-illumination feature vector; otherwise it determines that no similar picture exists. The construction of the high-illumination reference feature library is described in further detail below.
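A minimal sketch of this lookup, assuming the library is held in memory as a mapping from picture id to reference vector (the storage layout is not specified in the patent):

```python
import numpy as np

def find_similar_reference(query: np.ndarray, library: dict,
                           threshold: float = 0.85):
    """Return the id of the best-matching reference picture, or None."""
    best_id, best_sim = None, threshold
    for pic_id, ref in library.items():
        # Cosine similarity between the low-illumination vector and a reference vector.
        sim = float(np.dot(query, ref) /
                    (np.linalg.norm(query) * np.linalg.norm(ref) + 1e-12))
        if sim > best_sim:
            best_id, best_sim = pic_id, sim
    return best_id  # None means no similar picture; fall back to the low-light picture
```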
The width learning feature extraction module 103 may be configured to extract spatial features from the input picture using a trained width learning network to obtain a corresponding feature vector, wherein the width learning network is trained by determining the pseudo-inverse that maps the feature nodes and enhancement nodes derived from the input data to a given target value. The training process of the width learning network is described in further detail below.
The GAN denoising module 104 may be configured to input the features extracted via the width learning network into the generator network of the trained generative adversarial network to generate a denoised picture, wherein the generative adversarial network, obtained by alternately training the generator network and a discriminator network, is a network with denoising capability. Further, the data denoised via the generator network, now close to the original clean picture, may be input to an H.264/H.265 encoder for video encoding.
Those skilled in the art will appreciate that the system of the present invention and its various modules may be implemented in hardware or software, and that the modules may be merged or combined in any suitable manner.
Fig. 2 illustrates an example flow diagram of a method 200 for building a high-illumination reference feature library according to one embodiment of this disclosure. The method 200 starts at step 201 by periodically capturing a high-illumination reference picture: a picture whose brightness is below a given threshold is discarded, while a picture whose brightness is above the threshold enters the next processing step. As an example, camera pictures may be captured every hour from 9:00 to 15:00, discarded if their brightness is below the threshold, and passed to the next step otherwise.
In step 202, feature extraction is performed on the acquired high-illumination reference picture to obtain a corresponding high-illumination reference feature vector. In a preferred embodiment, a SIFT operator is used to extract a 128-dimensional feature vector.
In step 203, the obtained high-illumination reference feature vector is compared with the feature vectors already in the high-illumination reference feature library. If its cosine similarity with any existing vector exceeds a predetermined threshold (e.g., 0.9), a similar reference already exists and the vector is discarded; otherwise it is stored in the library and the method proceeds to the next step.
In step 204, grayscale processing is applied to the high-illumination reference picture corresponding to the feature vector; this mainly eliminates the influence of color on picture edges.
In step 205, the Laplacian transform is applied to the grayscaled picture to extract its edge contours, and the result is stored in serialized form.
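Steps 204 and 205 can be sketched as follows; the use of OpenCV and pickle for serialization is an assumption, as the patent specifies neither:

```python
import cv2
import pickle

def store_reference(pic_id: str, bgr_image, out_path: str) -> None:
    # Step 204: grayscale conversion removes the influence of color on edges.
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    # Step 205: the Laplacian extracts the picture's edge contours.
    edges = cv2.Laplacian(gray, cv2.CV_64F)
    # Serialized storage of the processed reference picture.
    with open(out_path, "wb") as f:
        pickle.dump({"id": pic_id, "laplacian": edges}, f)
```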
Thus, through the method 200 shown in fig. 2, a high-illumination reference feature library can be constructed and continuously updated, enabling the reference-based image enhancement used in the present invention.
Fig. 3 illustrates an example flow diagram of a method 300 for removing low-illumination picture noise according to one embodiment of this disclosure. The method 300 starts at step 301, where the feature extraction module 101 acquires a low-illumination picture to be denoised and performs SIFT feature extraction on it to obtain a corresponding low-illumination feature vector.
In step 302, the similar-picture comparison module 102 compares the SIFT features extracted in step 301, one by one, with the SIFT features in a high-illumination reference feature library (e.g., one constructed by the method 200 described with reference to fig. 2).
In step 303, if the cosine similarity between the extracted SIFT feature and a SIFT feature in the high-illumination reference feature library exceeds a predetermined threshold (e.g., 0.85), then in step 304 it is determined that a similar picture exists, and the grayscaled, Laplacian-transformed high-illumination reference picture corresponding to that SIFT feature is returned as the input data. Preferably, if several library features exceed the threshold, the grayscaled, Laplacian-transformed reference picture corresponding to the feature with the largest cosine similarity among them is returned as the input data.
Otherwise, in step 305, it is determined that no similar picture exists, and the acquired low-illumination picture itself is used as the input data.
In step 306, the width learning feature extraction module inputs the input data into a width learning network for spatial feature extraction to obtain a corresponding feature vector. The core of width learning is computing the pseudo-inverse that maps the feature nodes and enhancement nodes to a target value: the width learning network first extracts picture features with feature nodes, then passes each group of feature nodes through an enhancement mapping function to form corresponding enhancement nodes; the feature nodes and enhancement nodes together form the input layer of the network, and the solved inverse matrix plays the role of the neural network's weights. An exemplary architecture of a width learning network 400 is shown in fig. 4. The method for training a width learning network comprises the following steps:
s1: and generating feature nodes and establishing the mapping from the input data to the feature nodes.
Let the training set be $H_1 \in \mathbb{R}^{s \times f}$, where $s$ is the number of samples and $f$ the number of features. First, z-score normalization is applied to the training set, scaling the input data into $(0, 1)$. Then, so that a bias term can later be added directly by matrix operations once the feature nodes are generated, the normalized data is augmented: a column is appended at the end of the training set, which becomes $H_1 \in \mathbb{R}^{s \times (f+1)}$. Feature nodes are then generated window by window according to the following steps:
1) generate a random weight matrix $w_e \in \mathbb{R}^{(f+1) \times N_1}$ whose entries obey a Gaussian distribution, where $N_1$ is the number of feature nodes per window;
2) store $w_e$ in $w_e\{i\}$, where $i$ denotes the window (iteration) index and the number of iterations is $N_2$;
3) let $A_1 = H_1 \times w_e$;
4) normalize $A_1$;
5) to effectively reduce the linear correlation of the newly generated feature nodes, perform a sparse representation of $A_1$: find a sparse matrix $W \in \mathbb{R}^{(f+1) \times N_1}$ such that $H_1 \times W = A_1$, where, e.g., the lasso method is used to solve the sparse-representation optimization problem

$$\hat{W} = \arg\min_{W} \; \lVert H_1 W - A_1 \rVert_2^2 + \lambda \lVert W \rVert_1 ;$$

6) generate the feature nodes of one window, $T_1 = \mathrm{normal}(H_1 \times W)$, where $\mathrm{normal}$ denotes normalization and the normalization method of each window's feature nodes is recorded as $p_s(i)$. For the $N_2$ feature windows, $N_1$ feature nodes are generated per window, each node being an $s$-dimensional feature vector. For the whole network, the feature node matrix $y$ is therefore an $s \times (N_2 \times N_1)$ matrix.
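A NumPy sketch of step S1, under the usual broad-learning formulation: solving the sparse representation with scikit-learn's Lasso is an assumed concrete choice (the text says only "e.g. the lasso method"), and min-max scaling stands in for the unspecified normalization.

```python
import numpy as np
from sklearn.linear_model import Lasso

def generate_feature_nodes(H1, N1, N2, rng=None):
    """H1: normalized, augmented training data of shape (s, f+1)."""
    rng = rng or np.random.default_rng(0)
    nodes, weights = [], []
    for _ in range(N2):                                  # one pass per feature window
        w_e = rng.normal(size=(H1.shape[1], N1))         # step 1: Gaussian random weights
        A1 = H1 @ w_e                                    # step 3
        A1 = (A1 - A1.min()) / (np.ptp(A1) + 1e-12)      # step 4: normalize
        # Step 5: sparse representation, find W with H1 @ W ~= A1 (lasso per column).
        W = np.stack([Lasso(alpha=1e-3, max_iter=2000).fit(H1, A1[:, j]).coef_
                      for j in range(N1)], axis=1)
        T1 = H1 @ W                                      # step 6: this window's nodes
        T1 = (T1 - T1.mean(0)) / (T1.std(0) + 1e-12)     # per-window normalization p_s(i)
        nodes.append(T1)
        weights.append(W)
    return np.concatenate(nodes, axis=1), weights        # y: shape (s, N2*N1)
```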
S2: and generating the enhanced node.
Another characteristic of the width learning network is that the random feature nodes can be supplemented with enhancement nodes. The feature nodes generated above are all linear; enhancement nodes are introduced to add nonlinear factors to the network.
1) As with the feature nodes, the feature node matrix $y$ is first standardized and augmented to obtain $H_2$. Unlike the feature nodes, the coefficient matrix of the enhancement nodes is not an arbitrary random matrix but a random matrix that has been orthonormalized. Assuming $(N_2 \times N_1) > N_3$, the enhancement-node coefficient matrix $w_h$ can be expressed as an orthonormalized random matrix of dimension $(N_2 \times N_1) \times N_3$. The purpose is to nonlinearly map the feature nodes into a high-dimensional subspace, strengthening the expressive power of the network, hence "enhancement";
2) activate the enhancement nodes:

$$T_2 = \operatorname{tansig}(H_2 \times w_h \times s)$$

where $s$ here denotes the scaling factor of the enhancement nodes, one of the tunable parameters of the network, and tansig is an activation function commonly used in BP neural networks that activates the features expressed by the enhancement nodes to the greatest extent;
3) form the final network input $T_3$: compared with the feature nodes, the enhancement nodes require neither sparse representation nor window iteration. Although orthonormalization also takes some computation time, adding enhancement nodes usually costs less than adding feature nodes. The final input of the network can thus be expressed as the concatenation

$$T_3 = [\, y \mid T_2 \,]$$

and the feature dimension of each sample is $(N_1 \times N_2) + N_3$.
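Step S2 in the same sketch style; the explicit tansig formula and the QR-based orthonormalization are assumptions consistent with the description above:

```python
import numpy as np

def generate_enhancement_nodes(y, N3, scale=0.8, rng=None):
    """y: feature node matrix of shape (s, N2*N1)."""
    rng = rng or np.random.default_rng(0)
    # Standardize and augment the feature node matrix to obtain H2.
    H2 = (y - y.mean(0)) / (y.std(0) + 1e-12)
    H2 = np.hstack([H2, np.ones((H2.shape[0], 1))])
    # Orthonormalized random coefficient matrix w_h (reduced QR factorization).
    w_h, _ = np.linalg.qr(rng.normal(size=(H2.shape[1], N3)))
    Z = H2 @ w_h * scale                           # 'scale' is the tunable factor s
    return 2.0 / (1.0 + np.exp(-2.0 * Z)) - 1.0    # tansig activation
```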
S3: the pseudo-inverse, i.e. the mapping of input to output in the width network is found.
Let $Y_x$ be the output value of the network, i.e., $Y_x = T_3 \times W$. Then

$$W = T_3^{+}\, Y$$

where $T_3^{+}$ denotes the pseudo-inverse of $T_3$ and $Y$ is the label matrix of the training set. The inputs and weights of the whole network are thus trained, and after training only $W$ and $p_s$ need to be stored; compared with deep learning, the number of network parameters is small and the required compute is low.
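Step S3 as a sketch; a ridge-regularized solve is an assumed, numerically standard way to compute the pseudo-inverse mapping (the patent does not fix a regularization constant):

```python
import numpy as np

def solve_output_weights(T3, Y, lam=1e-3):
    """T3: (s, N1*N2 + N3) final network input; Y: training labels."""
    A = T3.T @ T3 + lam * np.eye(T3.shape[1])
    # W = pinv(T3) @ Y in ridge form; only W (plus the normalizers p_s) is stored.
    return np.linalg.solve(A, T3.T @ Y)
```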
Subsequently, in step 307, the GAN denoising module 104 inputs the feature vectors extracted in step 306 into the generator network of the trained generative adversarial network to generate a denoised picture, wherein the generative adversarial network is obtained by alternately training the generator network and a discriminator network. An exemplary architecture 500 of the generative adversarial network is shown in fig. 5. The specific steps for training the generative adversarial network are as follows:
step S1: and taking the clear picture without noise and the corresponding low-illumination picture as a training data set.
Step S2: input the low-illumination pictures of the training data set into the trained width learning network to extract the corresponding spatial features, and feed these into the generator network to generate denoised pictures (fake samples).
Step S3: input the generated denoised pictures (fake samples) and the corresponding noise-free clear pictures of the training data set (real samples) into the discriminator network for real-picture discrimination, labeling each generated denoised picture 0 and each noise-free clear picture 1. The discriminator network is designed to judge whether an input picture is a real picture (real sample) or a generated picture (fake sample), and its discriminative ability improves through continued iterative training. The discriminator network may take different structures; in one example, the structure 600 of the discriminator network is shown in fig. 6, in which features are downsampled by average pooling, which preserves image features well, and high-dimensional texture information is extracted by stride-2 convolutions using multiple 3×3 kernels. In addition, in the discriminator network shown in fig. 6, two residual modules borrowed from the ResNet architecture are added to extract and fuse features; the residual modules speed up feature extraction and training convergence and yield better features.
In the example of fig. 6, the output of a residual module can be expressed as

$$x_{l+1} = x_l + F(x_l, W_l)$$

where $x_l$ is the input of the residual structure, $F$ is the convolution branch of the residual network, and $W_l$ are the convolution kernel parameters.
Stacked residuals can then be expressed as

$$x_L = x_l + \sum_{i=l}^{L-1} F(x_i, W_i)$$

where $\sum_{i=l}^{L-1} F(x_i, W_i)$ represents the convolution branches of multiple residual structures and $x_L$ is the output of the stacked residual modules.
The optimal residual network structure is then obtained from the back-propagation result

$$\frac{\partial \mathcal{L}}{\partial x_l} = \frac{\partial \mathcal{L}}{\partial x_L} \left( 1 + \frac{\partial}{\partial x_l} \sum_{i=l}^{L-1} F(x_i, W_i) \right)$$

where $\mathcal{L}$ denotes the training loss.
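A PyTorch sketch of the discriminator ingredients named above: stride-2 3×3 convolutions, average pooling, and ResNet-style residual modules implementing $x_{l+1} = x_l + F(x_l, W_l)$. Channel widths and depth are assumptions; fig. 6 itself is not reproduced here.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, ch: int):
        super().__init__()
        self.branch = nn.Sequential(              # F(x_l, W_l): the convolution branch
            nn.Conv2d(ch, ch, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
        )

    def forward(self, x):
        return x + self.branch(x)                 # x_{l+1} = x_l + F(x_l, W_l)

class Discriminator(nn.Module):
    def __init__(self, in_ch: int = 3, ch: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, ch, 3, stride=2, padding=1),   # 3x3, stride-2 downsampling
            nn.LeakyReLU(0.2),
            ResidualBlock(ch),                              # first residual module
            nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            ResidualBlock(ch * 2),                          # second residual module
            nn.AdaptiveAvgPool2d(1),                        # average pooling
            nn.Flatten(),
            nn.Linear(ch * 2, 1),
            nn.Sigmoid(),                                   # real (1) vs generated (0)
        )

    def forward(self, x):
        return self.net(x)
```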
step S4: loss calculation is carried out on the generated network and the judgment network, and parameters of the generated network and the judgment network are continuously optimized according to a calculation result, wherein pre-training is required to be carried out in order to ensure that the judgment network can correctly guide training of the generated network, and the pre-training adopts a mean square error as a loss function to carry out preliminary optimization. After the training is finished, the obtained generated network is a network with a denoising function.
Subsequently, the generated denoised picture can be further processed as input to an H.264/H.265 encoder, reducing storage consumption and improving user experience.
Fig. 7 illustrates an example architecture diagram of a system 700 for removing low-illumination picture noise according to one embodiment of this disclosure. The system 700 may be implemented on the mobile-phone side or in the cloud. As shown in fig. 7, system 700 may include a memory 701 and at least one processor 702. Memory 701 may store the trained width learning network and the generative adversarial network, and may include RAM, ROM, or a combination thereof. The memory 701 may store computer-executable instructions that, when executed by the at least one processor 702, cause the at least one processor to perform the various functions described herein, including: acquiring a low-illumination picture to be denoised; inputting the acquired low-illumination picture as input data into the trained width learning network for spatial feature extraction to obtain a corresponding feature vector; and inputting the obtained feature vector into the generator network of the trained generative adversarial network to generate a denoised picture. In some cases, memory 701 may include, among other things, a BIOS that controls basic hardware or software operations, such as interaction with peripheral components or devices. The processor 702 may include an intelligent hardware device (e.g., a general-purpose processor, DSP, CPU, microcontroller, ASIC, FPGA, programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof).
In a preferred embodiment, the memory 701 may further store a high-illumination reference feature library, constructed for example by the method shown in fig. 2. The computer-executable instructions stored in memory 701, when executed by the at least one processor 702, further cause the at least one processor to perform additional functions, including: performing feature extraction on the acquired low-illumination picture to obtain a corresponding low-illumination feature vector; and, when the cosine similarity between the low-illumination feature vector and one or more high-illumination reference feature vectors in the high-illumination reference feature library exceeds a predetermined threshold, returning the high-illumination reference picture corresponding to the reference feature vector with the highest cosine similarity to the low-illumination feature vector as the input data to the width learning network.
The various illustrative blocks and modules described in connection with the disclosure herein may be implemented or performed with a general purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices (e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration).
The functions described herein may be implemented in hardware, software executed by a processor, firmware, or any combination thereof. If implemented in software executed by a processor, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Other examples and implementations are within the scope of the disclosure and the following claims. For example, due to the nature of software, the functions described herein may be implemented using software executed by a processor, hardware, firmware, hard-wired, or any combination thereof. Features that implement functions may also be physically located at various locations, including being distributed such that portions of functions are implemented at different physical locations.
What has been described above includes examples of aspects of the claimed subject matter. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the claimed subject matter, but one of ordinary skill in the art may recognize that many further combinations and permutations of the claimed subject matter are possible. Accordingly, the disclosed subject matter is intended to embrace all such alterations, modifications and variations that fall within the spirit and scope of the appended claims.

Claims (10)

1. A method for denoising a low-illumination picture, the method comprising:
acquiring a low-illumination picture to be denoised;
inputting the acquired low-illumination picture as input data into a trained width learning network for spatial feature extraction to obtain a corresponding feature vector, wherein the width learning network is trained by determining the pseudo-inverse that maps the feature nodes and enhancement nodes derived from the input data to a given target value; and
inputting the obtained feature vector into a generator network of a trained generative adversarial network to generate a denoised picture, wherein the generative adversarial network is obtained by alternately training the generator network and a discriminator network.
2. The method of claim 1, wherein the method further comprises:
performing feature extraction on the acquired low-illumination picture to obtain a corresponding low-illumination feature vector; and
when the cosine similarity between the low-illumination feature vector and one or more high-illumination reference feature vectors in a high-illumination reference feature library is greater than a predetermined threshold, returning the high-illumination reference picture corresponding to the high-illumination reference feature vector, among the one or more high-illumination reference feature vectors, having the highest cosine similarity to the low-illumination feature vector, as the input data to be input into the width learning network.
3. The method of claim 2, wherein the high-illumination reference feature library is constructed by:
periodically acquiring a high-illumination reference picture, wherein the brightness of the high-illumination reference picture is above a threshold;
performing feature extraction on the high-illumination reference picture to obtain a corresponding high-illumination reference feature vector;
comparing the high-illumination reference feature vector with the feature vectors already in the high-illumination reference feature library;
when the cosine similarity between the high-illumination reference feature vector and each feature vector already in the high-illumination reference feature library is less than a predetermined threshold, adding the high-illumination reference feature vector to the high-illumination reference feature library;
performing grayscale processing on the high-illumination reference picture corresponding to the high-illumination reference feature vector; and
applying the Laplacian transform to the grayscaled picture and storing the result in serialized form.
4. The method of claim 1, wherein training of the generative adversarial network is achieved by repeating the following steps, with noise-free clear pictures and the corresponding low-illumination pictures as a training data set:
inputting the feature vectors extracted by the width learning network from the low-illumination pictures in the training data set into a generator network of the generative adversarial network to generate a denoised picture;
inputting the generated denoised picture and the corresponding noise-free clear picture into a discriminator network of the generative adversarial network, for the discriminator network to identify the real picture; and
optimizing the generator network and the discriminator network based on a loss calculation.
5. A system for denoising low-illumination pictures, the system comprising:
a feature extraction module configured to:
acquire a low-illumination picture to be denoised; and
perform feature extraction on the acquired low-illumination picture to obtain a corresponding low-illumination feature vector;
a similar-picture comparison module configured to:
compare the low-illumination feature vector, one by one, with one or more high-illumination reference feature vectors in a high-illumination reference feature library;
when the cosine similarity between the low-illumination feature vector and the one or more high-illumination reference feature vectors is greater than a predetermined threshold, determine that a similar picture exists and return, as input data, the high-illumination reference picture corresponding to the high-illumination reference feature vector, among the one or more high-illumination reference feature vectors, having the highest cosine similarity to the low-illumination feature vector; and
otherwise, determine that no similar picture exists and use the acquired low-illumination picture directly as the input data;
a width learning feature extraction module configured to input the input data into a trained width learning network for spatial feature extraction to obtain a corresponding feature vector, wherein the width learning network is trained by determining the pseudo-inverse that maps the feature nodes and enhancement nodes derived from the input data to a given target value; and
a generative-adversarial-network denoising module configured to input the obtained feature vector into a generator network of a trained generative adversarial network to generate a denoised picture, wherein the generative adversarial network is obtained by alternately training the generator network and a discriminator network.
6. The system of claim 5, wherein the high-illumination reference feature library is constructed by:
periodically acquiring a high-illumination reference picture, wherein the brightness of the high-illumination reference picture is above a threshold;
performing feature extraction on the high-illumination reference picture to obtain a corresponding high-illumination reference feature vector;
comparing the high-illumination reference feature vector with the feature vectors already in the high-illumination reference feature library;
when the cosine similarity between the high-illumination reference feature vector and each feature vector already in the high-illumination reference feature library is less than a predetermined threshold, adding the high-illumination reference feature vector to the high-illumination reference feature library;
performing grayscale processing on the high-illumination reference picture corresponding to the high-illumination reference feature vector; and
applying the Laplacian transform to the grayscaled picture and storing the result in serialized form.
7. The system of claim 6, wherein feature extraction for the low-illumination picture and the high-illumination reference picture is performed using a Scale-Invariant Feature Transform (SIFT) operator.
8. The system of claim 5, wherein training of the generative adversarial network is achieved by repeating the following steps, with noise-free clear pictures and the corresponding low-illumination pictures as a training data set:
inputting the feature vectors extracted by the width learning network from the low-illumination pictures in the training data set into a generator network of the generative adversarial network to generate a denoised picture;
inputting the generated denoised picture and the corresponding noise-free clear picture into a discriminator network of the generative adversarial network, for the discriminator network to identify the real picture; and
optimizing the generator network and the discriminator network based on a loss calculation.
9. A system for denoising low-illumination pictures, the system comprising:
a memory storing a trained width learning network, a generative adversarial network, and computer-executable instructions; and
at least one processor, wherein the computer-executable instructions, when executed, cause the at least one processor to:
acquire a low-illumination picture to be denoised;
input the acquired low-illumination picture as input data into the width learning network for spatial feature extraction to obtain a corresponding feature vector, wherein the width learning network is trained by determining the pseudo-inverse that maps the feature nodes and enhancement nodes derived from the input data to a given target value; and
input the obtained feature vector into a generator network of the generative adversarial network to generate a denoised picture, wherein the generative adversarial network is obtained by alternately training the generator network and a discriminator network.
10. The system of claim 9, wherein the memory further stores a high-illumination reference feature library, and the computer-executable instructions, when executed, cause the at least one processor to further:
perform feature extraction on the acquired low-illumination picture to obtain a corresponding low-illumination feature vector; and
when the cosine similarity between the low-illumination feature vector and one or more high-illumination reference feature vectors in the high-illumination reference feature library is greater than a predetermined threshold, return the high-illumination reference picture corresponding to the high-illumination reference feature vector, among the one or more high-illumination reference feature vectors, having the highest cosine similarity to the low-illumination feature vector, as the input data to be input into the width learning network.
CN202111583678.3A 2021-12-22 2021-12-22 Device and method for removing low-illumination video noise Active CN114926348B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111583678.3A CN114926348B (en) 2021-12-22 2021-12-22 Device and method for removing low-illumination video noise

Publications (2)

Publication Number Publication Date
CN114926348A (en) 2022-08-19
CN114926348B (en) 2024-03-01

Family

ID=82804285

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111583678.3A Active CN114926348B (en) 2021-12-22 2021-12-22 Device and method for removing low-illumination video noise

Country Status (1)

Country Link
CN (1) CN114926348B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190171908A1 (en) * 2017-12-01 2019-06-06 The University Of Chicago Image Transformation with a Hybrid Autoencoder and Generative Adversarial Network Machine Learning Architecture
CN108765319A (en) * 2018-05-09 2018-11-06 大连理工大学 A kind of image de-noising method based on generation confrontation network
CN111325671A (en) * 2018-12-13 2020-06-23 北京嘀嘀无限科技发展有限公司 Network training method and device, image processing method and electronic equipment
KR102134405B1 (en) * 2019-06-27 2020-07-15 중앙대학교 산학협력단 System and Method for Improving Low Light Level Image Using Generative Adversarial Network
CN110675328A (en) * 2019-08-02 2020-01-10 北京巨数数字技术开发有限公司 Low-illumination image enhancement method and device based on condition generation countermeasure network
KR20210048100A (en) * 2019-10-23 2021-05-03 서울대학교산학협력단 Condition monitoring data generating apparatus and method using generative adversarial network
US20210312591A1 (en) * 2020-04-07 2021-10-07 Samsung Electronics Co., Ltd. Systems and method of training networks for real-world super resolution with unknown degradations
CN111915525A (en) * 2020-08-05 2020-11-10 湖北工业大学 Low-illumination image enhancement method based on improved depth separable generation countermeasure network
CN111915526A (en) * 2020-08-05 2020-11-10 湖北工业大学 Photographing method based on brightness attention mechanism low-illumination image enhancement algorithm
CN113313657A (en) * 2021-07-29 2021-08-27 北京航空航天大学杭州创新研究院 Unsupervised learning method and system for low-illumination image enhancement

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
江泽涛; 覃露露: "A Low-Illumination Image Enhancement Method Based on a U-Net Generative Adversarial Network", Acta Electronica Sinica, no. 02, 15 February 2020 (2020-02-15), pages 52-58 *
赫尔辛根默斯肯: "Width Learning: Principles and Implementation", pages 1-5, retrieved from the Internet <URL:https://zhuanlan.zhihu.com/p/35442846> *

Also Published As

Publication number Publication date
CN114926348B (en) 2024-03-01

Legal Events

Code Description
PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant