Disclosure of Invention
Object of the invention: an object is to provide an image night scene processing method based on a convolutional neural network, so as to solve the above problems in the prior art. A further object is to provide a computing module operable to carry out the above method, and a storage medium readable by the computing module.
The technical solution is as follows: an image night scene processing method based on a convolutional neural network comprises the following steps:
step 1, collecting a plurality of groups of RAW format data samples;
step 2, designing a super night scene network model;
step 3, training the super night scene network model in the step 2;
step 4, outputting a result.
In a further embodiment, step 1 further comprises:
step 1-1, shooting different scenes through a plurality of preset image acquisition devices with different models to obtain a plurality of RAW format data samples, taking the RAW format data samples acquired by the image acquisition devices with different models in the same scene as a group of parent samples, dividing the parent samples into different child samples according to the model of the image acquisition device, and marking each sample;
step 1-2, after data sample acquisition is completed, aligning the images, and removing the non-overlapping parts of the images;
Step 1-3, after the image alignment operation of step 1-2, the data samples are further divided into a training set, a verification set and a test set.
In a further embodiment, aligning the images in step 1-2 further includes matching key points of the images and repeatedly iterating over random subsets on the basis of the key points;
wherein matching the key points of the images further proceeds as follows:
step 1-2a, searching all image positions in a preset scale space, and extracting key points, including corner points, edge points, bright points in dark areas and dark points in bright areas, through convolution operations; the scale space L(x, y, σ) is calculated as follows:
L(x, y, σ) = G(x, y, σ) · C(x, y)
wherein C(x, y) represents the midpoint coordinates of the key point, G(x, y, σ) represents a Gaussian kernel function, and σ is a scale space factor taking a constant value;
wherein the Gaussian kernel function is expressed as follows:
G(x, y, σ) = 1/(2πσ²) · exp(−(x² + y²)/(2σ²))
wherein each symbol is as defined above;
step 1-2b, collecting the gradient modulus values of the key points:
m(x, y) = √((L(x+1, y) − L(x−1, y))² + (L(x, y+1) − L(x, y−1))²)
step 1-2c, collecting the direction distribution of the key points:
θ(x, y) = arctan((L(x, y+1) − L(x, y−1)) / (L(x+1, y) − L(x−1, y)))
wherein each symbol is as defined above;
step 1-2d, calculating the neighborhood points k_i of the key point k:
wherein (x_k, y_k) represents the position of the key point, and the remaining symbols are as defined above.
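For illustration, the key point matching and random-subset iteration of steps 1-2a to 1-2d can be realized with a SIFT + RANSAC pipeline. The following is a minimal Python sketch using OpenCV, in which the function name, the Lowe ratio threshold of 0.75 and the reprojection threshold of 5.0 are illustrative assumptions rather than parameters stated in this disclosure:

```python
import cv2
import numpy as np

def align_pair(ref_img, mov_img, ratio=0.75):
    """Align mov_img onto ref_img with SIFT key points + RANSAC homography.

    `ratio` is the usual Lowe ratio-test threshold (an assumed value)."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(ref_img, None)  # corners, edges, blob-like points
    kp2, des2 = sift.detectAndCompute(mov_img, None)

    # Match descriptors and keep only unambiguous matches (Lowe ratio test).
    matches = cv2.BFMatcher().knnMatch(des2, des1, k=2)
    good = [m for m, n in matches if m.distance < ratio * n.distance]

    src = np.float32([kp2[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp1[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

    # RANSAC repeatedly iterates over random subsets of the matched key points.
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    h, w = ref_img.shape[:2]
    return cv2.warpPerspective(mov_img, H, (w, h))
```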
In a further embodiment, the step 2 further includes:
Step 2-1, establishing an SNN super night scene network model, wherein the SNN super night scene network model comprises at least one Encoder network and at least one Decoder network, each network comprising multiple layers; the Encoder network performs multiple downsampling operations, each layer containing at least two 3x3 convolutions, each convolution followed by an activation function and a Switchable Normalization layer, with a 2x2 max-pooling operation of stride 2 finally added for downsampling; the whole Encoder structure is repeated three times;
each step in the Decoder network includes upsampling the feature map, applying a 3x3 convolution that halves the number of feature channels, concatenating the result with the corresponding feature map from the Encoder network, and then applying two 3x3 convolutions to the concatenated feature map, each convolution followed by an activation function and a Switchable Normalization layer; a 3x3 convolution layer is used in the last layer, and the processed image is finally output through pixel_shuffle;
step 2-2, providing a Residual DenseBlock and placing it on the skip-connection; the Residual DenseBlock is composed of 3 DenseBlocks, each DenseBlock containing 5 convolutions, each convolution followed by an activation function and a Switchable Normalization layer, and each layer simultaneously accepting the output feature maps of all previous convolution layers.
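A minimal PyTorch sketch of the Encoder-Decoder pattern of step 2-1 is given below; the channel counts, the 4-channel packed-RAW input, and the use of GroupNorm as a stand-in for the Switchable Normalization layer are assumptions made for illustration (the Residual DenseBlock on the skip-connection is omitted here and sketched later in the detailed description):

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Two 3x3 convolutions, each followed by an activation and a normalization
    # layer (GroupNorm stands in here for Switchable Normalization).
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.LeakyReLU(0.2), nn.GroupNorm(8, out_ch),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.LeakyReLU(0.2), nn.GroupNorm(8, out_ch),
    )

class SNNSketch(nn.Module):
    def __init__(self, in_ch=4, base=32):  # 4 input channels for packed Bayer RAW (assumed)
        super().__init__()
        chs = [base, base * 2, base * 4, base * 8]
        self.enc = nn.ModuleList()
        prev = in_ch
        for c in chs:                        # Encoder: three downsampling steps
            self.enc.append(conv_block(prev, c))
            prev = c
        self.pool = nn.MaxPool2d(2)          # 2x2 max pooling, stride 2
        self.up = nn.Upsample(scale_factor=2, mode='nearest')
        self.dec = nn.ModuleList()
        self.halve = nn.ModuleList()
        for c in reversed(chs[:-1]):         # Decoder mirrors the Encoder
            self.halve.append(nn.Conv2d(c * 2, c, 3, padding=1))  # halves channel count
            self.dec.append(conv_block(c * 2, c))
        self.out = nn.Sequential(nn.Conv2d(base, 12, 3, padding=1),
                                 nn.PixelShuffle(2))  # 12 -> 3 channels, 2x upscale

    def forward(self, x):
        skips = []
        for i, blk in enumerate(self.enc):
            x = blk(x)
            if i < len(self.enc) - 1:
                skips.append(x)              # feature map reused on the skip-connection
                x = self.pool(x)
        for h, blk, skip in zip(self.halve, self.dec, reversed(skips)):
            x = h(self.up(x))                # upsample, then halve channels
            x = blk(torch.cat([x, skip], dim=1))
        return self.out(x)
```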
In a further embodiment, in step 3, the training sample set is divided into a low-resolution training set and high-resolution image blocks in the process of training the super night scene network model;
the low-resolution training set is obtained as follows: first, the high-resolution images are downsampled by a factor of N to obtain different low-resolution images; the obtained low-resolution images are then augmented, and each low-resolution image is sampled with overlap to obtain a group of overlapping low-resolution image blocks, which are taken as the low-resolution training set;
the high-resolution image blocks are obtained as follows:
the high-resolution images corresponding to the N-times downsampling operation are sampled with overlap, and the resulting group of corresponding overlapping high-resolution image blocks is taken as the high-resolution label images, N being a positive integer;
the augmentation of the obtained low-resolution images is performed by rotation transformations of 90, 180 and 270 degrees, so as to obtain low-resolution images at different angles;
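As a concrete illustration of the patch pipeline described above, the following sketch cuts each high-resolution image into overlapping low/high-resolution training pairs with rotation augmentation; the patch size, stride, and the nearest-neighbor downsampling stand-in are assumptions:

```python
import numpy as np

def make_pairs(hr, N=2, patch=48, stride=24):
    """Cut one HR image into overlapping LR/HR training pairs.

    stride < patch gives the overlap; N, patch and stride are illustrative
    assumptions, and the HR dimensions are assumed divisible by N."""
    lr = hr[::N, ::N]                       # naive N-times downsampling stand-in
    lr_blocks, hr_blocks = [], []
    for rot in range(4):                    # 0/90/180/270 degree augmentation
        lr_r, hr_r = np.rot90(lr, rot), np.rot90(hr, rot)
        h, w = lr_r.shape[:2]
        for y in range(0, h - patch + 1, stride):
            for x in range(0, w - patch + 1, stride):
                lr_blocks.append(lr_r[y:y + patch, x:x + patch])
                # The HR label block covers the same area, N times larger.
                hr_blocks.append(hr_r[y * N:(y + patch) * N, x * N:(x + patch) * N])
    return np.stack(lr_blocks), np.stack(hr_blocks)
```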
a training convolutional network is then constructed:
first, a low-resolution image LR is taken as input and shallow features are extracted through a convolution layer; deep features of the image are then learned through a plurality of stacked CACB modules; finally, the extracted shallow and deep features are fused, and upsampling is performed by sub-pixel convolution to obtain a high-resolution image;
the CACB module consists of four fusion convolution layers, and one quarter of the features of each fusion convolution layer is reserved for the final feature fusion; the structure of the fusion convolution layer in this module differs between a training stage and a deployment stage;
the loss function used in the training process is:
L_total = 0.5*L_1 + 0.05*L_SSIM + 0.1*L_VGG + L_adv
wherein L_1 is the mean absolute error, L_SSIM is the structural similarity loss, L_VGG is the perceptual loss, and L_adv represents the adversarial loss;
wherein F(·) is the feature map output by the 34th layer of a VGG19 network pre-trained on ImageNet, G(I)_{i,j,k} is the picture generated by the generator, C_{i,j,k} is the corresponding original picture with the bokeh effect, and D(·) is the output of the discriminator.
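The weighted loss above can be assembled as in the following sketch; F.l1_loss gives the mean absolute error, the SSIM term comes from the third-party pytorch_msssim package (an assumed dependency), and vgg_features and disc are assumed callables standing in for the pre-trained VGG19 feature extractor and the discriminator described here:

```python
import torch
import torch.nn.functional as F
from pytorch_msssim import ssim  # third-party SSIM implementation (assumed dependency)

def total_loss(pred, target, vgg_features, disc):
    """L_total = 0.5*L_1 + 0.05*L_SSIM + 0.1*L_VGG + L_adv (sketch).

    vgg_features: callable returning the pre-trained VGG19 feature map;
    disc: discriminator returning probabilities in (0, 1). Both assumed."""
    l_1 = F.l1_loss(pred, target)                                 # mean absolute error
    l_ssim = 1.0 - ssim(pred, target, data_range=1.0)             # structural similarity term
    l_vgg = F.l1_loss(vgg_features(pred), vgg_features(target))   # perceptual loss
    l_adv = -torch.log(disc(pred) + 1e-8).mean()                  # adversarial (generator) term
    return 0.5 * l_1 + 0.05 * l_ssim + 0.1 * l_vgg + l_adv
```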
In a further embodiment, step 4 further includes acquiring an image through the image acquisition sensor and outputting a night scene image after the image is finally refined by the super night scene network model trained in step 3; bokeh rendering is performed before the night scene image is output, wherein the bokeh rendering model can be constructed as follows:
wherein I_bokeh represents the finally obtained image, I_org represents the original image, ⊙ represents element-wise multiplication of matrices, B_i(·) is the i-th level blur function, and W_i represents the feature weight matrix values of the i-th layer data image; the i-th level blur function B_i(·) is obtained by iterating a shallow blur neural network i times, and is expressed as:
the loss function l adopts a combination of a reconstruction term and the structural similarity SSIM, and the model is optimized by back-propagating the error value; wherein l_1 is:
wherein I_bokeh represents the image with the bokeh effect generated by the model, and Î represents the original image actually having the bokeh effect; l_1 is computed between the generated image I_bokeh and the actual image Î;
the structural similarity is as follows:
SSIM(I_bokeh, Î) = [l(I_bokeh, Î)]^α · [c(I_bokeh, Î)]^β · [s(I_bokeh, Î)]^γ
wherein α, β and γ are preset constants, l(·,·) represents the brightness relationship between the generated image I_bokeh and the actual image Î, c(·,·) represents the contrast relationship between them, and s(·,·) represents the structural relationship between them.
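The formulas of the rendering model are not reproduced above, so the following sketch is only one plausible reading of the symbol definitions: the output is composed from I_org and per-level blurred versions B_i(I_org), weighted element-wise by maps W_i. The weighted-sum form, the layer sizes and the softmax normalization are all assumptions:

```python
import torch
import torch.nn as nn

class BokehSketch(nn.Module):
    """One plausible reading of the rendering model: I_bokeh is composed from
    I_org and per-level blurred versions B_i(I_org), weighted element-wise by
    predicted W_i maps. The summation form itself is an assumption; the text
    only defines the symbols."""
    def __init__(self, levels=3):
        super().__init__()
        # Shallow blur network; applying it i times yields B_i(.).
        self.blur = nn.Sequential(nn.Conv2d(3, 3, 5, padding=2), nn.ReLU(),
                                  nn.Conv2d(3, 3, 5, padding=2))
        # Predicts one weight map W_i per blur level (plus the sharp level).
        self.weights = nn.Conv2d(3, levels + 1, 3, padding=1)
        self.levels = levels

    def forward(self, i_org):
        w = torch.softmax(self.weights(i_org), dim=1)   # weights sum to 1 per pixel
        out = w[:, 0:1] * i_org                         # sharp component
        x = i_org
        for i in range(self.levels):
            x = self.blur(x)                            # B_{i+1}: blur applied again
            out = out + w[:, i + 1:i + 2] * x           # element-wise weighting
        return out
```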
An image night scene processing system based on a convolutional neural network comprises: a first module for collecting a plurality of groups of RAW format data samples; a second module for establishing a super night scene network model; a third module for training the super night scene network model; and a fourth module for performing bokeh rendering on the night scene image before output.
The first module further shoots different scenes through a plurality of preset image acquisition devices of different models to obtain a plurality of RAW format data samples; the RAW format data samples acquired by the image acquisition devices of different models in the same scene are taken as a group of parent samples, the parent samples are divided into different child samples according to the model of the image acquisition device, and each sample is marked;
after the data sample acquisition is completed, an alignment operation is carried out on the images, and the non-overlapping parts of the images are removed; the image alignment operation comprises matching key points of the images and repeatedly iterating over random subsets on this basis;
wherein matching the key points of the images further proceeds as follows:
searching all image positions in a preset scale space, and extracting key points, including corner points, edge points, bright points in dark areas and dark points in bright areas, through convolution operations; the scale space L(x, y, σ) is calculated as follows:
L(x, y, σ) = G(x, y, σ) · C(x, y)
wherein C(x, y) represents the midpoint coordinates of the key point, G(x, y, σ) represents a Gaussian kernel function, and σ is a scale space factor taking a constant value;
wherein the Gaussian kernel function is expressed as follows:
G(x, y, σ) = 1/(2πσ²) · exp(−(x² + y²)/(2σ²))
wherein each symbol is as defined above;
collecting the gradient modulus values of the key points:
m(x, y) = √((L(x+1, y) − L(x−1, y))² + (L(x, y+1) − L(x, y−1))²)
collecting the direction distribution of the key points:
θ(x, y) = arctan((L(x, y+1) − L(x, y−1)) / (L(x+1, y) − L(x−1, y)))
wherein each symbol is as defined above;
calculating the neighborhood points k_i of the key point k:
wherein (x_k, y_k) represents the position of the key point, and the remaining symbols are as defined above;
dividing the data sample into a training set, a verification set and a test set;
the second module is further configured to establish an SNN super night scene network model comprising at least one Encoder network and at least one Decoder network, each network including multiple layers; the Encoder network performs multiple downsampling operations, each layer containing at least two 3x3 convolutions, each convolution followed by an activation function and a Switchable Normalization layer, with a 2x2 max-pooling operation of stride 2 finally added for downsampling; the whole Encoder structure is repeated three times;
each step in the Decoder network includes upsampling the feature map, applying a 3x3 convolution that halves the number of feature channels, concatenating the result with the corresponding feature map from the Encoder network, and then applying two 3x3 convolutions to the concatenated feature map, each convolution followed by an activation function and a Switchable Normalization layer; a 3x3 convolution layer is used in the last layer, and the processed image is finally output through pixel_shuffle;
a Residual DenseBlock is provided and placed on the skip-connection; the Residual DenseBlock is composed of 3 DenseBlocks, each DenseBlock containing 5 convolutions, each convolution followed by an activation function and a Switchable Normalization layer, and each layer simultaneously accepting the output feature maps of all previous convolution layers;
the third module is further configured to divide the training sample set into a low-resolution training set and high-resolution image blocks;
the low-resolution training set is obtained as follows: first, the high-resolution images are downsampled by a factor of N to obtain different low-resolution images; the obtained low-resolution images are then augmented, and each low-resolution image is sampled with overlap to obtain a group of overlapping low-resolution image blocks, which are taken as the low-resolution training set;
the high-resolution image blocks are obtained as follows:
the high-resolution images corresponding to the N-times downsampling operation are sampled with overlap, and the resulting group of corresponding overlapping high-resolution image blocks is taken as the high-resolution label images, N being a positive integer;
the augmentation of the obtained low-resolution images is performed by rotation transformations of 90, 180 and 270 degrees, so as to obtain low-resolution images at different angles;
a training convolutional network is then constructed:
first, a low-resolution image LR is taken as input and shallow features are extracted through a convolution layer; deep features of the image are then learned through a plurality of stacked CACB modules; finally, the extracted shallow and deep features are fused, and upsampling is performed by sub-pixel convolution to obtain a high-resolution image;
the CACB module consists of four fusion convolution layers, and one quarter of the features of each fusion convolution layer is reserved for the final feature fusion; the structure of the fusion convolution layer in this module differs between a training stage and a deployment stage;
the loss function used in the training process is:
L_total = 0.5*L_1 + 0.05*L_SSIM + 0.1*L_VGG + L_adv
wherein L_1 is the mean absolute error, L_SSIM is the structural similarity loss, L_VGG is the perceptual loss, and L_adv represents the adversarial loss;
wherein F(·) is the feature map output by the 34th layer of a VGG19 network pre-trained on ImageNet, G(I)_{i,j,k} is the picture generated by the generator, C_{i,j,k} is the corresponding original picture with the bokeh effect, and D(·) is the output of the discriminator.
The fourth module is further configured to construct a bokeh rendering model:
wherein I_bokeh represents the finally obtained image, I_org represents the original image, ⊙ represents element-wise multiplication of matrices, B_i(·) is the i-th level blur function, and W_i represents the feature weight matrix values of the i-th layer data image; the i-th level blur function B_i(·) is obtained by iterating a shallow blur neural network i times, and is expressed as:
the loss function l adopts a combination of a reconstruction term and the structural similarity SSIM, and the model is optimized by back-propagating the error value; wherein l_1 is:
wherein I_bokeh represents the image with the bokeh effect generated by the model, and Î represents the original image actually having the bokeh effect; l_1 is computed between the generated image I_bokeh and the actual image Î;
the structural similarity is as follows:
SSIM(I_bokeh, Î) = [l(I_bokeh, Î)]^α · [c(I_bokeh, Î)]^β · [s(I_bokeh, Î)]^γ
wherein α, β and γ are preset constants, l(·,·) represents the brightness relationship between the generated image I_bokeh and the actual image Î, c(·,·) represents the contrast relationship between them, and s(·,·) represents the structural relationship between them.
A computing module comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the computer program also being stored on a computer-readable storage medium; the computing module is configured to have the processor execute the computer program, thereby performing the following steps:
Step 1, data acquisition. Night scenes are shot with a number of low-end mobile phones and the RAW format data is taken out; at the same time, long-exposure night scene RGB pictures are shot with a DSLR camera, giving about 100,000 pairs in total. After the data set is collected, a SIFT key point matching algorithm and the RANSAC algorithm are used for the alignment operation, and the non-overlapping parts of the images are removed. After matching is completed, the data is divided into a training set, a verification set and a test set.
Step 1-1, shooting different scenes through a plurality of preset image acquisition devices with different models to obtain a plurality of RAW format data samples, taking the RAW format data samples acquired by the image acquisition devices with different models in the same scene as a group of parent samples, dividing the parent samples into different child samples according to the model of the image acquisition device, and marking each sample;
Step 1-2, after data sample acquisition is completed, an alignment operation is carried out on the images, and the non-overlapping parts of the images are removed; aligning the images in step 1-2 further comprises matching key points of the images and repeatedly iterating over random subsets on this basis;
wherein matching the key points of the images further proceeds as follows:
step 1-2a, searching all image positions in a preset scale space, and extracting key points, including corner points, edge points, bright points in dark areas and dark points in bright areas, through convolution operations; the scale space L(x, y, σ) is calculated as follows:
L(x, y, σ) = G(x, y, σ) · C(x, y)
wherein C(x, y) represents the midpoint coordinates of the key point, G(x, y, σ) represents a Gaussian kernel function, and σ is a scale space factor taking a constant value;
wherein the Gaussian kernel function is expressed as follows:
G(x, y, σ) = 1/(2πσ²) · exp(−(x² + y²)/(2σ²))
wherein each symbol is as defined above;
step 1-2b, collecting the gradient modulus values of the key points:
m(x, y) = √((L(x+1, y) − L(x−1, y))² + (L(x, y+1) − L(x, y−1))²)
step 1-2c, collecting the direction distribution of the key points:
θ(x, y) = arctan((L(x, y+1) − L(x, y−1)) / (L(x+1, y) − L(x−1, y)))
wherein each symbol is as defined above;
step 1-2d, calculating the neighborhood points k_i of the key point k:
wherein (x_k, y_k) represents the position of the key point, and the remaining symbols are as defined above.
Step 1-3, after the image alignment operation of step 1-2, the data samples are further divided into a training set, a verification set and a test set.
Step 2, model design. Referring to fig. 1, and in detail to fig. 2, we first propose the Super Night Network (SNN, super night scene network), whose main body is an Encoder-Decoder structure. It consists of an Encoder (left side) and a Decoder (right side). The Encoder is similar to a common classification network: there are multiple downsampling steps, and the feature maps change from large and narrow to small and wide. Each layer contains two 3x3 convolutions, each followed by an activation function (LeakyReLU) and a Switchable Normalization layer; finally a 2x2 max-pooling operation with stride 2 is added for downsampling. The number of feature channels doubles at each downsampling step. The entire Encoder repeats this step three times.
Each step in the Decoder involves upsampling the feature map, here using nearest-neighbor interpolation, followed by a 3x3 convolution that halves the number of feature channels; the result is concatenated with the corresponding (specially processed) feature map from the Encoder, followed by two 3x3 convolutions, again each followed by a LeakyReLU activation and a Switchable Normalization layer. At the last layer, only one 3x3 convolution layer is used, and the processed image is output through pixel_shuffle.
Here, to obtain more information from the RAW data, we use a skip-connection structure and place a Residual DenseBlock on the skip-connection. The Residual DenseBlock consists of 3 DenseBlocks, each with 5 convolution layers inside, each convolution followed by an activation function (LeakyReLU) and a Switchable Normalization layer, while each layer accepts the output feature maps of all preceding convolutions; this operation is a concatenation. In addition, to obtain more effective information, we add a ChannelAttention module after each DenseBlock. It consists of an average pooling layer, two 3x3 convolution layers, and nonlinear ReLU and Sigmoid transformations; the connection mode is shown in fig. 3.
Step 3, model training. Based on the above model and data set, we achieved rapid training of the model using distributed training, which took only 2.5 hours. In the process of training the super night scene network model, the training sample set is divided into a low-resolution training set and high-resolution image blocks;
the low-resolution training set is obtained as follows: first, the high-resolution images are downsampled by a factor of N to obtain different low-resolution images; the obtained low-resolution images are then augmented, and each low-resolution image is sampled with overlap to obtain a group of overlapping low-resolution image blocks, which are taken as the low-resolution training set;
the high-resolution image blocks are obtained as follows:
the high-resolution images corresponding to the N-times downsampling operation are sampled with overlap, and the resulting group of corresponding overlapping high-resolution image blocks is taken as the high-resolution label images, N being a positive integer;
the augmentation of the obtained low-resolution images is performed by rotation transformations of 90, 180 and 270 degrees, so as to obtain low-resolution images at different angles;
a training convolutional network is then constructed:
first, a low-resolution image LR is taken as input and shallow features are extracted through a convolution layer; deep features of the image are then learned through a plurality of stacked CACB modules; finally, the extracted shallow and deep features are fused, and upsampling is performed by sub-pixel convolution to obtain a high-resolution image;
the CACB module consists of four fusion convolution layers, and one quarter of the features of each fusion convolution layer is reserved for the final feature fusion; the structure of the fusion convolution layer in this module differs between a training stage and a deployment stage;
the loss function used in the training process is:
L_total = 0.5*L_1 + 0.05*L_SSIM + 0.1*L_VGG + L_adv
wherein L_1 is the mean absolute error, L_SSIM is the structural similarity loss, L_VGG is the perceptual loss, and L_adv represents the adversarial loss;
wherein F(·) is the feature map output by the 34th layer of a VGG19 network pre-trained on ImageNet, G(I)_{i,j,k} is the picture generated by the generator, C_{i,j,k} is the corresponding original picture with the bokeh effect, and D(·) is the output of the discriminator.
Step 4, outputting a result; before the result is output, the image is passed through a preset bokeh rendering model, which is constructed as follows:
wherein I_bokeh represents the finally obtained image, I_org represents the original image, ⊙ represents element-wise multiplication of matrices, B_i(·) is the i-th level blur function, and W_i represents the feature weight matrix values of the i-th layer data image; the i-th level blur function B_i(·) is obtained by iterating a shallow blur neural network i times, and is expressed as:
the loss function l adopts a combination of a reconstruction term and the structural similarity SSIM, and the model is optimized by back-propagating the error value; wherein l_1 is:
wherein I_bokeh represents the image with the bokeh effect generated by the model, and Î represents the original image actually having the bokeh effect; l_1 is computed between the generated image I_bokeh and the actual image Î;
the structural similarity is as follows:
SSIM(I_bokeh, Î) = [l(I_bokeh, Î)]^α · [c(I_bokeh, Î)]^β · [s(I_bokeh, Î)]^γ
wherein α, β and γ are preset constants, l(·,·) represents the brightness relationship between the generated image I_bokeh and the actual image Î, c(·,·) represents the contrast relationship between them, and s(·,·) represents the structural relationship between them.
A storage medium readable by the computing module, having stored thereon a computer program which, when executed by a processor, performs the following process:
Step 1, data acquisition. Night scenes are shot with a number of low-end mobile phones and the RAW format data is taken out; at the same time, long-exposure night scene RGB pictures are shot with a DSLR camera, giving about 100,000 pairs in total. After the data set is collected, a SIFT key point matching algorithm and the RANSAC algorithm are used for the alignment operation, and the non-overlapping parts of the images are removed. After matching is completed, the data is divided into a training set, a verification set and a test set.
Step 1-1, shooting different scenes through a plurality of preset image acquisition devices with different models to obtain a plurality of RAW format data samples, taking the RAW format data samples acquired by the image acquisition devices with different models in the same scene as a group of parent samples, dividing the parent samples into different child samples according to the model of the image acquisition device, and marking each sample;
Step 1-2, after data sample acquisition is completed, an alignment operation is carried out on the images, and the non-overlapping parts of the images are removed; aligning the images in step 1-2 further comprises matching key points of the images and repeatedly iterating over random subsets on this basis;
wherein matching the key points of the images further proceeds as follows:
step 1-2a, searching all image positions in a preset scale space, and extracting key points, including corner points, edge points, bright points in dark areas and dark points in bright areas, through convolution operations; the scale space L(x, y, σ) is calculated as follows:
L(x, y, σ) = G(x, y, σ) · C(x, y)
wherein C(x, y) represents the midpoint coordinates of the key point, G(x, y, σ) represents a Gaussian kernel function, and σ is a scale space factor taking a constant value;
wherein the Gaussian kernel function is expressed as follows:
G(x, y, σ) = 1/(2πσ²) · exp(−(x² + y²)/(2σ²))
wherein each symbol is as defined above;
step 1-2b, collecting the gradient modulus values of the key points:
m(x, y) = √((L(x+1, y) − L(x−1, y))² + (L(x, y+1) − L(x, y−1))²)
step 1-2c, collecting the direction distribution of the key points:
θ(x, y) = arctan((L(x, y+1) − L(x, y−1)) / (L(x+1, y) − L(x−1, y)))
wherein each symbol is as defined above;
step 1-2d, calculating the neighborhood points k_i of the key point k:
wherein (x_k, y_k) represents the position of the key point, and the remaining symbols are as defined above.
Step 1-3, after the image alignment operation of step 1-2, the data samples are further divided into a training set, a verification set and a test set.
Step 2, model design. Referring to fig. 1, and in detail to fig. 2, we first propose the Super Night Network (SNN, super night scene network), whose main body is an Encoder-Decoder structure. It consists of an Encoder (left side) and a Decoder (right side). The Encoder is similar to a common classification network: there are multiple downsampling steps, and the feature maps change from large and narrow to small and wide. Each layer contains two 3x3 convolutions, each followed by an activation function (LeakyReLU) and a Switchable Normalization layer; finally a 2x2 max-pooling operation with stride 2 is added for downsampling. The number of feature channels doubles at each downsampling step. The entire Encoder repeats this step three times.
Each step in the Decoder involves upsampling the feature map, here using nearest-neighbor interpolation, followed by a 3x3 convolution that halves the number of feature channels; the result is concatenated with the corresponding (specially processed) feature map from the Encoder, followed by two 3x3 convolutions, again each followed by a LeakyReLU activation and a Switchable Normalization layer. At the last layer, only one 3x3 convolution layer is used, and the processed image is output through pixel_shuffle.
Here, to obtain more information from the RAW data, we use a skip-connection structure and place a Residual DenseBlock on the skip-connection. The Residual DenseBlock consists of 3 DenseBlocks, each with 5 convolution layers inside, each convolution followed by an activation function (LeakyReLU) and a Switchable Normalization layer, while each layer accepts the output feature maps of all preceding convolutions; this operation is a concatenation. In addition, to obtain more effective information, we add a ChannelAttention module after each DenseBlock. It consists of an average pooling layer, two 3x3 convolution layers, and nonlinear ReLU and Sigmoid transformations; the connection mode is shown in fig. 3.
Step 3, model training. Based on the above model and data set, we achieved rapid training of the model using distributed training, which took only 2.5 hours. In the process of training the super night scene network model, the training sample set is divided into a low-resolution training set and high-resolution image blocks;
the low-resolution training set is obtained as follows: first, the high-resolution images are downsampled by a factor of N to obtain different low-resolution images; the obtained low-resolution images are then augmented, and each low-resolution image is sampled with overlap to obtain a group of overlapping low-resolution image blocks, which are taken as the low-resolution training set;
the high-resolution image blocks are obtained as follows:
the high-resolution images corresponding to the N-times downsampling operation are sampled with overlap, and the resulting group of corresponding overlapping high-resolution image blocks is taken as the high-resolution label images, N being a positive integer;
the augmentation of the obtained low-resolution images is performed by rotation transformations of 90, 180 and 270 degrees, so as to obtain low-resolution images at different angles;
a training convolutional network is then constructed:
first, a low-resolution image LR is taken as input and shallow features are extracted through a convolution layer; deep features of the image are then learned through a plurality of stacked CACB modules; finally, the extracted shallow and deep features are fused, and upsampling is performed by sub-pixel convolution to obtain a high-resolution image;
the CACB module consists of four fusion convolution layers, and one quarter of the features of each fusion convolution layer is reserved for the final feature fusion; the structure of the fusion convolution layer in this module differs between a training stage and a deployment stage;
the loss function used in the training process is:
L_total = 0.5*L_1 + 0.05*L_SSIM + 0.1*L_VGG + L_adv
wherein L_1 is the mean absolute error, L_SSIM is the structural similarity loss, L_VGG is the perceptual loss, and L_adv represents the adversarial loss;
wherein F(·) is the feature map output by the 34th layer of a VGG19 network pre-trained on ImageNet, G(I)_{i,j,k} is the picture generated by the generator, C_{i,j,k} is the corresponding original picture with the bokeh effect, and D(·) is the output of the discriminator.
Step 4, outputting a result.
The beneficial effects are that: the invention relates to an image night scene processing method based on a convolutional neural network, and further relates to a computing module capable of running the method and a storage medium readable by the computing module. By building and training the super night scene network model, a super night scene picture with excellent appearance can be obtained simply by taking RAW data from the camera CMOS sensor; this avoids the image shake and ghosting caused by the long exposure of the traditional night scene function, and further avoids the influence of image shake and ghosting when images are synthesized using AI techniques.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a more thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that the invention may be practiced without one or more of these details. In other instances, well-known features have not been described in detail in order to avoid obscuring the invention.
The applicant believes that the current super night scene function works by shooting multiple photos at different ISO and exposure settings through long exposure and then synthesizing them; however, the exposure time is often several seconds or more, so the requirements on mobile phone hardware and software algorithms are very high. In addition, it is difficult for the super night scene mode to capture high-quality pictures during long exposure, because the human hand inevitably shakes slightly; if no algorithm is used for correction, the mobile phone will incorporate these shaken frames, resulting in problems in the picture. Some mobile phone manufacturers use AI techniques to remove the blurred photos, then automatically align the scenes through system identification, and finally synthesize them.
To solve the above problems, we propose using a convolutional neural network to realize the super night scene mode; with our algorithm, a super night scene picture with excellent look and feel can be obtained simply by taking RAW data from the camera CMOS sensor. The specific algorithm flow is as follows:
Step 1, data acquisition. Night scenes are shot with a number of low-end mobile phones and the RAW format data is taken out; at the same time, long-exposure night scene RGB pictures are shot with a DSLR camera, giving about 100,000 pairs in total. After the data set is collected, a SIFT key point matching algorithm and the RANSAC algorithm are used for the alignment operation, and the non-overlapping parts of the images are removed. After matching is completed, the data is divided into a training set, a verification set and a test set.
Step 1-1, shooting different scenes through a plurality of preset image acquisition devices with different models to obtain a plurality of RAW format data samples, taking the RAW format data samples acquired by the image acquisition devices with different models in the same scene as a group of parent samples, dividing the parent samples into different child samples according to the model of the image acquisition device, and marking each sample;
Step 1-2, after data sample acquisition is completed, an alignment operation is carried out on the images, and the non-overlapping parts of the images are removed; aligning the images in step 1-2 further comprises matching key points of the images and repeatedly iterating over random subsets on this basis;
wherein matching the key points of the images further proceeds as follows:
step 1-2a, searching all image positions in a preset scale space, and extracting key points, including corner points, edge points, bright points in dark areas and dark points in bright areas, through convolution operations; the scale space L(x, y, σ) is calculated as follows:
L(x, y, σ) = G(x, y, σ) · C(x, y)
wherein C(x, y) represents the midpoint coordinates of the key point, G(x, y, σ) represents a Gaussian kernel function, and σ is a scale space factor taking a constant value;
wherein the Gaussian kernel function is expressed as follows:
G(x, y, σ) = 1/(2πσ²) · exp(−(x² + y²)/(2σ²))
wherein each symbol is as defined above;
step 1-2b, collecting the gradient modulus values of the key points:
m(x, y) = √((L(x+1, y) − L(x−1, y))² + (L(x, y+1) − L(x, y−1))²)
step 1-2c, collecting the direction distribution of the key points:
θ(x, y) = arctan((L(x, y+1) − L(x, y−1)) / (L(x+1, y) − L(x−1, y)))
wherein each symbol is as defined above;
step 1-2d, calculating the neighborhood points k_i of the key point k:
wherein (x_k, y_k) represents the position of the key point, and the remaining symbols are as defined above.
Step 1-3, after the image alignment operation of step 1-2, the data samples are further divided into a training set, a verification set and a test set.
Step 2, model design. Referring to fig. 1, and in detail to fig. 2, we first propose the Super Night Network (SNN, super night scene network), shown in fig. 2, whose main body is an Encoder-Decoder structure. It consists of an Encoder (left side) and a Decoder (right side). The Encoder is similar to a common classification network: there are multiple downsampling steps, and the feature maps change from large and narrow to small and wide. Each layer contains two 3x3 convolutions, each followed by an activation function (LeakyReLU) and a Switchable Normalization layer; finally a 2x2 max-pooling operation with stride 2 is added for downsampling. The number of feature channels doubles at each downsampling step. The entire Encoder repeats this step three times.
Each step in the Decoder involves upsampling the feature map, here using nearest-neighbor interpolation, followed by a 3x3 convolution that halves the number of feature channels; the result is concatenated with the corresponding (specially processed) feature map from the Encoder, followed by two 3x3 convolutions, again each followed by a LeakyReLU activation and a Switchable Normalization layer. At the last layer, only one 3x3 convolution layer is used, and the processed image is output through pixel_shuffle.
Here, to obtain more information from the RAW data, we use a skip-connection structure and place a Residual DenseBlock on the skip-connection. The Residual DenseBlock consists of 3 DenseBlocks, each with 5 convolution layers inside, each convolution followed by an activation function (LeakyReLU) and a Switchable Normalization layer, while each layer accepts the output feature maps of all preceding convolutions; this operation is a concatenation. In addition, to obtain more effective information, we add a ChannelAttention module after each DenseBlock. It consists of an average pooling layer, two 3x3 convolution layers, and nonlinear ReLU and Sigmoid transformations; the connection mode is shown in fig. 3.
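A PyTorch sketch of the DenseBlock and ChannelAttention wiring described above follows; the channel counts, growth rate and reduction factor are assumptions, and GroupNorm again stands in for Switchable Normalization:

```python
import torch
import torch.nn as nn

class DenseBlockSketch(nn.Module):
    """5 convolutions; each layer takes the concatenation of all previous
    outputs (channel count and growth rate are assumptions)."""
    def __init__(self, ch=64, growth=32):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(5):
            out = ch if i == 4 else growth          # last conv maps back to ch
            self.layers.append(nn.Sequential(
                nn.Conv2d(ch + i * growth, out, 3, padding=1),
                nn.LeakyReLU(0.2), nn.GroupNorm(8, out)))

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))  # dense connectivity
        return feats[-1] + x                              # residual over the block

class ChannelAttentionSketch(nn.Module):
    """Average pooling, two 3x3 convolutions, ReLU then Sigmoid gate."""
    def __init__(self, ch=64, red=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // red, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch // red, ch, 3, padding=1), nn.Sigmoid())

    def forward(self, x):
        return x * self.gate(x)   # re-weight channels by their global statistics
```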
Step 3, model training. Based on the above model and data set, we achieved rapid training of the model using distributed training, which took only 2.5 hours. In the process of training the super night scene network model, the training sample set is divided into a low-resolution training set and high-resolution image blocks;
the low-resolution training set is obtained as follows: first, the high-resolution images are downsampled by a factor of N to obtain different low-resolution images; the obtained low-resolution images are then augmented, and each low-resolution image is sampled with overlap to obtain a group of overlapping low-resolution image blocks, which are taken as the low-resolution training set;
the high-resolution image blocks are obtained as follows:
the high-resolution images corresponding to the N-times downsampling operation are sampled with overlap, and the resulting group of corresponding overlapping high-resolution image blocks is taken as the high-resolution label images, N being a positive integer;
the augmentation of the obtained low-resolution images is performed by rotation transformations of 90, 180 and 270 degrees, so as to obtain low-resolution images at different angles;
A training convolutional network is then constructed:
first, a low-resolution image LR is taken as input and shallow features are extracted through a convolution layer; deep features of the image are then learned through a plurality of stacked CACB modules; finally, the extracted shallow and deep features are fused, and upsampling is performed by sub-pixel convolution to obtain a high-resolution image;
the CACB module consists of four fusion convolution layers, and one quarter of the features of each fusion convolution layer is reserved for the final feature fusion; the structure of the fusion convolution layer in this module differs between a training stage and a deployment stage;
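The CACB description is brief, so the sketch below is only a guess at its shape: four fusion convolution layers, each reserving a quarter of its feature channels for the final fusion; the training-stage/deployment-stage re-parameterization of the fusion convolution is not modeled:

```python
import torch
import torch.nn as nn

class CACBSketch(nn.Module):
    """Four fusion convolution layers; one quarter of each layer's features is
    reserved and the four quarters are fused at the end. All channel
    bookkeeping here is an assumption based on the one-line description."""
    def __init__(self, ch=64):
        super().__init__()
        assert ch % 4 == 0
        self.convs = nn.ModuleList(
            nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.LeakyReLU(0.2))
            for _ in range(4))
        self.fuse = nn.Conv2d(ch, ch, 1)            # final feature fusion

    def forward(self, x):
        quarter = x.size(1) // 4
        reserved = []
        for conv in self.convs:
            x = conv(x)
            reserved.append(x[:, :quarter])         # reserve a quarter of the features
        return self.fuse(torch.cat(reserved, dim=1))
```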
the loss function used in the training process is:
L_total = 0.5*L_1 + 0.05*L_SSIM + 0.1*L_VGG + L_adv
wherein L_1 is the mean absolute error, L_SSIM is the structural similarity loss, L_VGG is the perceptual loss, and L_adv represents the adversarial loss;
wherein F(·) is the feature map output by the 34th layer of a VGG19 network pre-trained on ImageNet, G(I)_{i,j,k} is the picture generated by the generator, C_{i,j,k} is the corresponding original picture with the bokeh effect, and D(·) is the output of the discriminator.
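A sketch of the perceptual term: features are taken from a frozen VGG19 pre-trained on ImageNet via torchvision; whether "the 34th layer" counts activations or convolutions is ambiguous here, so the slice index is an assumption:

```python
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg19

class VGGPerceptual(nn.Module):
    def __init__(self, layer=34):
        super().__init__()
        # vgg19().features is an nn.Sequential; slicing up to `layer`
        # approximates "the 34th layer output" (index convention assumed).
        self.features = vgg19(weights='IMAGENET1K_V1').features[:layer].eval()
        for p in self.features.parameters():
            p.requires_grad = False        # the VGG network stays frozen

    def forward(self, pred, target):
        return F.l1_loss(self.features(pred), self.features(target))
```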
Step 4, outputting a result.
Before outputting the result, the image is passed through a preset bokeh rendering model, which is constructed as follows:
wherein I_bokeh represents the finally obtained image, I_org represents the original image, ⊙ represents element-wise multiplication of matrices, B_i(·) is the i-th level blur function, and W_i represents the feature weight matrix values of the i-th layer data image; the i-th level blur function B_i(·) is obtained by iterating a shallow blur neural network i times, and is expressed as:
the loss function l adopts a combination of a reconstruction term and the structural similarity SSIM, and the model is optimized by back-propagating the error value; wherein l_1 is:
wherein I_bokeh represents the image with the bokeh effect generated by the model, and Î represents the original image actually having the bokeh effect; l_1 is computed between the generated image I_bokeh and the actual image Î;
the structural similarity is as follows:
SSIM(I_bokeh, Î) = [l(I_bokeh, Î)]^α · [c(I_bokeh, Î)]^β · [s(I_bokeh, Î)]^γ
wherein α, β and γ are preset constants, l(·,·) represents the brightness relationship between the generated image I_bokeh and the actual image Î, c(·,·) represents the contrast relationship between them, and s(·,·) represents the structural relationship between them.
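The brightness, contrast and structure relationships referenced above follow the standard SSIM construction; the sketch below computes the three terms globally per image for clarity (practical SSIM uses local windows), and the stabilizing constants C1, C2, C3 take their usual assumed values:

```python
import torch

def ssim_components(x, y, C1=0.01**2, C2=0.03**2):
    """Global (whole-image) luminance l, contrast c and structure s terms of
    SSIM between generated image x and actual image y; SSIM = l^a * c^b * s^g.
    Windowed computation and the constants are standard SSIM conventions."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    C3 = C2 / 2
    l = (2 * mx * my + C1) / (mx**2 + my**2 + C1)          # brightness relationship
    c = (2 * vx.sqrt() * vy.sqrt() + C2) / (vx + vy + C2)  # contrast relationship
    s = (cov + C3) / (vx.sqrt() * vy.sqrt() + C3)          # structural relationship
    return l, c, s
```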
In fig. 4, the left side is a night scene picture taken by a Redmi 8, and the right side is the night scene picture obtained by our network (both use the RAW data of the Redmi 8). It is obvious that our result is richer in detail than the Redmi 8 picture, with softer colors, better matching human visual perception.
In conclusion, the above algorithm flow can effectively meet the shooting requirements of low-end mobile phones in night scenes, at a lower cost. For consumers this means buying a low-end mobile phone at low cost while still obtaining the super night scene technology of high-end phones. In addition, the requirement of the super night scene algorithm on hardware is greatly reduced; if matched with an NPU chip such as the Airia, the cost is further reduced and the cost performance of the mobile phone is improved.
As described above, although the present invention has been shown and described with reference to certain preferred embodiments, it is not to be construed as limiting the invention itself. Various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.