CN112150363B - Convolutional neural network-based image night scene processing method, computing module for operating method and readable storage medium - Google Patents

Convolutional neural network-based image night scene processing method, computing module for operating method and readable storage medium

Info

Publication number
CN112150363B
CN112150363B (application CN202011049682.7A)
Authority
CN
China
Prior art keywords
image
resolution
representing
layer
night scene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011049682.7A
Other languages
Chinese (zh)
Other versions
CN112150363A (en)
Inventor
冷聪
李成华
朱宇
程健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Nanjing Artificial Intelligence Innovation Research Institute
Zhongke Fangcun Zhiwei Nanjing Technology Co ltd
Original Assignee
Zhongke Nanjing Artificial Intelligence Innovation Research Institute
Zhongke Fangcun Zhiwei Nanjing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongke Nanjing Artificial Intelligence Innovation Research Institute, Zhongke Fangcun Zhiwei Nanjing Technology Co ltd filed Critical Zhongke Nanjing Artificial Intelligence Innovation Research Institute
Priority to CN202011049682.7A priority Critical patent/CN112150363B/en
Publication of CN112150363A publication Critical patent/CN112150363A/en
Application granted granted Critical
Publication of CN112150363B publication Critical patent/CN112150363B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06T3/4076Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution using the original low-resolution images to iteratively correct the high-resolution images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a convolutional neural network-based image night scene processing method, together with a computing module for running the method and a readable storage medium. The method comprises the following steps: step 1, collecting multiple groups of RAW-format data samples; step 2, designing a super night scene network model; step 3, training the super night scene network model of step 2; and step 4, outputting the result. According to the invention, a super night scene network model is built and trained: a basic data set is constructed according to the requirements, the SNN network is trained on that data set, and a performance test result is obtained. If the model's predictions are not satisfactory, the data set is expanded or rebuilt according to the scene requirements. A super night scene picture with excellent appearance can be obtained simply by taking RAW data from the camera CMOS sensor, which avoids the image shake and ghosting caused by the long exposure of the conventional night scene function, and further avoids the influence of image shake and ghosting when images are synthesized with AI techniques.

Description

Convolutional neural network-based image night scene processing method, computing module for operating method and readable storage medium
Technical Field
The invention relates to a convolutional neural network-based image night scene processing method, to a computing module for running the method, and to a readable storage medium. It falls under G06T (image data processing or generation, in general), and in particular under G06T5/00 (enhancement or restoration of images).
Background
Because a mobile phone camera sensor has a small area, its light-gathering capability is poor: under insufficient light it cannot capture enough natural light, which results in noisy pictures with insufficient brightness and weak resolving power. When the handheld long-exposure super night scene function is enabled, however, the brightness of the photo is greatly improved, details in both bright and dark regions stand out, and highlight areas do not become overexposed even if the brightness of the scene fluctuates.
At present, a super night scene function appears on more and more mobile phones. Its principle is to shoot, during a long exposure, a number of photos at different ISO and exposure settings and then to synthesize them.
However, the super night scene function is demanding: the exposure time is often several seconds or more, so the requirements on mobile phone hardware and on the software algorithms are very high. In addition, it is difficult for the super night scene mode to capture high-quality pictures during a long exposure, because a human hand inevitably introduces slight, uncontrollable shake; if no algorithm is used to clean them up, the phone will incorporate these shaken frames, causing defects in the final picture.
Disclosure of Invention
Purpose of the invention: one object is to provide a convolutional neural network-based image night scene processing method that solves the above problems in the prior art. A further object is to propose a computing module operable to carry out the method, and a storage medium readable by that computing module.
The technical scheme is as follows: an image night scene processing method based on a convolutional neural network comprises the following steps:
step 1, collecting a plurality of groups of RAW format data samples;
step 2, designing a super night scene network model;
step 3, training the super night scene network model in the step 2;
and step 4, outputting the result.
In a further embodiment, step 1 further comprises:
Step 1-1, shooting different scenes with several preset image acquisition devices of different models to obtain a number of RAW-format data samples; the RAW-format data samples acquired by the devices of different models in the same scene are taken as a group of parent samples, each parent sample is divided into child samples according to the device model, and every sample is labeled;
Step 1-2, after data sample acquisition is completed, aligning the images and removing their non-overlapping parts;
Step 1-3, after the image alignment of step 1-2, dividing the data samples into a training set, a verification set and a test set.
In a further embodiment, aligning the images in step 1-2 further comprises matching keypoints of the images and then repeatedly iterating over random subsets of the matched keypoints;
wherein matching the keypoints of the images is further as follows:
Step 1-2a, searching all image positions over a preset scale space and extracting, by convolution, keypoints that include corner points, edge points, bright points in dark regions and dark points in bright regions; the scale space L(x, y, σ) is computed as:
L(x, y, σ) = G(x, y, σ) · C(x, y)
where C(x, y) denotes the center-point coordinates of the keypoint, G(x, y, σ) denotes a Gaussian kernel function, and σ is the scale-space factor, taken as a constant;
the Gaussian kernel function is expressed as:
G(x, y, σ) = (1 / (2πσ²)) · exp(-(x² + y²) / (2σ²))
where each symbol is as defined above;
Step 1-2b, collecting the gradient modulus of each keypoint:
m(x, y) = √[(L(x+1, y) - L(x-1, y))² + (L(x, y+1) - L(x, y-1))²]
Step 1-2c, collecting the orientation distribution of each keypoint:
θ(x, y) = arctan[(L(x, y+1) - L(x, y-1)) / (L(x+1, y) - L(x-1, y))]
where each symbol is as defined above;
Step 1-2d, computing the neighborhood points k_i of keypoint k:
Figure GDA0002752645380000024
where (x_k, y_k) denotes the keypoint direction and the remaining symbols are as defined above.
In a further embodiment, step 2 further comprises:
Step 2-1, establishing the SNN super night scene network model, which comprises at least one Encoder network and at least one Decoder network, each consisting of multiple layers. The Encoder network performs repeated downsampling: each layer contains at least two 3x3 convolutions, each convolution is followed by an activation function and a Switchable Normalization layer, and a 2x2 max-pooling operation with stride 2 is added at the end of the layer for downsampling; this Encoder step is repeated three times;
Each step in the Decoder network upsamples the feature map, applies a 3x3 convolution that halves the number of feature channels, concatenates the result with the corresponding feature map from the Encoder network, and then passes the concatenated feature map through two 3x3 convolutions, each followed by an activation function and a Switchable Normalization layer; the last layer uses a single 3x3 convolution layer and finally outputs the processed image through pixel_shuffle;
Step 2-2, a Residual Dense Block is proposed and placed on the skip-connection. The Residual Dense Block consists of 3 DenseBlocks; inside each DenseBlock there are 5 convolution layers, each followed by an activation function and a Switchable Normalization layer, and each layer simultaneously receives the output feature maps of all preceding convolution layers.
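By way of illustration only, the following is a minimal PyTorch sketch of one Encoder stage and one Decoder stage of the structure described above. It is not the patented implementation: the channel widths, the LeakyReLU slope and the 4-channel packed-RAW input are assumptions, and InstanceNorm2d is used as a stand-in because Switchable Normalization is not part of core PyTorch.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Two 3x3 convolutions, each followed by LeakyReLU and a normalization layer.
    # InstanceNorm2d stands in for Switchable Normalization here (assumption).
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.LeakyReLU(0.2), nn.InstanceNorm2d(out_ch),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.LeakyReLU(0.2), nn.InstanceNorm2d(out_ch),
    )

class EncoderStage(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = conv_block(in_ch, out_ch)
        self.down = nn.MaxPool2d(2)              # 2x2 max-pooling, stride 2

    def forward(self, x):
        skip = self.block(x)                     # kept for the skip-connection
        return self.down(skip), skip

class DecoderStage(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.halve = nn.Conv2d(in_ch, out_ch, 3, padding=1)   # 3x3 conv halving the channels
        self.block = conv_block(out_ch * 2, out_ch)           # applied after concatenation

    def forward(self, x, skip):
        x = self.halve(self.up(x))
        x = torch.cat([x, skip], dim=1)          # concatenate with the Encoder feature map
        return self.block(x)

# Toy forward pass with assumed widths: 4-channel packed RAW in, RGB out via pixel_shuffle.
if __name__ == "__main__":
    enc, mid, dec = EncoderStage(4, 32), conv_block(32, 64), DecoderStage(64, 32)
    tail = nn.Sequential(nn.Conv2d(32, 12, 3, padding=1), nn.PixelShuffle(2))
    x = torch.randn(1, 4, 64, 64)
    d, s = enc(x)
    y = dec(mid(d), s)
    print(tail(y).shape)                         # torch.Size([1, 3, 128, 128])
```

In the full network this stage pattern is repeated three times on the Encoder side, with the channel count doubling at each downsampling step.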
In a further embodiment, in step 3 the training sample set is divided into a low-resolution training set and high-resolution image blocks while the super night scene network model is being trained;
the low-resolution training set is obtained as follows: first, the high-resolution images are downsampled by a factor of N to obtain different low-resolution images; the obtained low-resolution images are then augmented, and each low-resolution image is sampled with overlap to obtain a group of overlapping low-resolution image blocks, which are taken as the low-resolution training set;
the high-resolution image blocks are obtained as follows:
the high-resolution images corresponding to the N-fold downsampling operation are sampled with the same overlap, and the resulting group of corresponding overlapping high-resolution image blocks is taken as the high-resolution label images; N is a positive integer;
the augmentation applied to the obtained low-resolution images consists of rotations by 90, 180 and 270 degrees, so as to obtain low-resolution images at different angles;
a training convolutional network is then constructed:
first, the low-resolution image LR is taken as input and shallow features are extracted by a convolution layer; deep features of the image are then learned by several stacked CACB modules; finally the extracted shallow and deep features are fused and upsampled by sub-pixel convolution to obtain the high-resolution image;
the CACB module consists of four fusion convolution layers, and one quarter of the features of each fusion convolution layer is retained for the final feature fusion; the structure of the fusion convolution layer used in this module differs between a training stage and a deployment stage;
the loss function used in the training process is:
L_total = 0.5·L_1 + 0.05·L_SSIM + 0.1·L_VGG + L_adv
where L_1 is the mean absolute error, L_SSIM the structural similarity loss, L_VGG the perceptual loss, and L_adv the adversarial loss;
Figure GDA0002752645380000041
Figure GDA0002752645380000042
where F(·) is the feature map output by the 34th layer of a VGG19 network pre-trained on ImageNet, G(I)_{i,j,k} is the picture generated by the generator, C_{i,j,k} is the corresponding original picture with the bokeh effect, and D(·) is the output of the discriminator.
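For illustration, a minimal PyTorch sketch of this weighted loss follows. The exact forms of L_SSIM, L_VGG and L_adv are only shown as images in the original document, so the SSIM, VGG-feature and adversarial terms below are standard stand-ins, and slicing the VGG19 features up to index 34 is an interpretation of "the 34th layer output".

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

# Frozen VGG19 feature extractor (up to layer index 34; interpretation, not from the patent).
vgg_feats = vgg19(weights="IMAGENET1K_V1").features[:35].eval()
for p in vgg_feats.parameters():
    p.requires_grad_(False)

def ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    # Simplified global SSIM; practical implementations use local Gaussian windows.
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def total_loss(generated, target, disc_out_fake):
    """L_total = 0.5*L1 + 0.05*L_SSIM + 0.1*L_VGG + L_adv (weights as stated in the text)."""
    l1 = F.l1_loss(generated, target)                            # mean absolute error
    l_ssim = 1.0 - ssim(generated, target)                       # structural similarity term (stand-in)
    l_vgg = F.l1_loss(vgg_feats(generated), vgg_feats(target))   # perceptual term (stand-in)
    l_adv = -torch.log(disc_out_fake + 1e-8).mean()              # adversarial term; assumes D outputs in (0,1)
    return 0.5 * l1 + 0.05 * l_ssim + 0.1 * l_vgg + l_adv
```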
In a further embodiment, step 4 further comprises acquiring an image with the image acquisition sensor and outputting a night scene image after the image has been refined by the super night scene network model trained in step 3; bokeh rendering is carried out before the night scene image is output, and the model that produces the bokeh-rendered picture can be constructed as follows:
Figure GDA0002752645380000043
where I_bokeh denotes the finally obtained image, I_org denotes the original image, ⊙ denotes element-wise multiplication of matrices, B_i(·) is the i-th level blur function, and W_i denotes the feature weight matrix of the i-th layer of the data image. The i-th level blur function B_i(·) is a shallow blur neural network iterated i times, expressed as:
Figure GDA0002752645380000047
The loss function l combines a reconstruction term with the structural similarity SSIM, and the model is optimized by back-propagating the error value; the reconstruction term l_1 is:
Figure GDA0002752645380000048
where I_bokeh denotes the bokeh image generated by the model and Î_bokeh denotes the ground-truth image that actually has the bokeh effect.
The structural similarity is:
SSIM(I_bokeh, Î_bokeh) = [l(I_bokeh, Î_bokeh)]^α · [c(I_bokeh, Î_bokeh)]^β · [s(I_bokeh, Î_bokeh)]^γ
where α, β and γ are preset constants, l(·,·) denotes the luminance relation between the generated image I_bokeh and the ground-truth image Î_bokeh, c(·,·) denotes the contrast relation between them, and s(·,·) denotes the structural relation between them.
A convolutional neural network-based image night scene processing system comprises a first module for collecting several groups of RAW-format data samples; a second module for establishing the super night scene network model; a third module for training the super night scene network model; and a fourth module for performing bokeh rendering on the night scene image before output.
The first module further shoots different scenes with several preset image acquisition devices of different models to obtain a number of RAW-format data samples; the RAW-format data samples acquired by devices of different models in the same scene are taken as a group of parent samples, each parent sample is divided into child samples according to the device model, and every sample is labeled.
After data sample acquisition is completed, the images are aligned and their non-overlapping parts are removed; the image alignment comprises matching keypoints of the images and then repeatedly iterating over random subsets of the matched keypoints;
wherein matching the keypoints of the images is further as follows:
all image positions are searched over a preset scale space and keypoints including corner points, edge points, bright points in dark regions and dark points in bright regions are extracted by convolution; the scale space L(x, y, σ) is computed as:
L(x, y, σ) = G(x, y, σ) · C(x, y)
where C(x, y) denotes the center-point coordinates of the keypoint, G(x, y, σ) denotes a Gaussian kernel function, and σ is the scale-space factor, taken as a constant;
the Gaussian kernel function is expressed as:
G(x, y, σ) = (1 / (2πσ²)) · exp(-(x² + y²) / (2σ²))
where each symbol is as defined above;
the gradient modulus of each keypoint is collected:
m(x, y) = √[(L(x+1, y) - L(x-1, y))² + (L(x, y+1) - L(x, y-1))²]
the orientation distribution of each keypoint is collected:
θ(x, y) = arctan[(L(x, y+1) - L(x, y-1)) / (L(x+1, y) - L(x-1, y))]
where each symbol is as defined above;
the neighborhood points k_i of keypoint k are computed:
Figure GDA0002752645380000054
where (x_k, y_k) denotes the keypoint direction and the remaining symbols are as defined above;
dividing the data sample into a training set, a verification set and a test set;
the second module is further configured to establish an SNN super night scene network model, including at least one Encoder network and at least one Decoder network, each network including multiple layers; the method comprises the steps that the Encoder network has multiple downsampling, at least two 3x3 convolutions are arranged in each layer, an activation function and a Switchablenormalization layer are connected after each convolution, and finally a 2x2 max_mapping operation is added, namely, the step length is 2 for downsampling; the whole Encoder network is repeated three times;
Each step in the Decoder network includes upsampling the feature map, concatenating the 3x3 convolutions halving the number of feature channels with the corresponding feature map from the Encoder network, and then convolving the concatenated feature map by two 3x3 convolutions, each convolution followed by an activation function and a switchblenormalization layer; using a 3*3 convolution layer in the last layer, and finally outputting the processed image through pixel_shuffle;
residual Dense block put it on skip-connection, the residual Denseblock is composed of 3 Denseblocks, there are 5 convolutions in each Denseblock, connect an activation function and a Switchablenormalization layer after each convolution, each layer accepts the output characteristic diagram from all convolution layers before at the same time;
The third module is further configured to divide the training sample set into a low-resolution training set and high-resolution image blocks;
the low-resolution training set is obtained as follows: first, the high-resolution images are downsampled by a factor of N to obtain different low-resolution images; the obtained low-resolution images are then augmented, and each low-resolution image is sampled with overlap to obtain a group of overlapping low-resolution image blocks, which are taken as the low-resolution training set;
the high-resolution image blocks are obtained as follows:
the high-resolution images corresponding to the N-fold downsampling operation are sampled with the same overlap, and the resulting group of corresponding overlapping high-resolution image blocks is taken as the high-resolution label images; N is a positive integer;
the augmentation applied to the obtained low-resolution images consists of rotations by 90, 180 and 270 degrees, so as to obtain low-resolution images at different angles;
a training convolutional network is then constructed:
first, the low-resolution image LR is taken as input and shallow features are extracted by a convolution layer; deep features of the image are then learned by several stacked CACB modules; finally the extracted shallow and deep features are fused and upsampled by sub-pixel convolution to obtain the high-resolution image;
the CACB module consists of four fusion convolution layers, and one quarter of the features of each fusion convolution layer is retained for the final feature fusion; the structure of the fusion convolution layer used in this module differs between a training stage and a deployment stage;
the loss function used in the training process is:
L_total = 0.5·L_1 + 0.05·L_SSIM + 0.1·L_VGG + L_adv
where L_1 is the mean absolute error, L_SSIM the structural similarity loss, L_VGG the perceptual loss, and L_adv the adversarial loss;
Figure GDA0002752645380000071
Figure GDA0002752645380000072
where F(·) is the feature map output by the 34th layer of a VGG19 network pre-trained on ImageNet, G(I)_{i,j,k} is the picture generated by the generator, C_{i,j,k} is the corresponding original picture with the bokeh effect, and D(·) is the output of the discriminator.
The fourth module is further configured to construct the model that produces the bokeh-rendered picture:
Figure GDA0002752645380000073
where I_bokeh denotes the finally obtained image, I_org denotes the original image, ⊙ denotes element-wise multiplication of matrices, B_i(·) is the i-th level blur function, and W_i denotes the feature weight matrix of the i-th layer of the data image. The i-th level blur function B_i(·) is a shallow blur neural network iterated i times, expressed as:
Figure GDA0002752645380000077
The loss function l combines a reconstruction term with the structural similarity SSIM, and the model is optimized by back-propagating the error value; the reconstruction term l_1 is:
Figure GDA0002752645380000078
where I_bokeh denotes the bokeh image generated by the model and Î_bokeh denotes the ground-truth image that actually has the bokeh effect.
The structural similarity between the generated image I_bokeh and the ground-truth image Î_bokeh is:
SSIM(I_bokeh, Î_bokeh) = [l(I_bokeh, Î_bokeh)]^α · [c(I_bokeh, Î_bokeh)]^β · [s(I_bokeh, Î_bokeh)]^γ
where α, β and γ are preset constants, l(·,·) denotes the luminance relation between the generated image I_bokeh and the ground-truth image Î_bokeh, c(·,·) denotes the contrast relation between them, and s(·,·) denotes the structural relation between them.
A computing module comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the computer program being stored on a computer-readable storage medium; the computing module is configured to have the processor execute the computer program read from that storage medium, thereby performing the following steps:
and step 1, data acquisition. And shooting night scenes by using a plurality of low-end mobile phones, taking out RAW format data, and simultaneously shooting long-exposure night scene RGB pictures in a single reverse mode, wherein the number of the night scenes is about 10 ten thousand pairs. After the data set is collected, the SIFT key point matching algorithm and the RANSAC algorithm are used for alignment operation, and the non-overlapping part of the SIFT key point matching algorithm and the RANSAC algorithm is removed. After matching is completed, the data is divided into a training set, a verification set and a test set.
Step 1-1, shooting different scenes with several preset image acquisition devices of different models to obtain a number of RAW-format data samples; the RAW-format data samples acquired by the devices of different models in the same scene are taken as a group of parent samples, each parent sample is divided into child samples according to the device model, and every sample is labeled;
Step 1-2, after data sample acquisition is completed, aligning the images and removing their non-overlapping parts; aligning the images further comprises matching keypoints of the images and then repeatedly iterating over random subsets of the matched keypoints;
wherein matching the keypoints of the images is further as follows:
Step 1-2a, searching all image positions over a preset scale space and extracting, by convolution, keypoints that include corner points, edge points, bright points in dark regions and dark points in bright regions; the scale space L(x, y, σ) is computed as:
L(x, y, σ) = G(x, y, σ) · C(x, y)
where C(x, y) denotes the center-point coordinates of the keypoint, G(x, y, σ) denotes a Gaussian kernel function, and σ is the scale-space factor, taken as a constant;
the Gaussian kernel function is expressed as:
G(x, y, σ) = (1 / (2πσ²)) · exp(-(x² + y²) / (2σ²))
where each symbol is as defined above;
Step 1-2b, collecting the gradient modulus of each keypoint:
m(x, y) = √[(L(x+1, y) - L(x-1, y))² + (L(x, y+1) - L(x, y-1))²]
Step 1-2c, collecting the orientation distribution of each keypoint:
θ(x, y) = arctan[(L(x, y+1) - L(x, y-1)) / (L(x+1, y) - L(x-1, y))]
where each symbol is as defined above;
Step 1-2d, computing the neighborhood points k_i of keypoint k:
Figure GDA0002752645380000091
where (x_k, y_k) denotes the keypoint direction and the remaining symbols are as defined above.
Step 1-3, after the image alignment of step 1-2, the data samples are further divided into a training set, a verification set and a test set.
Step 2, model design. Referring to fig. 1 and, in detail, to fig. 2, we first propose the Super Night Network (SNN, super night scene network), whose main body is an Encoder-Decoder structure. It consists of an Encoder (left side) and a Decoder (right side). The Encoder is similar to a common classification network: there are multiple downsampling steps, and the feature maps become spatially smaller and channel-wise wider. In each layer there are two 3x3 convolutions, each followed by an activation function (LeakyReLU) and a Switchable Normalization layer, and finally a 2x2 max-pooling operation with stride 2 is added for downsampling. The number of feature channels doubles at each downsampling step. The entire Encoder repeats this step three times.
Each step in the Decoder upsamples the feature map (nearest-neighbor interpolation is used here), applies a 3x3 convolution that halves the number of feature channels, concatenates the result with the corresponding (specially processed) feature map from the Encoder, and then applies two 3x3 convolutions, each again followed by a LeakyReLU activation and a Switchable Normalization layer. In the last layer only one 3x3 convolution layer is used, and the processed image is output through pixel_shuffle.
Here, to obtain more information from the RAW data, we use a skip-connection structure and place a Residual Dense Block on the skip-connection. The Residual Dense Block consists of 3 DenseBlocks; each DenseBlock contains 5 convolution layers, each followed by an activation function (LeakyReLU) and a Switchable Normalization layer, and each layer receives the output feature maps of all preceding convolution layers, i.e. a concatenation operation. In addition, to obtain more useful information, we add a Channel Attention module after each DenseBlock. It consists of an average-pooling layer, two 3x3 convolution layers, and nonlinear ReLU and Sigmoid transformations; the connection scheme is shown in fig. 3.
Step 3, model training. Based on the model and the data set, we achieved rapid training of the model using distributed training, which took only 2.5 hours. In step 3, while the super night scene network model is being trained, the training sample set is divided into a low-resolution training set and high-resolution image blocks;
the low-resolution training set is obtained as follows: first, the high-resolution images are downsampled by a factor of N to obtain different low-resolution images; the obtained low-resolution images are then augmented, and each low-resolution image is sampled with overlap to obtain a group of overlapping low-resolution image blocks, which are taken as the low-resolution training set;
the high-resolution image blocks are obtained as follows:
the high-resolution images corresponding to the N-fold downsampling operation are sampled with the same overlap, and the resulting group of corresponding overlapping high-resolution image blocks is taken as the high-resolution label images; N is a positive integer;
the augmentation applied to the obtained low-resolution images consists of rotations by 90, 180 and 270 degrees, so as to obtain low-resolution images at different angles;
a training convolutional network is then constructed:
first, the low-resolution image LR is taken as input and shallow features are extracted by a convolution layer; deep features of the image are then learned by several stacked CACB modules; finally the extracted shallow and deep features are fused and upsampled by sub-pixel convolution to obtain the high-resolution image;
the CACB module consists of four fusion convolution layers, and one quarter of the features of each fusion convolution layer is retained for the final feature fusion; the structure of the fusion convolution layer used in this module differs between a training stage and a deployment stage;
the loss function used in the training process is:
L_total = 0.5·L_1 + 0.05·L_SSIM + 0.1·L_VGG + L_adv
where L_1 is the mean absolute error, L_SSIM the structural similarity loss, L_VGG the perceptual loss, and L_adv the adversarial loss;
Figure GDA0002752645380000101
Figure GDA0002752645380000102
where F(·) is the feature map output by the 34th layer of a VGG19 network pre-trained on ImageNet, G(I)_{i,j,k} is the picture generated by the generator, C_{i,j,k} is the corresponding original picture with the bokeh effect, and D(·) is the output of the discriminator.
Step 4, outputting the result; before the result is output, the image is passed through a preset bokeh rendering model, constructed as follows:
Figure GDA0002752645380000103
where I_bokeh denotes the finally obtained image, I_org denotes the original image, ⊙ denotes element-wise multiplication of matrices, B_i(·) is the i-th level blur function, and W_i denotes the feature weight matrix of the i-th layer of the data image. The i-th level blur function B_i(·) is a shallow blur neural network iterated i times, expressed as:
Figure GDA0002752645380000114
The loss function l combines a reconstruction term with the structural similarity SSIM, and the model is optimized by back-propagating the error value; the reconstruction term l_1 is:
Figure GDA0002752645380000115
where I_bokeh denotes the bokeh image generated by the model and Î_bokeh denotes the ground-truth image that actually has the bokeh effect.
The structural similarity between the generated image I_bokeh and the ground-truth image Î_bokeh is:
SSIM(I_bokeh, Î_bokeh) = [l(I_bokeh, Î_bokeh)]^α · [c(I_bokeh, Î_bokeh)]^β · [s(I_bokeh, Î_bokeh)]^γ
where α, β and γ are preset constants, l(·,·) denotes the luminance relation between the generated image I_bokeh and the ground-truth image Î_bokeh, c(·,·) denotes the contrast relation between them, and s(·,·) denotes the structural relation between them.
A computing-module-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the following process:
Step 1, data acquisition. Night scenes are shot with several low-end mobile phones and the RAW-format data is extracted; at the same time, long-exposure night scene RGB pictures are shot with a DSLR camera, giving about 100,000 image pairs. After the data set is collected, the SIFT keypoint matching algorithm and the RANSAC algorithm are used to align the pairs, and their non-overlapping parts are removed. After matching is completed, the data is divided into a training set, a verification set and a test set.
Step 1-1, shooting different scenes with several preset image acquisition devices of different models to obtain a number of RAW-format data samples; the RAW-format data samples acquired by the devices of different models in the same scene are taken as a group of parent samples, each parent sample is divided into child samples according to the device model, and every sample is labeled;
Step 1-2, after data sample acquisition is completed, aligning the images and removing their non-overlapping parts; aligning the images further comprises matching keypoints of the images and then repeatedly iterating over random subsets of the matched keypoints;
wherein matching the keypoints of the images is further as follows:
Step 1-2a, searching all image positions over a preset scale space and extracting, by convolution, keypoints that include corner points, edge points, bright points in dark regions and dark points in bright regions; the scale space L(x, y, σ) is computed as:
L(x, y, σ) = G(x, y, σ) · C(x, y)
where C(x, y) denotes the center-point coordinates of the keypoint, G(x, y, σ) denotes a Gaussian kernel function, and σ is the scale-space factor, taken as a constant;
the Gaussian kernel function is expressed as:
G(x, y, σ) = (1 / (2πσ²)) · exp(-(x² + y²) / (2σ²))
where each symbol is as defined above;
Step 1-2b, collecting the gradient modulus of each keypoint:
m(x, y) = √[(L(x+1, y) - L(x-1, y))² + (L(x, y+1) - L(x, y-1))²]
Step 1-2c, collecting the orientation distribution of each keypoint:
θ(x, y) = arctan[(L(x, y+1) - L(x, y-1)) / (L(x+1, y) - L(x-1, y))]
where each symbol is as defined above;
Step 1-2d, computing the neighborhood points k_i of keypoint k:
Figure GDA0002752645380000124
where (x_k, y_k) denotes the position of the keypoint and the remaining symbols are as defined above.
Step 1-3, after the image alignment of step 1-2, the data samples are further divided into a training set, a verification set and a test set.
Step 2, model design. Referring to fig. 1 and, in detail, to fig. 2, we first propose the Super Night Network (SNN, super night scene network), whose main body is an Encoder-Decoder structure. It consists of an Encoder (left side) and a Decoder (right side). The Encoder is similar to a common classification network: there are multiple downsampling steps, and the feature maps become spatially smaller and channel-wise wider. In each layer there are two 3x3 convolutions, each followed by an activation function (LeakyReLU) and a Switchable Normalization layer, and finally a 2x2 max-pooling operation with stride 2 is added for downsampling. The number of feature channels doubles at each downsampling step. The entire Encoder repeats this step three times.
Each step in the Decoder upsamples the feature map (nearest-neighbor interpolation is used here), applies a 3x3 convolution that halves the number of feature channels, concatenates the result with the corresponding (specially processed) feature map from the Encoder, and then applies two 3x3 convolutions, each again followed by a LeakyReLU activation and a Switchable Normalization layer. In the last layer only one 3x3 convolution layer is used, and the processed image is output through pixel_shuffle.
Here, to obtain more information from the RAW data, we use a skip-connection structure and place a Residual Dense Block on the skip-connection. The Residual Dense Block consists of 3 DenseBlocks; each DenseBlock contains 5 convolution layers, each followed by an activation function (LeakyReLU) and a Switchable Normalization layer, and each layer receives the output feature maps of all preceding convolution layers, i.e. a concatenation operation. In addition, to obtain more useful information, we add a Channel Attention module after each DenseBlock. It consists of an average-pooling layer, two 3x3 convolution layers, and nonlinear ReLU and Sigmoid transformations; the connection scheme is shown in fig. 3.
Step 3, model training. Based on the model and the data set, we achieved rapid training of the model using distributed training, which took only 2.5 hours. In step 3, while the super night scene network model is being trained, the training sample set is divided into a low-resolution training set and high-resolution image blocks;
the low-resolution training set is obtained as follows: first, the high-resolution images are downsampled by a factor of N to obtain different low-resolution images; the obtained low-resolution images are then augmented, and each low-resolution image is sampled with overlap to obtain a group of overlapping low-resolution image blocks, which are taken as the low-resolution training set;
the high-resolution image blocks are obtained as follows:
the high-resolution images corresponding to the N-fold downsampling operation are sampled with the same overlap, and the resulting group of corresponding overlapping high-resolution image blocks is taken as the high-resolution label images; N is a positive integer;
the augmentation applied to the obtained low-resolution images consists of rotations by 90, 180 and 270 degrees, so as to obtain low-resolution images at different angles;
a training convolutional network is then constructed:
first, the low-resolution image LR is taken as input and shallow features are extracted by a convolution layer; deep features of the image are then learned by several stacked CACB modules; finally the extracted shallow and deep features are fused and upsampled by sub-pixel convolution to obtain the high-resolution image;
the CACB module consists of four fusion convolution layers, and one quarter of the features of each fusion convolution layer is retained for the final feature fusion; the structure of the fusion convolution layer used in this module differs between a training stage and a deployment stage;
the loss function used in the training process is:
L_total = 0.5·L_1 + 0.05·L_SSIM + 0.1·L_VGG + L_adv
where L_1 is the mean absolute error, L_SSIM the structural similarity loss, L_VGG the perceptual loss, and L_adv the adversarial loss;
Figure GDA0002752645380000141
Figure GDA0002752645380000142
where F(·) is the feature map output by the 34th layer of a VGG19 network pre-trained on ImageNet, G(I)_{i,j,k} is the picture generated by the generator, C_{i,j,k} is the corresponding original picture with the bokeh effect, and D(·) is the output of the discriminator.
Step 4, outputting the result.
Beneficial effects: the invention provides a convolutional neural network-based image night scene processing method, together with a computing module capable of running the method and a storage medium readable by that computing module. By building and training the super night scene network model, a super night scene picture with excellent appearance can be obtained simply by taking RAW data from the camera CMOS sensor; the image shake and ghosting caused by the long exposure of the conventional night scene function are avoided, as is the influence of image shake and ghosting when images are synthesized with AI techniques.
Drawings
FIG. 1 is a flowchart of an algorithm of the present invention.
Fig. 2 is a schematic diagram of an SNN super night scene network model in the present invention.
Fig. 3 is a schematic structural diagram of a channel attention mechanism module in the SNN super night scene network model.
Fig. 4 is a comparison of a night scene image produced by a general algorithm and a night scene image produced by the algorithm of the invention.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a more thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that the invention may be practiced without one or more of these details. In other instances, well-known features have not been described in detail in order to avoid obscuring the invention.
The applicant believes that the current super night scene function works by shooting, during a long exposure, a number of photos at different ISO and exposure settings and then synthesizing them. The function is demanding, however: the exposure time is often several seconds or more, so the requirements on mobile phone hardware and on the software algorithms are very high. In addition, it is difficult for the super night scene mode to capture high-quality pictures during a long exposure, because a human hand inevitably introduces slight, uncontrollable shake; if no algorithm is used to clean them up, the phone will incorporate these shaken frames, causing defects in the final picture. Some mobile phone manufacturers use AI techniques to remove the blurred photos, then automatically align the scenes through system recognition, and finally synthesize them.
To solve the above problems, we propose to realize the super night scene mode with a convolutional neural network; with our algorithm, a super night scene picture with an excellent look and feel can be obtained simply by taking RAW data from the camera CMOS sensor. The specific algorithm flow is as follows:
and step 1, data acquisition. And shooting night scenes by using a plurality of low-end mobile phones, taking out RAW format data, and simultaneously shooting long-exposure night scene RGB pictures in a single reverse mode, wherein the number of the night scenes is about 10 ten thousand pairs. After the data set is collected, the SIFT key point matching algorithm and the RANSAC algorithm are used for alignment operation, and the non-overlapping part of the SIFT key point matching algorithm and the RANSAC algorithm is removed. After matching is completed, the data is divided into a training set, a verification set and a test set.
Step 1-1, shooting different scenes with several preset image acquisition devices of different models to obtain a number of RAW-format data samples; the RAW-format data samples acquired by the devices of different models in the same scene are taken as a group of parent samples, each parent sample is divided into child samples according to the device model, and every sample is labeled;
Step 1-2, after data sample acquisition is completed, aligning the images and removing their non-overlapping parts; aligning the images further comprises matching keypoints of the images and then repeatedly iterating over random subsets of the matched keypoints;
wherein matching the keypoints of the images is further as follows:
Step 1-2a, searching all image positions over a preset scale space and extracting, by convolution, keypoints that include corner points, edge points, bright points in dark regions and dark points in bright regions; the scale space L(x, y, σ) is computed as:
L(x, y, σ) = G(x, y, σ) · C(x, y)
where C(x, y) denotes the center-point coordinates of the keypoint, G(x, y, σ) denotes a Gaussian kernel function, and σ is the scale-space factor, taken as a constant;
the Gaussian kernel function is expressed as:
G(x, y, σ) = (1 / (2πσ²)) · exp(-(x² + y²) / (2σ²))
where each symbol is as defined above;
Step 1-2b, collecting the gradient modulus of each keypoint:
m(x, y) = √[(L(x+1, y) - L(x-1, y))² + (L(x, y+1) - L(x, y-1))²]
Step 1-2c, collecting the orientation distribution of each keypoint:
θ(x, y) = arctan[(L(x, y+1) - L(x, y-1)) / (L(x+1, y) - L(x-1, y))]
where each symbol is as defined above;
Step 1-2d, computing the neighborhood points k_i of keypoint k:
Figure GDA0002752645380000163
where (x_k, y_k) denotes the keypoint direction and the remaining symbols are as defined above.
Step 1-3, after the image alignment of step 1-2, the data samples are further divided into a training set, a verification set and a test set.
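As an illustration of this alignment step, the following is a minimal OpenCV/NumPy sketch that matches SIFT keypoints between a phone frame and the long-exposure reference, estimates a homography with RANSAC (the repeated random-subset iteration mentioned above), and warps one image onto the other so the non-overlapping border can be cropped. File names, the ratio-test threshold and the reprojection threshold are placeholders, not values from the patent.

```python
import cv2
import numpy as np

def align_pair(src_path, ref_path):
    """Align src (e.g. phone capture) to ref (e.g. DSLR long exposure) via SIFT + RANSAC."""
    src = cv2.imread(src_path, cv2.IMREAD_GRAYSCALE)
    ref = cv2.imread(ref_path, cv2.IMREAD_GRAYSCALE)

    sift = cv2.SIFT_create()
    kp_s, des_s = sift.detectAndCompute(src, None)
    kp_r, des_r = sift.detectAndCompute(ref, None)

    # Ratio-test matching of SIFT descriptors.
    matcher = cv2.BFMatcher()
    matches = matcher.knnMatch(des_s, des_r, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]

    pts_s = np.float32([kp_s[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    pts_r = np.float32([kp_r[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

    # RANSAC repeatedly fits a homography to random subsets of the matches.
    H, inliers = cv2.findHomography(pts_s, pts_r, cv2.RANSAC, ransacReprojThreshold=3.0)

    h, w = ref.shape
    warped = cv2.warpPerspective(src, H, (w, h))
    overlap = warped > 0                 # mask of the overlapping region
    return warped, overlap               # the non-overlapping part can be cropped with the mask

# Example with placeholder file names:
# aligned, mask = align_pair("phone_raw_preview.png", "dslr_reference.png")
```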
Step 2, model design. Referring to fig. 1 and, in detail, to fig. 2, we first propose the Super Night Network (SNN, super night scene network), shown in fig. 2, whose main body is an Encoder-Decoder structure. It consists of an Encoder (left side) and a Decoder (right side). The Encoder is similar to a common classification network: there are multiple downsampling steps, and the feature maps become spatially smaller and channel-wise wider. In each layer there are two 3x3 convolutions, each followed by an activation function (LeakyReLU) and a Switchable Normalization layer, and finally a 2x2 max-pooling operation with stride 2 is added for downsampling. The number of feature channels doubles at each downsampling step. The entire Encoder repeats this step three times.
Each step in the Decoder upsamples the feature map (nearest-neighbor interpolation is used here), applies a 3x3 convolution that halves the number of feature channels, concatenates the result with the corresponding (specially processed) feature map from the Encoder, and then applies two 3x3 convolutions, each again followed by a LeakyReLU activation and a Switchable Normalization layer. In the last layer only one 3x3 convolution layer is used, and the processed image is output through pixel_shuffle.
Here, to obtain more information from the RAW data, we use a skip-connection structure and place a Residual Dense Block on the skip-connection. The Residual Dense Block consists of 3 DenseBlocks; each DenseBlock contains 5 convolution layers, each followed by an activation function (LeakyReLU) and a Switchable Normalization layer, and each layer receives the output feature maps of all preceding convolution layers, i.e. a concatenation operation. In addition, to obtain more useful information, we add a Channel Attention module after each DenseBlock. It consists of an average-pooling layer, two 3x3 convolution layers, and nonlinear ReLU and Sigmoid transformations; the connection scheme is shown in fig. 3.
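To make the skip-connection design concrete, here is a minimal PyTorch sketch of a DenseBlock with 5 convolutions, a Channel Attention module (average pooling, two 3x3 convolutions, ReLU and Sigmoid), and a Residual Dense Block built from 3 DenseBlocks, as described above. The channel widths and growth rate are assumptions, and InstanceNorm2d again stands in for Switchable Normalization.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """5 convolutions; each layer sees the concatenation of all previous outputs."""
    def __init__(self, ch, growth=16):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(5):
            out_ch = ch if i == 4 else growth            # the last conv restores the channel count
            self.layers.append(nn.Sequential(
                nn.Conv2d(ch + i * growth, out_ch, 3, padding=1),
                nn.LeakyReLU(0.2),
                nn.InstanceNorm2d(out_ch),               # stand-in for Switchable Normalization
            ))

    def forward(self, x):
        feats = [x]
        for layer in self.layers[:-1]:
            feats.append(layer(torch.cat(feats, dim=1)))
        return self.layers[-1](torch.cat(feats, dim=1))

class ChannelAttention(nn.Module):
    """Average pooling + two 3x3 convolutions + ReLU/Sigmoid, producing per-channel weights."""
    def __init__(self, ch, reduction=4):
        super().__init__()
        self.body = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // reduction, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch // reduction, ch, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.body(x)

class ResidualDenseBlock(nn.Module):
    """3 DenseBlocks on the skip-connection, each followed by Channel Attention, plus a residual."""
    def __init__(self, ch):
        super().__init__()
        self.blocks = nn.Sequential(*[nn.Sequential(DenseBlock(ch), ChannelAttention(ch))
                                      for _ in range(3)])

    def forward(self, x):
        return x + self.blocks(x)

# skip = ResidualDenseBlock(32)(encoder_feature_map)   # applied to an Encoder skip feature
```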
Step 3, model training. Based on the model and the data set, we achieved rapid training of the model using distributed training, which took only 2.5 hours. In step 3, while the super night scene network model is being trained, the training sample set is divided into a low-resolution training set and high-resolution image blocks;
the low-resolution training set is obtained as follows: first, the high-resolution images are downsampled by a factor of N to obtain different low-resolution images; the obtained low-resolution images are then augmented, and each low-resolution image is sampled with overlap to obtain a group of overlapping low-resolution image blocks, which are taken as the low-resolution training set;
the high-resolution image blocks are obtained as follows:
the high-resolution images corresponding to the N-fold downsampling operation are sampled with the same overlap, and the resulting group of corresponding overlapping high-resolution image blocks is taken as the high-resolution label images; N is a positive integer;
the augmentation applied to the obtained low-resolution images consists of rotations by 90, 180 and 270 degrees, so as to obtain low-resolution images at different angles; a minimal sketch of this data preparation is given below.
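The following NumPy sketch illustrates the preparation just described: N-fold downsampling, 90/180/270-degree rotation augmentation, and overlapping patch extraction for both the low-resolution inputs and the high-resolution labels. Patch size, stride and N are placeholder values, not taken from the patent, and the naive strided downsampling is only a stand-in.

```python
import numpy as np

def downsample(img, n):
    """Naive N-fold downsampling by striding; a real pipeline would low-pass filter first."""
    return img[::n, ::n]

def rotations(img):
    """Original image plus its 90/180/270-degree rotations."""
    return [np.rot90(img, k) for k in range(4)]

def overlapping_patches(img, size, stride):
    """Extract overlapping size x size blocks with the given stride."""
    h, w = img.shape[:2]
    return [img[i:i + size, j:j + size]
            for i in range(0, h - size + 1, stride)
            for j in range(0, w - size + 1, stride)]

def build_pairs(hr_img, n=2, lr_size=48, stride=24):
    """Paired LR patches and HR label patches for one high-resolution image."""
    lr_patches, hr_patches = [], []
    for k, lr in enumerate(rotations(downsample(hr_img, n))):
        hr = np.rot90(hr_img, k)                 # keep the label consistent with the rotation
        lr_patches += overlapping_patches(lr, lr_size, stride)
        hr_patches += overlapping_patches(hr, lr_size * n, stride * n)
    return lr_patches, hr_patches

# Example with a random stand-in image:
# lr_set, hr_labels = build_pairs(np.random.rand(480, 640), n=2)
```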
A training convolutional network is then constructed:
First, the low-resolution image LR is taken as input and shallow features are extracted by a convolution layer; deep features of the image are then learned by several stacked CACB modules; finally the extracted shallow and deep features are fused and upsampled by sub-pixel convolution to obtain the high-resolution image.
The CACB module consists of four fusion convolution layers, and one quarter of the features of each fusion convolution layer is retained for the final feature fusion; the structure of the fusion convolution layer used in this module differs between a training stage and a deployment stage.
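The patent does not spell out the internal wiring of the CACB module beyond what is stated above, so the following PyTorch sketch is only one plausible reading: four convolution layers standing in for the fusion convolution layers, a quarter of each layer's output set aside for a final fusion, and sub-pixel (PixelShuffle) upsampling at the network tail. Treat the structure and all widths as assumptions.

```python
import torch
import torch.nn as nn

class CACB(nn.Module):
    """Four convolutions; 1/4 of each layer's output is kept for a final fusion (assumed layout)."""
    def __init__(self, ch=64):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.LeakyReLU(0.2)) for _ in range(4)]
        )
        self.fuse = nn.Conv2d(ch, ch, 1)          # fuses the four retained quarters (ch/4 each)

    def forward(self, x):
        kept = []
        for conv in self.convs:
            x = conv(x)
            kept.append(x[:, : x.shape[1] // 4])          # retain a quarter of the features
        return self.fuse(torch.cat(kept, dim=1)) + x      # fused quarters plus the main path

class SRNet(nn.Module):
    """Shallow conv -> stacked CACB modules -> shallow/deep fusion -> sub-pixel upsampling."""
    def __init__(self, ch=64, n_blocks=4, scale=2):
        super().__init__()
        self.head = nn.Conv2d(3, ch, 3, padding=1)                      # shallow features
        self.body = nn.Sequential(*[CACB(ch) for _ in range(n_blocks)])  # deep features
        self.tail = nn.Sequential(nn.Conv2d(ch, 3 * scale ** 2, 3, padding=1),
                                  nn.PixelShuffle(scale))               # sub-pixel upsampling

    def forward(self, lr):
        shallow = self.head(lr)
        deep = self.body(shallow)
        return self.tail(shallow + deep)          # fuse shallow and deep features

# out = SRNet()(torch.randn(1, 3, 48, 48))        # -> shape (1, 3, 96, 96)
```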
The loss function used in the training process is:
L_total = 0.5·L_1 + 0.05·L_SSIM + 0.1·L_VGG + L_adv
where L_1 is the mean absolute error, L_SSIM the structural similarity loss, L_VGG the perceptual loss, and L_adv the adversarial loss;
Figure GDA0002752645380000171
Figure GDA0002752645380000172
where F(·) is the feature map output by the 34th layer of a VGG19 network pre-trained on ImageNet, G(I)_{i,j,k} is the picture generated by the generator, C_{i,j,k} is the corresponding original picture with the bokeh effect, and D(·) is the output of the discriminator.
Step 4, outputting the result.
Before the result is output, the image is passed through a preset bokeh rendering model, constructed as follows:
Figure GDA0002752645380000181
where I_bokeh denotes the finally obtained image, I_org denotes the original image, ⊙ denotes element-wise multiplication of matrices, B_i(·) is the i-th level blur function, and W_i denotes the feature weight matrix of the i-th layer of the data image. The i-th level blur function B_i(·) is a shallow blur neural network iterated i times, expressed as:
Figure GDA0002752645380000185
The loss function l combines a reconstruction term with the structural similarity SSIM, and the model is optimized by back-propagating the error value; the reconstruction term l_1 is:
Figure GDA0002752645380000186
where I_bokeh denotes the bokeh image generated by the model and Î_bokeh denotes the ground-truth image that actually has the bokeh effect.
The structural similarity between the generated image I_bokeh and the ground-truth image Î_bokeh is:
SSIM(I_bokeh, Î_bokeh) = [l(I_bokeh, Î_bokeh)]^α · [c(I_bokeh, Î_bokeh)]^β · [s(I_bokeh, Î_bokeh)]^γ
where α, β and γ are preset constants, l(·,·) denotes the luminance relation between the generated image I_bokeh and the ground-truth image Î_bokeh, c(·,·) denotes the contrast relation between them, and s(·,·) denotes the structural relation between them.
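For reference, the standard luminance/contrast/structure decomposition that the above SSIM expression refers to can be sketched as follows (NumPy, with global statistics over a whole image rather than local windows; the exponents alpha, beta, gamma and the stabilizing constants are placeholders, not values from the patent).

```python
import numpy as np

def ssim_components(x, y, c1=1e-4, c2=9e-4):
    """Global luminance l, contrast c and structure s terms between images x and y."""
    mx, my = x.mean(), y.mean()
    sx, sy = x.std(), y.std()
    cov = ((x - mx) * (y - my)).mean()
    c3 = c2 / 2
    l = (2 * mx * my + c1) / (mx ** 2 + my ** 2 + c1)   # luminance relation
    c = (2 * sx * sy + c2) / (sx ** 2 + sy ** 2 + c2)   # contrast relation
    s = (cov + c3) / (sx * sy + c3)                     # structure relation
    return l, c, s

def ssim(x, y, alpha=1.0, beta=1.0, gamma=1.0):
    """SSIM = l^alpha * c^beta * s^gamma, as in the expression above."""
    l, c, s = ssim_components(x, y)
    return (l ** alpha) * (c ** beta) * (s ** gamma)

# Example on random stand-in images:
# print(ssim(np.random.rand(64, 64), np.random.rand(64, 64)))
```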
In fig. 4, the left side is a night scene picture taken by a Redmi 8, and the right side is the night scene picture produced by our network (both use the RAW data of the Redmi 8). Our result clearly has richer detail than the Redmi output, its colors are soft, and it matches how the scene looks to the human eye.
In conclusion, this algorithm flow can effectively satisfy the night-scene shooting requirements of low-end mobile phones at a lower cost. A consumer buying a low-end handset spends less money but can still obtain the super night scene technology of a high-end handset. In addition, the hardware requirements of the super night scene algorithm are greatly reduced; if it is paired with an Airia NPU chip, the cost drops further and the cost-performance ratio of the phone improves.
As described above, although the present invention has been shown and described with reference to certain preferred embodiments, it is not to be construed as limiting the invention itself. Various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (8)

1. A convolutional neural network-based image night scene processing method, characterized by comprising the following steps:
step 1, collecting a plurality of groups of RAW format data samples;
step 2, designing a super night scene network model;
step 2-1, establishing an SNN super night scene network model, wherein the SNN super night scene network model comprises at least one Encoder network and at least one Decoder network, and each network comprises multiple layers; the Encoder network performs multiple downsampling steps, each layer is provided with at least two 3x3 convolutions, each convolution is followed by an activation function and a Switchable Normalization layer, and finally a 2x2 max_pooling operation with stride 2 is added for downsampling; the whole Encoder network is repeated three times;
each step in the Decoder network comprises upsampling the feature map, a 3x3 convolution that halves the number of feature channels, concatenation with the corresponding feature map from the Encoder network, and two further 3x3 convolutions applied to the concatenated feature map, each convolution being followed by an activation function and a Switchable Normalization layer; a 3x3 convolution layer is used in the last layer, and the processed image is finally output through pixel_shuffle;
step 2-2, providing a Residual Dense Block and placing it on the skip-connection, wherein the Residual Dense Block consists of 3 Dense Blocks, each Dense Block contains 5 convolution layers, each convolution layer is followed by an activation function and a Switchable Normalization layer, and each layer simultaneously receives the output feature maps of all preceding convolution layers (an illustrative structural sketch follows this claim);
step 3, training the super night scene network model in the step 2;
step 4, outputting the result.
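The following minimal PyTorch sketch illustrates the Encoder/Decoder structure of step 2-1: three downsampling steps with two 3x3 convolutions per layer, 2x2 max pooling with stride 2, skip connections, and a pixel-shuffle output. GroupNorm and LeakyReLU stand in for the Switchable Normalization layer and the unspecified activation; the channel counts, the 4-channel packed-RAW input, and the x2 pixel-shuffle factor are assumptions, and the Residual Dense Block of step 2-2 is omitted for brevity.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    # Two 3x3 convolutions, each followed by an activation and a normalization
    # layer (GroupNorm here as a simple stand-in for Switchable Normalization).
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.LeakyReLU(0.2), nn.GroupNorm(8, c_out),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.LeakyReLU(0.2), nn.GroupNorm(8, c_out),
    )

class TinySNN(nn.Module):
    """U-Net-like encoder/decoder with 3 down/up steps and a pixel-shuffle output."""
    def __init__(self, in_ch=4, base=32, out_ch=3, scale=2):
        super().__init__()
        chs = [base, base * 2, base * 4]
        self.enc = nn.ModuleList([conv_block(in_ch, chs[0]),
                                  conv_block(chs[0], chs[1]),
                                  conv_block(chs[1], chs[2])])
        self.pool = nn.MaxPool2d(2)                       # 2x2 max pooling, stride 2
        self.bottom = conv_block(chs[2], chs[2] * 2)
        self.up = nn.ModuleList()                         # upsample + 3x3 conv halving channels
        self.dec = nn.ModuleList()
        for c in reversed(chs):
            self.up.append(nn.Sequential(nn.Upsample(scale_factor=2, mode="nearest"),
                                         nn.Conv2d(c * 2, c, 3, padding=1)))
            self.dec.append(conv_block(c * 2, c))         # after concat with the skip feature
        self.head = nn.Conv2d(chs[0], out_ch * scale * scale, 3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)

    def forward(self, x):
        skips = []
        for block in self.enc:
            x = block(x)
            skips.append(x)                               # feature kept for the skip connection
            x = self.pool(x)
        x = self.bottom(x)
        for up, dec, skip in zip(self.up, self.dec, reversed(skips)):
            x = up(x)
            x = dec(torch.cat([x, skip], dim=1))
        return self.shuffle(self.head(x))

x = torch.randn(1, 4, 64, 64)          # e.g. a packed 4-channel RAW patch
print(TinySNN()(x).shape)              # torch.Size([1, 3, 128, 128])
```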
2. The method for processing an image night scene based on a convolutional neural network according to claim 1, wherein the step 1 further comprises:
step 1-1, shooting different scenes through a plurality of preset image acquisition devices with different models to obtain a plurality of RAW format data samples, taking the RAW format data samples acquired by the image acquisition devices with different models in the same scene as a group of parent samples, dividing the parent samples into different child samples according to the model of the image acquisition device, and marking each sample;
Step 1-2, after data sample acquisition is completed, aligning the images, and removing the non-overlapping parts of the images;
step 1-3, after the image alignment operation of step 1-2, the data samples are further divided into a training set, a verification set and a test set.
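As an illustration of the parent/child sample grouping of step 1-1 and the split of step 1-3, the sketch below groups RAW shots of the same scene into a parent sample with per-device-model child samples and splits by scene, so that all shots of one scene land in the same subset; the `scene_model.dng` naming convention and the 80/10/10 ratios are assumptions made for the example.

```python
import random
from collections import defaultdict
from pathlib import Path

def build_splits(raw_files, ratios=(0.8, 0.1, 0.1), seed=0):
    """Group RAW shots by scene (parent sample) and device model (child sample), then split."""
    parents = defaultdict(dict)                           # scene_id -> {model: path}
    for path in raw_files:
        scene_id, model = Path(path).stem.split("_", 1)   # assumed "scene_model.dng" naming
        parents[scene_id][model] = path

    scenes = sorted(parents)
    random.Random(seed).shuffle(scenes)
    n = len(scenes)
    n_train, n_val = int(n * ratios[0]), int(n * ratios[1])
    return {
        "train": {s: parents[s] for s in scenes[:n_train]},
        "val":   {s: parents[s] for s in scenes[n_train:n_train + n_val]},
        "test":  {s: parents[s] for s in scenes[n_train + n_val:]},
    }

files = ["scene01_redmi8.dng", "scene01_refcam.dng",
         "scene02_redmi8.dng", "scene02_refcam.dng",
         "scene03_redmi8.dng", "scene03_refcam.dng"]
print({k: sorted(v) for k, v in build_splits(files).items()})
```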
3. The method for processing the night scene of the image based on the convolutional neural network according to claim 2, wherein the image alignment in the step 1-2 further comprises matching key points of the images and then iterating repeatedly over a plurality of random subsets on the basis of the matched key points;
wherein the matching of the image key points is further as follows:
step 1-2a, searching all image positions over a preset scale space, and extracting key points including corner points, edge points, bright points in dark areas and dark points in bright areas through convolution operations; the scale space L(x, y, σ) is calculated as

L(x, y, σ) = G(x, y, σ) * I(x, y)

where * denotes convolution, (x, y) denotes the coordinates of a key point, G(x, y, σ) denotes the Gaussian kernel function, and the scale-space factor σ takes a constant value;

the Gaussian kernel function is expressed as

G(x, y, σ) = (1 / (2πσ²)) · exp(−(x² + y²) / (2σ²))

where each symbol is as defined above;

step 1-2b, collecting the gradient modulus of a key point:

m(x, y) = sqrt( (L(x+1, y) − L(x−1, y))² + (L(x, y+1) − L(x, y−1))² )

step 1-2c, collecting the direction distribution of a key point:

θ(x, y) = arctan( (L(x, y+1) − L(x, y−1)) / (L(x+1, y) − L(x−1, y)) )

where each symbol is as defined above;

step 1-2d, calculating the neighborhood points of a key point k from the key-point direction θ, where θ denotes the direction of the key point and the remaining symbols are as defined above.
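The quantities of steps 1-2a to 1-2c can be evaluated with the short NumPy sketch below: the Gaussian-smoothed scale space L(x, y, σ), then the gradient modulus and direction at a key point from finite differences of L. The kernel size, the arctan2 orientation convention, and the random test image are illustrative choices only.

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """Sampled 2-D Gaussian G(x, y, sigma) = exp(-(x^2 + y^2) / (2 sigma^2)) / (2 pi sigma^2)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2)) / (2.0 * np.pi * sigma**2)
    return g / g.sum()

def scale_space(img, sigma, size=9):
    """L(x, y, sigma) = G(x, y, sigma) * I(x, y): Gaussian-smoothed image."""
    k = gaussian_kernel(size, sigma)
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=np.float64)
    h, w = img.shape
    for dy in range(size):
        for dx in range(size):
            out += k[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def gradient_mag_dir(L, x, y):
    """Gradient modulus and orientation at a key point (x, y) from finite differences of L."""
    dx = L[y, x + 1] - L[y, x - 1]
    dy = L[y + 1, x] - L[y - 1, x]
    return np.sqrt(dx**2 + dy**2), np.arctan2(dy, dx)

rng = np.random.default_rng(0)
img = rng.random((32, 32))
L = scale_space(img, sigma=1.6)
print(gradient_mag_dir(L, 10, 10))
```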
4. The method for processing the night scene of the image based on the convolutional neural network according to claim 1, wherein in the step 3, in the process of training the super night scene network model, the training sample set is divided into a low-resolution training set and high-resolution image blocks;
the low-resolution training set is obtained as follows: the high-resolution images are first downsampled by a factor of N to obtain different low-resolution images; the obtained low-resolution images are then augmented, and each low-resolution image is sampled with overlap to obtain a group of overlapping low-resolution image blocks, which are taken as the low-resolution training set;
the high-resolution image blocks are obtained as follows:
the high-resolution images corresponding to the N-fold downsampling operation are sampled with overlap, and the resulting group of corresponding overlapping high-resolution image blocks is taken as the high-resolution label images; N is a positive integer;
the augmentation applied to the obtained low-resolution images consists of rotation transformations of 90 degrees, 180 degrees and 270 degrees, so as to obtain low-resolution images at different angles;
A training convolutional network is then constructed:
firstly, taking a low-resolution image LR as input, extracting shallow features through a convolution layer, then learning deep features of the image through a plurality of stacked CACB modules, finally fusing the extracted shallow features and deep features, and up-sampling in a sub-pixel convolution mode to obtain a high-resolution image;
the CACB module consists of four fusion convolution layers, and one quarter of the features of each fusion convolution layer is retained for the final feature fusion; the structure of the fusion convolution layer used in the module differs between a training stage and a deployment stage;
the loss function used in the training process combines four terms: the mean absolute error, the structural similarity, the perceptual loss and the adversarial loss; the perceptual loss is the distance between φ(I_G) and φ(I_GT), where φ(·) denotes the feature map output by the 34th layer of a VGG19 network pre-trained on ImageNet, I_G denotes the picture generated by the generator and I_GT denotes the corresponding real original picture; the adversarial loss is computed from the output D(·) of the discriminator applied to the generated picture.
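A hedged PyTorch sketch of one way to assemble such a composite training loss follows. The term weights, the uniform-window SSIM approximation, the VGG19 feature slice (`features[:35]`, loaded here without pretrained weights so the sketch stays self-contained), and the binary-cross-entropy form of the adversarial term are assumptions for illustration, not the claimed formulas.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class CompositeLoss(nn.Module):
    """Weighted sum of MAE, (1 - SSIM), perceptual, and adversarial terms (weights illustrative)."""
    def __init__(self, w=(1.0, 0.2, 0.05, 0.01)):
        super().__init__()
        self.w = w
        # Feature extractor up to an intermediate VGG19 layer; weights=None keeps the
        # sketch self-contained - in practice ImageNet-pretrained weights would be used.
        self.vgg = models.vgg19(weights=None).features[:35].eval()
        for p in self.vgg.parameters():
            p.requires_grad_(False)

    @staticmethod
    def ssim(x, y, c1=0.01**2, c2=0.03**2):
        # Single-scale SSIM with an 11x11 uniform window (simplified stand-in).
        mu_x = F.avg_pool2d(x, 11, 1, 5)
        mu_y = F.avg_pool2d(y, 11, 1, 5)
        var_x = F.avg_pool2d(x * x, 11, 1, 5) - mu_x**2
        var_y = F.avg_pool2d(y * y, 11, 1, 5) - mu_y**2
        cov = F.avg_pool2d(x * y, 11, 1, 5) - mu_x * mu_y
        num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
        den = (mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2)
        return (num / den).mean()

    def forward(self, fake, real, d_fake_logits):
        l_mae = F.l1_loss(fake, real)
        l_ssim = 1.0 - self.ssim(fake, real)
        l_perc = F.l1_loss(self.vgg(fake), self.vgg(real))
        # Non-saturating generator adversarial loss on the discriminator logits.
        l_adv = F.binary_cross_entropy_with_logits(d_fake_logits,
                                                   torch.ones_like(d_fake_logits))
        w = self.w
        return w[0] * l_mae + w[1] * l_ssim + w[2] * l_perc + w[3] * l_adv

loss_fn = CompositeLoss()
fake, real = torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)
print(loss_fn(fake, real, torch.randn(2, 1)).item())
```

In actual training, `d_fake_logits` would come from the discriminator applied to the generator output, and the VGG19 weights would be the ImageNet-pretrained ones mentioned in the claim.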
5. The method for processing the night scene of the image based on the convolutional neural network according to claim 1, wherein the step 4 further comprises acquiring an image through the image acquisition sensor and outputting the night scene image after the image is finally modified by the super night scene network model trained in the step 3; foreground rendering is carried out before the night scene image is output, and the model of the foreground rendering effect picture is specifically constructed as follows:
the finally obtained image I is composed from the original image I_0 by element-wise multiplication of each i-th order blurred version B_i(I_0) with the corresponding feature weight matrix W_i and combination of the weighted results, where B_i(·) is the i-th order blur function and W_i denotes the feature weight matrix of the i-th layer data image; the i-th order blur function B_i(·) is obtained by applying a shallow blur neural network iteratively i times;

wherein the loss function l adopts a combination of a reconstruction term and the structural similarity SSIM, and the model is optimized by back-propagating the error value; the reconstruction term l_1 is the reconstruction error between the generated image and the real image, wherein I_bokeh denotes the image with foreground effect generated by the model and Î_bokeh denotes the original image that actually has the foreground effect; SSIM(I_bokeh, Î_bokeh) denotes the structural similarity between the generated image I_bokeh and the actual image Î_bokeh, which is given by

SSIM(I_bokeh, Î_bokeh) = [l(I_bokeh, Î_bokeh)]^α · [c(I_bokeh, Î_bokeh)]^β · [s(I_bokeh, Î_bokeh)]^γ

wherein α, β and γ are preset constants, l(·,·) denotes the luminance relation between the generated image I_bokeh and the actual image Î_bokeh, c(·,·) denotes the contrast relation between them, and s(·,·) denotes the structural relation between them.
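For the SSIM term of this claim, the sketch below computes the luminance, contrast, and structure relations on whole images and combines them with the exponents α, β, γ; the stabilizing constants and the global (non-windowed) statistics are simplifying assumptions.

```python
import numpy as np

def ssim_components(x, y, alpha=1.0, beta=1.0, gamma=1.0, c1=0.01**2, c2=0.03**2):
    """Global (whole-image) SSIM as l^alpha * c^beta * s^gamma."""
    mu_x, mu_y = x.mean(), y.mean()
    sd_x, sd_y = x.std(), y.std()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    c3 = c2 / 2.0
    l = (2 * mu_x * mu_y + c1) / (mu_x**2 + mu_y**2 + c1)   # luminance relation
    c = (2 * sd_x * sd_y + c2) / (sd_x**2 + sd_y**2 + c2)   # contrast relation
    s = (cov + c3) / (sd_x * sd_y + c3)                     # structure relation
    return (l**alpha) * (c**beta) * (s**gamma), (l, c, s)

rng = np.random.default_rng(0)
a = rng.random((64, 64))
b = np.clip(a + 0.05 * rng.standard_normal((64, 64)), 0.0, 1.0)
print(ssim_components(a, b))
```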
6. An image night scene processing device based on a convolutional neural network, characterized by comprising the following modules:
a first module for collecting a plurality of sets of RAW format data samples;
the first module further shoots different scenes through a plurality of preset image acquisition devices with different models to obtain a plurality of RAW format data samples, RAW format data samples acquired by the image acquisition devices with different models in the same scene are used as a group of mother samples, the mother samples are divided into different sub-samples according to the model of the image acquisition device, and each sample is marked;
after the data sample acquisition is completed, the images are aligned and the non-overlapping parts of the images are removed; the image alignment operation comprises matching key points of the images and then iterating repeatedly over a plurality of random subsets on the basis of the matched key points;
wherein the matching of the image key points is further as follows:

searching all image positions over a preset scale space, and extracting key points including corner points, edge points, bright points in dark areas and dark points in bright areas through convolution operations; the scale space L(x, y, σ) is calculated as

L(x, y, σ) = G(x, y, σ) * I(x, y)

where * denotes convolution, (x, y) denotes the coordinates of a key point, G(x, y, σ) denotes the Gaussian kernel function, and the scale-space factor σ takes a constant value;

the Gaussian kernel function is expressed as

G(x, y, σ) = (1 / (2πσ²)) · exp(−(x² + y²) / (2σ²))

where each symbol is as defined above;

collecting the gradient modulus of a key point:

m(x, y) = sqrt( (L(x+1, y) − L(x−1, y))² + (L(x, y+1) − L(x, y−1))² )

collecting the direction distribution of a key point:

θ(x, y) = arctan( (L(x, y+1) − L(x, y−1)) / (L(x+1, y) − L(x−1, y)) )

where each symbol is as defined above;

calculating the neighborhood points of a key point k from the key-point direction θ, where θ denotes the direction of the key point and the remaining symbols are as defined above;
dividing the data sample into a training set, a verification set and a test set;
the second module is used for establishing a super night scene network model;
the second module is further configured to establish an SNN super night scene network model comprising at least one Encoder network and at least one Decoder network, each network comprising multiple layers; the Encoder network performs multiple downsampling steps, each layer is provided with at least two 3x3 convolutions, each convolution is followed by an activation function and a Switchable Normalization layer, and finally a 2x2 max_pooling operation with stride 2 is added for downsampling; the whole Encoder network is repeated three times;
each step in the Decoder network comprises upsampling the feature map, a 3x3 convolution that halves the number of feature channels, concatenation with the corresponding feature map from the Encoder network, and two further 3x3 convolutions applied to the concatenated feature map, each convolution being followed by an activation function and a Switchable Normalization layer; a 3x3 convolution layer is used in the last layer, and the processed image is finally output through pixel_shuffle;
a Residual Dense Block is placed on the skip-connection; the Residual Dense Block consists of 3 Dense Blocks, each Dense Block contains 5 convolution layers, each convolution layer is followed by an activation function and a Switchable Normalization layer, and each layer simultaneously receives the output feature maps of all preceding convolution layers;
a third module for training the super night scene network model;
the third module is further configured to divide the training sample set into a low-resolution training set and high-resolution image blocks;
the low-resolution training set is obtained as follows: the high-resolution images are first downsampled by a factor of N to obtain different low-resolution images; the obtained low-resolution images are then augmented, and each low-resolution image is sampled with overlap to obtain a group of overlapping low-resolution image blocks, which are taken as the low-resolution training set;
the high-resolution image blocks are obtained as follows:
the high-resolution images corresponding to the N-fold downsampling operation are sampled with overlap, and the resulting group of corresponding overlapping high-resolution image blocks is taken as the high-resolution label images; N is a positive integer;
the augmentation applied to the obtained low-resolution images consists of rotation transformations of 90 degrees, 180 degrees and 270 degrees, so as to obtain low-resolution images at different angles;
a training convolutional network is then constructed:
firstly, taking a low-resolution image LR as input, extracting shallow features through a convolution layer, then learning deep features of the image through a plurality of stacked CACB modules, finally fusing the extracted shallow features and deep features, and up-sampling in a sub-pixel convolution mode to obtain a high-resolution image;
the CACB module consists of four fusion convolution layers, and one quarter of the features of each fusion convolution layer is retained for the final feature fusion; the structure of the fusion convolution layer used in the module differs between a training stage and a deployment stage;
the loss function used in the training process combines four terms: the mean absolute error, the structural similarity, the perceptual loss and the adversarial loss; the perceptual loss is the distance between φ(I_G) and φ(I_GT), where φ(·) denotes the feature map output by the 34th layer of a VGG19 network pre-trained on ImageNet, I_G denotes the picture generated by the generator and I_GT denotes the corresponding real original picture; the adversarial loss is computed from the output D(·) of the discriminator applied to the generated picture;
a fourth module for performing a foreground rendering on the night scene image before output;
the fourth module is further configured to establish a foreground rendering model, where the model of the foreground rendering effect picture is specifically established as follows:
the finally obtained image I is composed from the original image I_0 by element-wise multiplication of each i-th order blurred version B_i(I_0) with the corresponding feature weight matrix W_i and combination of the weighted results, where B_i(·) is the i-th order blur function and W_i denotes the feature weight matrix of the i-th layer data image; the i-th order blur function B_i(·) is obtained by applying a shallow blur neural network iteratively i times;

wherein the loss function l adopts a combination of a reconstruction term and the structural similarity SSIM, and the model is optimized by back-propagating the error value; the reconstruction term l_1 is the reconstruction error between the generated image and the real image, wherein I_bokeh denotes the image with foreground effect generated by the model and Î_bokeh denotes the original image that actually has the foreground effect; SSIM(I_bokeh, Î_bokeh) denotes the structural similarity between the generated image I_bokeh and the actual image Î_bokeh, which is given by

SSIM(I_bokeh, Î_bokeh) = [l(I_bokeh, Î_bokeh)]^α · [c(I_bokeh, Î_bokeh)]^β · [s(I_bokeh, Î_bokeh)]^γ

wherein α, β and γ are preset constants, l(·,·) denotes the luminance relation between the generated image I_bokeh and the actual image Î_bokeh, c(·,·) denotes the contrast relation between them, and s(·,·) denotes the structural relation between them.
7. A computing module comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method of any one of claims 1 to 5 when the computer program is executed.
8. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 5.
CN202011049682.7A 2020-09-29 2020-09-29 Convolutional neural network-based image night scene processing method, computing module for operating method and readable storage medium Active CN112150363B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011049682.7A CN112150363B (en) 2020-09-29 2020-09-29 Convolutional neural network-based image night scene processing method, computing module for operating method and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011049682.7A CN112150363B (en) 2020-09-29 2020-09-29 Convolutional neural network-based image night scene processing method, computing module for operating method and readable storage medium

Publications (2)

Publication Number Publication Date
CN112150363A CN112150363A (en) 2020-12-29
CN112150363B true CN112150363B (en) 2023-07-07

Family

ID=73894508

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011049682.7A Active CN112150363B (en) 2020-09-29 2020-09-29 Convolutional neural network-based image night scene processing method, computing module for operating method and readable storage medium

Country Status (1)

Country Link
CN (1) CN112150363B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112528979B (en) * 2021-02-10 2021-05-11 成都信息工程大学 Transformer substation inspection robot obstacle distinguishing method and system
CN115238884A (en) * 2021-04-23 2022-10-25 Oppo广东移动通信有限公司 Image processing method, image processing apparatus, storage medium, device, and model training method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007173985A (en) * 2005-12-19 2007-07-05 Canon Inc Imaging apparatus, imaging method, program, and storage medium
CN106709977A (en) * 2016-11-16 2017-05-24 北京航空航天大学 Scene night view map-based automatic light source arrangement method
CN107016403A (en) * 2017-02-23 2017-08-04 中国水利水电科学研究院 A kind of method that completed region of the city threshold value is extracted based on nighttime light data
CN109068058A (en) * 2018-08-22 2018-12-21 Oppo广东移动通信有限公司 Filming control method, device and electronic equipment under super night scene mode

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140308020A1 (en) * 2013-04-16 2014-10-16 Michael F. Seeger Marital aid
CN109035260A (en) * 2018-07-27 2018-12-18 京东方科技集团股份有限公司 A kind of sky areas dividing method, device and convolutional neural networks

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007173985A (en) * 2005-12-19 2007-07-05 Canon Inc Imaging apparatus, imaging method, program, and storage medium
CN106709977A (en) * 2016-11-16 2017-05-24 北京航空航天大学 Scene night view map-based automatic light source arrangement method
CN107016403A (en) * 2017-02-23 2017-08-04 中国水利水电科学研究院 A kind of method that completed region of the city threshold value is extracted based on nighttime light data
CN109068058A (en) * 2018-08-22 2018-12-21 Oppo广东移动通信有限公司 Filming control method, device and electronic equipment under super night scene mode

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Single-image super-resolution reconstruction with a category-information generative adversarial network; 杨云; 张海宇; 朱宇; 张艳宁; Journal of Image and Graphics (Issue 12); pp. 1777-1788 *
Application of computer graphics and image technology in the field of fine arts; 刘夙凯; Modern Electronics Technique (Issue 21); pp. 66-68, 72 *

Also Published As

Publication number Publication date
CN112150363A (en) 2020-12-29

Similar Documents

Publication Publication Date Title
Chen et al. Real-world single image super-resolution: A brief review
Yeung et al. Light field spatial super-resolution using deep efficient spatial-angular separable convolution
Lee et al. Deep recursive hdri: Inverse tone mapping using generative adversarial networks
Tan et al. DeepDemosaicking: Adaptive image demosaicking via multiple deep fully convolutional networks
Liu et al. Robust single image super-resolution via deep networks with sparse prior
CN111476737B (en) Image processing method, intelligent device and computer readable storage medium
RU2706891C1 (en) Method of generating a common loss function for training a convolutional neural network for converting an image into an image with drawn parts and a system for converting an image into an image with drawn parts
CN111986084A (en) Multi-camera low-illumination image quality enhancement method based on multi-task fusion
CN112150363B (en) Convolutional neural network-based image night scene processing method, computing module for operating method and readable storage medium
CN112529776B (en) Training method of image processing model, image processing method and device
An et al. Single-shot high dynamic range imaging via deep convolutional neural network
Guan et al. Srdgan: learning the noise prior for super resolution with dual generative adversarial networks
Rasheed et al. LSR: Lightening super-resolution deep network for low-light image enhancement
CN112651911A (en) High dynamic range imaging generation method based on polarization image
CN116934592A (en) Image stitching method, system, equipment and medium based on deep learning
CN113379606B (en) Face super-resolution method based on pre-training generation model
Panetta et al. Deep perceptual image enhancement network for exposure restoration
CN113628134A (en) Image noise reduction method and device, electronic equipment and storage medium
CN116245968A (en) Method for generating HDR image based on LDR image of transducer
US20230098437A1 (en) Reference-Based Super-Resolution for Image and Video Enhancement
CN115841523A (en) Double-branch HDR video reconstruction algorithm based on Raw domain
Tang et al. BMISP: Bidirectional mapping of image signal processing pipeline
CN115311149A (en) Image denoising method, model, computer-readable storage medium and terminal device
Hajisharif Computational Photography: High Dynamic Range and Light Fields
Jia et al. Learning Rich Information for Quad Bayer Remosaicing and Denoising

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
CB02 Change of applicant information

Address after: Room 203b, building 3, artificial intelligence Industrial Park, 266 Chuangyan Road, Qilin science and Technology Innovation Park, Jiangning District, Nanjing City, Jiangsu Province, 211000

Applicant after: Zhongke Fangcun Zhiwei (Nanjing) Technology Co.,Ltd.

Applicant after: Zhongke Nanjing artificial intelligence Innovation Research Institute

Address before: Room 203b, building 3, artificial intelligence Industrial Park, 266 Chuangyan Road, Qilin science and Technology Innovation Park, Jiangning District, Nanjing City, Jiangsu Province, 211000

Applicant before: Zhongke Fangcun Zhiwei (Nanjing) Technology Co.,Ltd.

Applicant before: NANJING ARTIFICIAL INTELLIGENCE CHIP INNOVATION INSTITUTE, INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES

GR01 Patent grant
GR01 Patent grant