CN112150363A - Convolution neural network-based image night scene processing method, and computing module and readable storage medium for operating method - Google Patents

Convolution neural network-based image night scene processing method, and computing module and readable storage medium for operating method

Info

Publication number
CN112150363A
CN112150363A (application CN202011049682.7A)
Authority
CN
China
Prior art keywords
image
resolution
layer
night scene
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011049682.7A
Other languages
Chinese (zh)
Other versions
CN112150363B (en)
Inventor
冷聪
李成华
朱宇
程健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Artificial Intelligence Chip Innovation Institute, Institute of Automation, Chinese Academy of Sciences
Zhongke Fangcun Zhiwei Nanjing Technology Co ltd
Original Assignee
Nanjing Artificial Intelligence Chip Innovation Institute, Institute of Automation, Chinese Academy of Sciences
Zhongke Fangcun Zhiwei Nanjing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Artificial Intelligence Chip Innovation Institute, Institute of Automation, Chinese Academy of Sciences and Zhongke Fangcun Zhiwei Nanjing Technology Co ltd
Priority to CN202011049682.7A
Publication of CN112150363A
Application granted
Publication of CN112150363B
Legal status: Active (current)
Anticipated expiration


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00: Geometric image transformations in the plane of the image
    • G06T 3/40: Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4053: Scaling of whole images or parts thereof, e.g. expanding or contracting, based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06T 3/4076: Scaling of whole images or parts thereof based on super-resolution, using the original low-resolution images to iteratively correct the high-resolution images
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/25: Fusion techniques
    • G06F 18/253: Fusion techniques of extracted features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/084: Backpropagation, e.g. using gradient descent
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention provides an image night scene processing method based on a convolutional neural network, and a computing module and a readable storage medium for running the method, wherein the method comprises the following steps: step 1, collecting a plurality of groups of RAW format data samples; step 2, designing a super night scene network model; step 3, training the super night scene network model of step 2; and step 4, outputting the result. The invention establishes a super night scene network model and trains it: a basic data set is first built according to the requirements, the SNN network is trained on that data set, and the results are tested for performance. If the model's predictions are not satisfactory, the data set is expanded or rebuilt according to the scene requirements. A super night scene picture with an excellent appearance can be obtained simply by taking RAW data from the camera CMOS, which avoids the image shake and ghosting caused by the long exposure of the traditional night scene function, and further avoids the image shake and ghosting introduced when images are synthesized with AI techniques.

Description

Convolution neural network-based image night scene processing method, and computing module and readable storage medium for operating method
Technical Field
The invention relates to an image night scene processing method based on a convolutional neural network, and to a computing module and a readable storage medium for running the method. It belongs to G06T: image data processing or generation in general, and in particular to G06T 5/00: enhancement or restoration of images.
Background
The camera sensor of a mobile phone has a small area and poor light-sensing capability, and cannot capture enough natural light when lighting is insufficient, so pictures show heavy noise, low brightness and weak resolving power. When the handheld long-exposure super night scene function is enabled, however, the brightness of the picture is greatly improved and the light and dark details stand out; even though the overall brightness rises sharply, highlight areas do not become overexposed.
At present, the super night scene function appears on more and more mobile phones. Its principle is to take several pictures with different ISO values and different exposures during a long exposure and then synthesize them.
However, the super night scene function is demanding: the exposure time is usually several seconds or more, so the requirements on mobile phone hardware and software algorithms are very high. In addition, it is difficult for the super night scene mode to select high-quality frames during a long exposure, because the hand inevitably shakes slightly; if no algorithm is used to screen the frames, the shaken frames are merged and the resulting picture is flawed.
Disclosure of Invention
Purpose of the invention: one object is to provide an image night scene processing method based on a convolutional neural network, so as to solve the above problems in the prior art. A further object is to provide a computing module that can execute the above method and a storage medium that can be read by the computing module.
The technical scheme is as follows: an image night scene processing method based on a convolutional neural network comprises the following steps:
step 1, collecting a plurality of groups of RAW format data samples;
step 2, designing a super night scene network model;
step 3, training the super night scene network model in the step 2;
and step 4, outputting the result.
In a further embodiment, step 1 further comprises:
step 1-1, shooting different scenes through a plurality of preset image acquisition devices with different models to obtain a plurality of RAW format data samples, taking the RAW format data samples acquired by the image acquisition devices with different models in the same scene as a group of mother samples, distinguishing the mother samples into different sub-samples according to the models of the image acquisition devices, and marking each sample;
step 1-2, after data sample collection is completed, aligning the images, and removing non-overlapping parts of the images;
step 1-3, after the image alignment operation of step 1-2, further dividing the data samples into a training set, a validation set and a test set.
In a further embodiment, aligning the images in step 1-2 further comprises matching key points of the images and, on that basis, repeatedly iterating over random subsets;
wherein matching the key points of the images proceeds as follows:
Step 1-2a, searching all image positions in a preset scale space, and extracting key points, including corner points, edge points, bright points in dark areas and dark points in bright areas, through convolution operations; the scale space L(x, y, σ) is calculated as:
L(x,y,σ)=G(x,y,σ)·C(x,y)
where C(x, y) denotes the center coordinates of the key point, G(x, y, σ) denotes a Gaussian kernel function, and σ is the scale-space factor, which takes a fixed value;
wherein the Gaussian kernel function is expressed as follows:
G(x, y, σ) = (1 / (2πσ²)) · exp(-(x² + y²) / (2σ²))
wherein each symbol has the same meaning as above;
Step 1-2b, collecting the gradient magnitude of each key point:
m(x, y) = √[(L(x+1, y) - L(x-1, y))² + (L(x, y+1) - L(x, y-1))²]
Step 1-2c, collecting the orientation distribution of each key point:
θ(x, y) = arctan[(L(x, y+1) - L(x, y-1)) / (L(x+1, y) - L(x-1, y))]
wherein each symbol has the same meaning as above;
Step 1-2d, calculating the neighborhood points k_i of the key point k:
Figure BDA0002709162290000024
where (x_k, y_k) denotes the orientation of the key point, and the other symbols have the same meanings as above.
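As a concrete illustration of this key-point matching and random-subset fitting, the sketch below uses OpenCV's SIFT detector together with a RANSAC-estimated homography to align one frame onto a reference frame; the matcher, ratio-test threshold and reprojection threshold are illustrative assumptions and not values specified by the application.

```python
# Illustrative sketch of steps 1-2a..1-2d: SIFT key points + RANSAC alignment.
import cv2
import numpy as np

def align_to_reference(ref_rgb, moving_rgb, min_matches=10):
    sift = cv2.SIFT_create()
    ref_gray = cv2.cvtColor(ref_rgb, cv2.COLOR_BGR2GRAY)
    mov_gray = cv2.cvtColor(moving_rgb, cv2.COLOR_BGR2GRAY)

    # Key points and descriptors in the Gaussian scale space
    kp_ref, des_ref = sift.detectAndCompute(ref_gray, None)
    kp_mov, des_mov = sift.detectAndCompute(mov_gray, None)

    # Match descriptors and keep the unambiguous matches (Lowe's ratio test)
    matcher = cv2.BFMatcher()
    matches = matcher.knnMatch(des_mov, des_ref, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]
    if len(good) < min_matches:
        raise ValueError("not enough key-point matches for alignment")

    src = np.float32([kp_mov[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_ref[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

    # RANSAC: repeated fitting on random subsets of the matched points
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, ransacReprojThreshold=3.0)
    h, w = ref_gray.shape
    return cv2.warpPerspective(moving_rgb, H, (w, h))
```

In practice, every RAW/RGB pair of a scene would be warped onto a common reference in this way before the non-overlapping borders are cropped away.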
In a further embodiment, the step 2 further comprises:
Step 2-1, establishing an SNN super night scene network model, which comprises at least one Encoder network and at least one Decoder network, each comprising a plurality of layers; the Encoder network performs several downsampling steps, each layer has at least two 3x3 convolutions, each convolution is followed by an activation function and a Switchable Normalization layer, and finally a 2x2 max_pooling operation with stride 2 is applied for downsampling; the whole Encoder structure is repeated three times;
each step in the Decoder network includes upsampling the feature map, a 3x3 convolution that halves the number of feature channels, concatenation with the corresponding feature map from the Encoder network, and then two 3x3 convolutions over the concatenated feature map, each convolution followed by an activation function and a Switchable Normalization layer; a 3x3 convolution layer is used in the last layer, and the processed image is finally output through pixel_shuffle;
Step 2-2, proposing a Residual Dense block and placing it on a skip-connection, wherein the Residual Dense block consists of 3 Dense blocks, each Dense block contains 5 convolution layers, each convolution layer is followed by an activation function and a Switchable Normalization layer, and each layer receives the output feature maps of all preceding convolution layers.
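To make the Encoder and Decoder steps concrete, the following PyTorch sketch shows one plausible reading of a single encoder stage and a single decoder stage as described above; the channel widths, the LeakyReLU slope and the use of BatchNorm2d as a stand-in for Switchable Normalization are assumptions made only for illustration.

```python
# Minimal sketch of one SNN encoder stage and one decoder stage (assumptions:
# BatchNorm2d stands in for Switchable Normalization, channel widths are examples).
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(in_ch, out_ch):
    # two 3x3 convolutions, each followed by an activation and a normalization layer
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.LeakyReLU(0.2), nn.BatchNorm2d(out_ch),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.LeakyReLU(0.2), nn.BatchNorm2d(out_ch),
    )

class EncoderStage(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.convs = conv_block(in_ch, out_ch)
        self.pool = nn.MaxPool2d(2)           # 2x2 max pooling, stride-2 downsampling

    def forward(self, x):
        skip = self.convs(x)                  # kept for the skip connection
        return self.pool(skip), skip

class DecoderStage(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.reduce = nn.Conv2d(in_ch, out_ch, 3, padding=1)   # halves the channel count
        self.convs = conv_block(out_ch * 2, out_ch)

    def forward(self, x, skip):
        x = F.interpolate(x, scale_factor=2, mode="nearest")   # upsample the feature map
        x = self.reduce(x)
        x = torch.cat([x, skip], dim=1)       # concatenate with the encoder feature map
        return self.convs(x)
```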
In a further embodiment, in the process of training the super night scene network model in step 3, the training sample set is divided into a low-resolution training set and high-resolution image blocks;
the low-resolution training set is obtained as follows: first, the high-resolution images are downsampled N times to obtain different low-resolution images; the obtained low-resolution images are then augmented, and each low-resolution image is sampled with overlap to obtain a group of overlapping low-resolution image blocks, which is used as the low-resolution training set;
the high-resolution image blocks are obtained as follows:
the high-resolution images corresponding to the N downsampling operations are sampled with overlap, and the resulting group of correspondingly overlapping high-resolution image blocks is used as the high-resolution label images; N is a positive integer;
the augmentation applied to the obtained low-resolution images consists of rotating them by 90, 180 and 270 degrees, so that low-resolution images at different angles are obtained;
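For reference, this is roughly how the overlapping low-resolution blocks and the rotation augmentation could be produced; the block size and stride below are assumed values, not parameters taken from the application.

```python
# Sketch of overlapping patch extraction and rotation augmentation for the
# low-resolution training set (patch size and stride are illustrative).
import numpy as np

def overlapping_patches(img, patch=48, stride=24):
    h, w = img.shape[:2]
    blocks = []
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            blocks.append(img[y:y + patch, x:x + patch])
    return blocks

def augment_with_rotations(img):
    # original plus 90, 180 and 270 degree rotations
    return [np.rot90(img, k) for k in range(4)]

def build_lr_training_set(lr_images):
    patches = []
    for lr in lr_images:
        for rotated in augment_with_rotations(lr):
            patches.extend(overlapping_patches(rotated))
    return patches
```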
A training convolutional network is then constructed:
first, the low-resolution image LR is taken as input and shallow features are extracted through a convolution layer; deep features of the image are then learned through a number of stacked CACBs; finally, the extracted shallow and deep features are fused, and a high-resolution image is obtained by upsampling through sub-pixel convolution;
wherein the CACB module is composed of four fused convolution layers, and one quarter of the features of each fused convolution layer is reserved for the final feature fusion; the structural details of the fused convolution layers in this module differ between a training phase and a deployment phase;
The loss function used during training is:
L_total = 0.5*L_1 + 0.05*L_SSIM + 0.1*L_VGG + L_adv
where L_1 is the mean absolute error, L_SSIM the structural similarity loss, L_VGG the perceptual loss, and L_adv the adversarial loss;
Figure BDA0002709162290000041
Figure BDA0002709162290000042
where F(·) is the feature map output at layer 34 of a VGG19 network pre-trained on ImageNet, G(I)_{i,j,k} is the picture generated by the generator, C_{i,j,k} is the corresponding original picture with the bokeh effect, and D(·) is the output of the discriminator.
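One possible reading of this weighted objective is sketched below; the SSIM implementation (the third-party pytorch_msssim package), the exact VGG19 feature slice and the form of the adversarial term are stand-in assumptions rather than the precise formulas of the original filing.

```python
# Sketch of the combined training loss L_total = 0.5*L1 + 0.05*L_SSIM + 0.1*L_VGG + L_adv.
import torch
import torch.nn as nn
import torchvision
from pytorch_msssim import ssim   # assumed third-party SSIM implementation

class CombinedLoss(nn.Module):
    def __init__(self):
        super().__init__()
        # VGG19 features up to layer index 34, pre-trained on ImageNet, frozen
        vgg = torchvision.models.vgg19(weights="IMAGENET1K_V1").features[:35].eval()
        for p in vgg.parameters():
            p.requires_grad_(False)
        self.vgg = vgg
        self.l1 = nn.L1Loss()
        self.bce = nn.BCEWithLogitsLoss()

    def forward(self, generated, target, disc_logits_on_generated):
        l1 = self.l1(generated, target)                            # mean absolute error
        l_ssim = 1.0 - ssim(generated, target, data_range=1.0)     # structural similarity term
        l_vgg = self.l1(self.vgg(generated), self.vgg(target))     # perceptual (VGG feature) loss
        # adversarial term: push the discriminator output on generated images toward "real"
        l_adv = self.bce(disc_logits_on_generated,
                         torch.ones_like(disc_logits_on_generated))
        return 0.5 * l1 + 0.05 * l_ssim + 0.1 * l_vgg + l_adv
```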
In a further embodiment, step 4 further includes acquiring an image through an image acquisition sensor, processing it with the super night scene network model trained in step 3, and finally outputting the modified night scene image; bokeh rendering is performed before the night scene image is output, and the model that produces the bokeh-rendered picture can be specifically constructed as:
Figure BDA0002709162290000043
where I_bokeh denotes the finally obtained image, I_org denotes the original image, multiplication between matrices is taken element by element, B_i(·) is the i-th order blur function, and W_i denotes the feature weight matrix value of the i-th layer data image; the i-th order blur function B_i(·) is obtained by applying a shallow blur neural network for i iterations, which is expressed as:
Figure BDA0002709162290000047
The loss function l adopts a combination of a reconstruction term and the structural similarity SSIM, and the model is optimized by back-propagating the error value; the reconstruction term l_1 is specifically:
Figure BDA0002709162290000048
where I_bokeh denotes the image with the bokeh effect generated by the model and the second term denotes the original image with the actual bokeh effect; the structural similarity between the generated image I_bokeh and the actual image is as follows:
Figure BDA00027091622900000412
where α, β and γ are preset constants, and the three factors denote, respectively, the luminance relationship, the contrast relationship and the structural relationship between the generated image I_bokeh and the actual image.
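Read literally, the bokeh model blends the original image with progressively blurred copies of it using learned weight maps; the sketch below illustrates that reading. The weighted-sum composition, the softmax over the weights and the shape of the shallow blur network are assumptions made purely for illustration.

```python
# Illustrative sketch of the bokeh rendering idea: blurred copies B_i(I_org) produced by
# repeatedly applying a shallow blur network, blended with per-pixel weight maps W_i.
import torch
import torch.nn as nn

class ShallowBlur(nn.Module):
    """A small stand-in for the shallow blur network."""
    def __init__(self, ch=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(ch, 16, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(16, ch, 3, padding=1),
        )

    def forward(self, x):
        return self.net(x)

class BokehRenderer(nn.Module):
    def __init__(self, levels=3):
        super().__init__()
        self.blur = ShallowBlur()
        self.levels = levels
        # one weight map per blur level, plus one for the sharp original
        self.weights = nn.Conv2d(3, levels + 1, 3, padding=1)

    def forward(self, i_org):
        w = torch.softmax(self.weights(i_org), dim=1)    # per-pixel blending weights
        blurred = [i_org]
        for _ in range(self.levels):
            blurred.append(self.blur(blurred[-1]))       # B_i = the blur net applied i times
        # element-wise weighted sum of the original and its blurred versions
        return sum(w[:, i:i + 1] * blurred[i] for i in range(self.levels + 1))
```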
An image night scene processing system based on a convolutional neural network comprises a first module, a second module, a third module and a fourth module, wherein the first module is used for acquiring a plurality of groups of RAW format data samples; the second module is used for establishing a super night scene network model; the third module is used for training the super night scene network model; and the fourth module is used for performing bokeh rendering on the night scene image before output.
The first module further shoots different scenes through a plurality of preset image acquisition devices with different models to obtain a plurality of RAW format data samples, the RAW format data samples acquired by the image acquisition devices with different models in the same scene are used as a group of mother samples, the mother samples are divided into different sub-samples according to the models of the image acquisition devices, and each sample is marked;
after the data samples are acquired, the images are aligned and their non-overlapping parts are removed; aligning the images includes matching key points of the images and, on that basis, repeatedly iterating over random subsets;
wherein matching the key points of the images proceeds as follows:
all image positions are searched in a preset scale space, and key points, including corner points, edge points, bright points in dark areas and dark points in bright areas, are extracted through convolution operations; the scale space L(x, y, σ) is calculated as:
L(x,y,σ)=G(x,y,σ)·C(x,y)
where C(x, y) denotes the center coordinates of the key point, G(x, y, σ) denotes a Gaussian kernel function, and σ is the scale-space factor, which takes a fixed value;
wherein the Gaussian kernel function is expressed as follows:
G(x, y, σ) = (1 / (2πσ²)) · exp(-(x² + y²) / (2σ²))
wherein each symbol has the same meaning as above;
the gradient magnitude of each key point is collected:
m(x, y) = √[(L(x+1, y) - L(x-1, y))² + (L(x, y+1) - L(x, y-1))²]
the orientation distribution of each key point is collected:
θ(x, y) = arctan[(L(x, y+1) - L(x, y-1)) / (L(x+1, y) - L(x-1, y))]
wherein each symbol has the same meaning as above;
the neighborhood points k_i of a key point k are calculated:
Figure BDA0002709162290000054
where (x_k, y_k) denotes the orientation of the key point, and the meanings of the other symbols are the same as above;
dividing the data samples into a training set, a validation set and a test set;
the second module is further used for establishing an SNN super night scene network model, which comprises at least one Encoder network and at least one Decoder network, each comprising a plurality of layers; the Encoder network performs several downsampling steps, each layer has at least two 3x3 convolutions, each convolution is followed by an activation function and a Switchable Normalization layer, and finally a 2x2 max_pooling operation with stride 2 is applied for downsampling; the whole Encoder structure is repeated three times;
each step in the Decoder network includes upsampling the feature map, a 3x3 convolution that halves the number of feature channels, concatenation with the corresponding feature map from the Encoder network, and then two 3x3 convolutions over the concatenated feature map, each convolution followed by an activation function and a Switchable Normalization layer; a 3x3 convolution layer is used in the last layer, and the processed image is finally output through pixel_shuffle;
a Residual Dense block is proposed and placed on a skip-connection, wherein the Residual Dense block consists of 3 Dense blocks, each Dense block contains 5 convolution layers, each convolution layer is followed by an activation function and a Switchable Normalization layer, and each layer receives the output feature maps of all preceding convolution layers;
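One plausible PyTorch rendering of the Residual Dense block placed on the skip connection is sketched below; the growth rate, the 1x1 fusion convolution, the residual sum and the use of BatchNorm2d instead of Switchable Normalization are assumptions made for illustration.

```python
# Sketch of a Dense block (5 convolutions, each seeing all previous outputs) and a
# Residual Dense block built from 3 such blocks; channel sizes are illustrative.
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    def __init__(self, channels, growth=32):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(5):
            self.layers.append(nn.Sequential(
                nn.Conv2d(channels + i * growth, growth, 3, padding=1),
                nn.LeakyReLU(0.2),
                nn.BatchNorm2d(growth),       # stand-in for Switchable Normalization
            ))
        self.fuse = nn.Conv2d(channels + 5 * growth, channels, 1)

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))   # dense connectivity by concatenation
        return self.fuse(torch.cat(feats, dim=1))

class ResidualDenseBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.blocks = nn.Sequential(*[DenseBlock(channels) for _ in range(3)])

    def forward(self, x):
        return x + self.blocks(x)             # residual connection around the 3 Dense blocks
```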
the third module is further used for dividing the training sample set into a low-resolution training set and high-resolution image blocks;
the low-resolution training set is obtained as follows: first, the high-resolution images are downsampled N times to obtain different low-resolution images; the obtained low-resolution images are then augmented, and each low-resolution image is sampled with overlap to obtain a group of overlapping low-resolution image blocks, which is used as the low-resolution training set;
the high-resolution image blocks are obtained as follows:
the high-resolution images corresponding to the N downsampling operations are sampled with overlap, and the resulting group of correspondingly overlapping high-resolution image blocks is used as the high-resolution label images; N is a positive integer;
the augmentation applied to the obtained low-resolution images consists of rotating them by 90, 180 and 270 degrees, so that low-resolution images at different angles are obtained;
a training convolutional network is then constructed:
first, the low-resolution image LR is taken as input and shallow features are extracted through a convolution layer; deep features of the image are then learned through a number of stacked CACBs; finally, the extracted shallow and deep features are fused, and a high-resolution image is obtained by upsampling through sub-pixel convolution;
the CACB module is composed of four fused convolution layers, and one quarter of the features of each fused convolution layer is reserved for the final feature fusion; the structural details of the fused convolution layers in this module differ between a training phase and a deployment phase;
the loss function used during training is:
L_total = 0.5*L_1 + 0.05*L_SSIM + 0.1*L_VGG + L_adv
where L_1 is the mean absolute error, L_SSIM the structural similarity loss, L_VGG the perceptual loss, and L_adv the adversarial loss;
Figure BDA0002709162290000071
Figure BDA0002709162290000072
where F(·) is the feature map output at layer 34 of a VGG19 network pre-trained on ImageNet, G(I)_{i,j,k} is the picture generated by the generator, C_{i,j,k} is the corresponding original picture with the bokeh effect, and D(·) is the output of the discriminator.
The fourth module is further used for constructing the model that produces the bokeh-rendered picture:
Figure BDA0002709162290000073
where I_bokeh denotes the finally obtained image, I_org denotes the original image, multiplication between matrices is taken element by element, B_i(·) is the i-th order blur function, and W_i denotes the feature weight matrix value of the i-th layer data image; the i-th order blur function B_i(·) is obtained by applying a shallow blur neural network for i iterations, which is expressed as:
Figure BDA0002709162290000077
The loss function l adopts a combination of a reconstruction term and the structural similarity SSIM, and the model is optimized by back-propagating the error value; the reconstruction term l_1 is specifically:
Figure BDA0002709162290000078
where I_bokeh denotes the image with the bokeh effect generated by the model and the second term denotes the original image with the actual bokeh effect; the structural similarity between the generated image I_bokeh and the actual image is as follows:
Figure BDA00027091622900000712
where α, β and γ are preset constants, and the three factors denote, respectively, the luminance relationship, the contrast relationship and the structural relationship between the generated image I_bokeh and the actual image.
A computing module comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the computer program being stored on a computer-readable storage medium; when the processor executes the computer program read from the storage medium, the computing module implements the following steps:
Step 1, data acquisition. Night scenes are shot with several low-end mobile phones and the RAW format data are extracted; at the same time, long-exposure night scene RGB pictures are shot with a single-lens reflex (SLR) camera, yielding about 100,000 image pairs. After the data set is collected, the SIFT key point matching algorithm and the RANSAC algorithm are used for alignment, and the non-overlapping parts of the images are removed. After matching is completed, the data are divided into a training set, a validation set and a test set.
Step 1-1, shooting different scenes through a plurality of preset image acquisition devices with different models to obtain a plurality of RAW format data samples, taking the RAW format data samples acquired by the image acquisition devices with different models in the same scene as a group of mother samples, distinguishing the mother samples into different sub-samples according to the models of the image acquisition devices, and marking each sample;
Step 1-2, after data sample collection is completed, the images are aligned and their non-overlapping parts are removed; aligning the images in step 1-2 further includes matching key points of the images and, on that basis, repeatedly iterating over random subsets;
wherein matching the key points of the images proceeds as follows:
Step 1-2a, searching all image positions in a preset scale space, and extracting key points, including corner points, edge points, bright points in dark areas and dark points in bright areas, through convolution operations; the scale space L(x, y, σ) is calculated as:
L(x,y,σ)=G(x,y,σ)·C(x,y)
where C(x, y) denotes the center coordinates of the key point, G(x, y, σ) denotes a Gaussian kernel function, and σ is the scale-space factor, which takes a fixed value;
wherein the Gaussian kernel function is expressed as follows:
G(x, y, σ) = (1 / (2πσ²)) · exp(-(x² + y²) / (2σ²))
wherein each symbol has the same meaning as above;
Step 1-2b, collecting the gradient magnitude of each key point:
m(x, y) = √[(L(x+1, y) - L(x-1, y))² + (L(x, y+1) - L(x, y-1))²]
Step 1-2c, collecting the orientation distribution of each key point:
θ(x, y) = arctan[(L(x, y+1) - L(x, y-1)) / (L(x+1, y) - L(x-1, y))]
wherein each symbol has the same meaning as above;
Step 1-2d, calculating the neighborhood points k_i of the key point k:
Figure BDA0002709162290000091
where (x_k, y_k) denotes the orientation of the key point, and the other symbols have the same meanings as above.
Step 1-3, after the image alignment operation of step 1-2, the data samples are further divided into a training set, a validation set and a test set.
Step 2, model design. Referring to Fig. 1, and to Fig. 2 for details, we first propose a Super Night Network (SNN), whose main body is an Encoder-Decoder structure. It consists of an Encoder (left side) and a Decoder (right side). Similar to a common classification network, the Encoder downsamples several times, and the feature maps change from large and narrow to small and wide, i.e. spatially smaller with more channels. In each layer there are two 3x3 convolutions, each convolution followed by an activation function (LeakyReLU) and a Switchable Normalization layer, and finally a 2x2 max_pooling operation, i.e. downsampling with stride 2. The number of feature channels is doubled in each downsampling step. This step is repeated three times for the entire Encoder.
Each step in the Decoder consists of upsampling the feature map, here using nearest-neighbor interpolation, followed by a 3x3 convolution that halves the number of feature channels, concatenation with the corresponding feature map (after special processing) from the Encoder, and then two 3x3 convolutions over the concatenated feature map, again each convolution followed by an activation function (LeakyReLU) and a Switchable Normalization layer. In the last layer, only one 3x3 convolution layer is used, and the processed image is output through pixel_shuffle.
In order to obtain more information from the RAW data, a skip-connection structure is used, and a Residual Dense block is proposed and placed on the skip connection. The Residual Dense block is composed of 3 Dense blocks, each Dense block has 5 convolution layers inside, each convolution layer is followed by an activation function (LeakyReLU) and a Switchable Normalization layer, and at the same time each layer receives the output feature maps of all the previous convolution layers, the operation being concatenation. In addition, to obtain more effective information, we add a channel attention module after each Dense block. It is composed of an average pooling layer, two 3x3 convolution layers and the nonlinear transformations ReLU and Sigmoid, and its connection is shown in Fig. 3.
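The channel attention module described here (average pooling, two 3x3 convolutions, a ReLU and a Sigmoid) might be sketched as follows; the channel-reduction ratio and the exact wiring are assumptions.

```python
# Rough sketch of the channel attention module added after each Dense block:
# average pooling -> two 3x3 convolutions with ReLU -> Sigmoid -> channel-wise rescaling.
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)              # global average pooling
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        scale = self.fc(self.pool(x))                    # per-channel attention weights
        return x * scale                                 # rescale the feature map
```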
Step 3, model training. Based on the above model and data set, fast training of the model is achieved with distributed training, and the process takes only 2.5 hours. In the process of training the super night scene network model in step 3, the training sample set is divided into a low-resolution training set and high-resolution image blocks;
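Distributed training of this kind is commonly implemented with data parallelism; the skeleton below is a generic PyTorch DistributedDataParallel loop given purely as an illustration of the idea, and none of its settings (batch size, optimizer, learning rate, backend) come from the original filing.

```python
# Generic data-parallel training skeleton (illustrative only; launch with torchrun).
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler

def train(model, dataset, epochs=10):
    dist.init_process_group("nccl")
    rank = dist.get_rank()
    model = DDP(model.cuda(rank), device_ids=[rank])
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=16, sampler=sampler)
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    for epoch in range(epochs):
        sampler.set_epoch(epoch)              # reshuffle the shards each epoch
        for lr_img, hr_img in loader:
            loss = torch.nn.functional.l1_loss(model(lr_img.cuda(rank)), hr_img.cuda(rank))
            opt.zero_grad()
            loss.backward()
            opt.step()
```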
the low-resolution training set is obtained as follows: first, the high-resolution images are downsampled N times to obtain different low-resolution images; the obtained low-resolution images are then augmented, and each low-resolution image is sampled with overlap to obtain a group of overlapping low-resolution image blocks, which is used as the low-resolution training set;
the high-resolution image blocks are obtained as follows:
the high-resolution images corresponding to the N downsampling operations are sampled with overlap, and the resulting group of correspondingly overlapping high-resolution image blocks is used as the high-resolution label images; N is a positive integer;
the augmentation applied to the obtained low-resolution images consists of rotating them by 90, 180 and 270 degrees, so that low-resolution images at different angles are obtained;
a training convolutional network is then constructed:
first, the low-resolution image LR is taken as input and shallow features are extracted through a convolution layer; deep features of the image are then learned through a number of stacked CACBs; finally, the extracted shallow and deep features are fused, and a high-resolution image is obtained by upsampling through sub-pixel convolution;
the CACB module is composed of four fused convolution layers, and one quarter of the features of each fused convolution layer is reserved for the final feature fusion; the structural details of the fused convolution layers in this module differ between a training phase and a deployment phase;
the loss function used during training is:
L_total = 0.5*L_1 + 0.05*L_SSIM + 0.1*L_VGG + L_adv
where L_1 is the mean absolute error, L_SSIM the structural similarity loss, L_VGG the perceptual loss, and L_adv the adversarial loss;
Figure BDA0002709162290000101
Figure BDA0002709162290000102
where F(·) is the feature map output at layer 34 of a VGG19 network pre-trained on ImageNet, G(I)_{i,j,k} is the picture generated by the generator, C_{i,j,k} is the corresponding original picture with the bokeh effect, and D(·) is the output of the discriminator.
Step 4, outputting the result; before the result is output, the image passes through a preset bokeh rendering model, which is constructed as follows:
Figure BDA0002709162290000103
where I_bokeh denotes the finally obtained image, I_org denotes the original image, multiplication between matrices is taken element by element, B_i(·) is the i-th order blur function, and W_i denotes the feature weight matrix value of the i-th layer data image; the i-th order blur function B_i(·) is obtained by applying a shallow blur neural network for i iterations, which is expressed as:
Figure BDA0002709162290000114
The loss function l adopts a combination of a reconstruction term and the structural similarity SSIM, and the model is optimized by back-propagating the error value; the reconstruction term l_1 is specifically:
Figure BDA0002709162290000115
where I_bokeh denotes the image with the bokeh effect generated by the model and the second term denotes the original image with the actual bokeh effect; the structural similarity between the generated image I_bokeh and the actual image is as follows:
Figure BDA0002709162290000119
where α, β and γ are preset constants, and the three factors denote, respectively, the luminance relationship, the contrast relationship and the structural relationship between the generated image I_bokeh and the actual image.
A computing module readable storage medium having stored thereon a computer program which, when executed by a processor, carries out the following processes:
Step 1, data acquisition. Night scenes are shot with several low-end mobile phones and the RAW format data are extracted; at the same time, long-exposure night scene RGB pictures are shot with a single-lens reflex (SLR) camera, yielding about 100,000 image pairs. After the data set is collected, the SIFT key point matching algorithm and the RANSAC algorithm are used for alignment, and the non-overlapping parts of the images are removed. After matching is completed, the data are divided into a training set, a validation set and a test set.
Step 1-1, shooting different scenes through a plurality of preset image acquisition devices with different models to obtain a plurality of RAW format data samples, taking the RAW format data samples acquired by the image acquisition devices with different models in the same scene as a group of mother samples, distinguishing the mother samples into different sub-samples according to the models of the image acquisition devices, and marking each sample;
Step 1-2, after data sample collection is completed, the images are aligned and their non-overlapping parts are removed; aligning the images in step 1-2 further includes matching key points of the images and, on that basis, repeatedly iterating over random subsets;
wherein matching the key points of the images proceeds as follows:
Step 1-2a, searching all image positions in a preset scale space, and extracting key points, including corner points, edge points, bright points in dark areas and dark points in bright areas, through convolution operations; the scale space L(x, y, σ) is calculated as:
L(x,y,σ)=G(x,y,σ)·C(x,y)
where C(x, y) denotes the center coordinates of the key point, G(x, y, σ) denotes a Gaussian kernel function, and σ is the scale-space factor, which takes a fixed value;
wherein the Gaussian kernel function is expressed as follows:
G(x, y, σ) = (1 / (2πσ²)) · exp(-(x² + y²) / (2σ²))
wherein each symbol has the same meaning as above;
Step 1-2b, collecting the gradient magnitude of each key point:
m(x, y) = √[(L(x+1, y) - L(x-1, y))² + (L(x, y+1) - L(x, y-1))²]
Step 1-2c, collecting the orientation distribution of each key point:
θ(x, y) = arctan[(L(x, y+1) - L(x, y-1)) / (L(x+1, y) - L(x-1, y))]
wherein each symbol has the same meaning as above;
Step 1-2d, calculating the neighborhood points k_i of the key point k:
Figure BDA0002709162290000124
where (x_k, y_k) denotes the orientation of the key point, and the other symbols have the same meanings as above.
Step 1-3, after the image alignment operation of step 1-2, the data samples are further divided into a training set, a validation set and a test set.
Step 2, model design. Referring to Fig. 1, and to Fig. 2 for details, we first propose a Super Night Network (SNN), whose main body is an Encoder-Decoder structure. It consists of an Encoder (left side) and a Decoder (right side). Similar to a common classification network, the Encoder downsamples several times, and the feature maps change from large and narrow to small and wide, i.e. spatially smaller with more channels. In each layer there are two 3x3 convolutions, each convolution followed by an activation function (LeakyReLU) and a Switchable Normalization layer, and finally a 2x2 max_pooling operation, i.e. downsampling with stride 2. The number of feature channels is doubled in each downsampling step. This step is repeated three times for the entire Encoder.
Each step in the Decoder consists of upsampling the feature map, here using nearest-neighbor interpolation, followed by a 3x3 convolution that halves the number of feature channels, concatenation with the corresponding feature map (after special processing) from the Encoder, and then two 3x3 convolutions over the concatenated feature map, again each convolution followed by an activation function (LeakyReLU) and a Switchable Normalization layer. In the last layer, only one 3x3 convolution layer is used, and the processed image is output through pixel_shuffle.
In order to obtain more information from the RAW data, a skip-connection structure is used, and a Residual Dense block is proposed and placed on the skip connection. The Residual Dense block is composed of 3 Dense blocks, each Dense block has 5 convolution layers inside, each convolution layer is followed by an activation function (LeakyReLU) and a Switchable Normalization layer, and at the same time each layer receives the output feature maps of all the previous convolution layers, the operation being concatenation. In addition, to obtain more effective information, we add a channel attention module after each Dense block. It is composed of an average pooling layer, two 3x3 convolution layers and the nonlinear transformations ReLU and Sigmoid, and its connection is shown in Fig. 3.
Step 3, model training. Based on the above model and data set, fast training of the model is achieved with distributed training, and the process takes only 2.5 hours. In the process of training the super night scene network model in step 3, the training sample set is divided into a low-resolution training set and high-resolution image blocks;
the low-resolution training set is obtained as follows: first, the high-resolution images are downsampled N times to obtain different low-resolution images; the obtained low-resolution images are then augmented, and each low-resolution image is sampled with overlap to obtain a group of overlapping low-resolution image blocks, which is used as the low-resolution training set;
the high-resolution image blocks are obtained as follows:
the high-resolution images corresponding to the N downsampling operations are sampled with overlap, and the resulting group of correspondingly overlapping high-resolution image blocks is used as the high-resolution label images; N is a positive integer;
the augmentation applied to the obtained low-resolution images consists of rotating them by 90, 180 and 270 degrees, so that low-resolution images at different angles are obtained;
a training convolutional network is then constructed:
first, the low-resolution image LR is taken as input and shallow features are extracted through a convolution layer; deep features of the image are then learned through a number of stacked CACBs; finally, the extracted shallow and deep features are fused, and a high-resolution image is obtained by upsampling through sub-pixel convolution;
the CACB module is composed of four fused convolution layers, and one quarter of the features of each fused convolution layer is reserved for the final feature fusion; the structural details of the fused convolution layers in this module differ between a training phase and a deployment phase;
the loss function used during training is:
L_total = 0.5*L_1 + 0.05*L_SSIM + 0.1*L_VGG + L_adv
where L_1 is the mean absolute error, L_SSIM the structural similarity loss, L_VGG the perceptual loss, and L_adv the adversarial loss;
Figure BDA0002709162290000141
Figure BDA0002709162290000142
where F(·) is the feature map output at layer 34 of a VGG19 network pre-trained on ImageNet, G(I)_{i,j,k} is the picture generated by the generator, C_{i,j,k} is the corresponding original picture with the bokeh effect, and D(·) is the output of the discriminator.
Step 4, outputting the result.
Beneficial effects: the invention relates to an image night scene processing method based on a convolutional neural network, and further relates to a computing module capable of running the method and a storage medium readable by the computing module. By establishing a super night scene network model and training it, a super night scene picture with an excellent appearance can be obtained simply by taking RAW data from the camera CMOS; this avoids the image shake and ghosting caused by the long exposure of the traditional night scene function, and further avoids the image shake and ghosting introduced when images are synthesized with AI techniques.
Drawings
FIG. 1 is a flow chart of the algorithm of the present invention.
Fig. 2 is a schematic diagram of an SNN super night scene network model in the present invention.
Fig. 3 is a schematic structural diagram of a channel attention mechanism module in the SNN super night scene network model.
Fig. 4 is a comparison of a night scene image produced by a common algorithm and a night scene image produced by the algorithm of the present invention.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a more thorough understanding of the present invention. It will be apparent, however, to one skilled in the art, that the present invention may be practiced without one or more of these specific details. In other instances, well-known features have not been described in order to avoid obscuring the invention.
The applicant considers that the current principle of the super night scene function is to take several pictures with different ISO values and different exposures during a long exposure and then synthesize them; however, the super night scene function is demanding, and the exposure time is usually several seconds or more, so the requirements on mobile phone hardware and software algorithms are very high. In addition, it is difficult for the super night scene mode to select high-quality frames during a long exposure, because the hand inevitably shakes slightly; if no algorithm is used to screen the frames, the shaken frames are merged and the resulting picture is flawed. Some mobile phone manufacturers use AI techniques to remove the blurred photos, then automatically align the scene through system recognition, and finally synthesize the result.
In order to solve these problems, a convolutional neural network is used to realize the super night scene mode; for the algorithm, only RAW data need to be taken from the camera CMOS (complementary metal oxide semiconductor) to obtain a super night scene picture with an excellent appearance. The specific algorithm flow is as follows:
Step 1, data acquisition. Night scenes are shot with several low-end mobile phones and the RAW format data are extracted; at the same time, long-exposure night scene RGB pictures are shot with a single-lens reflex (SLR) camera, yielding about 100,000 image pairs. After the data set is collected, the SIFT key point matching algorithm and the RANSAC algorithm are used for alignment, and the non-overlapping parts of the images are removed. After matching is completed, the data are divided into a training set, a validation set and a test set.
Step 1-1, shooting different scenes through a plurality of preset image acquisition devices with different models to obtain a plurality of RAW format data samples, taking the RAW format data samples acquired by the image acquisition devices with different models in the same scene as a group of mother samples, distinguishing the mother samples into different sub-samples according to the models of the image acquisition devices, and marking each sample;
Step 1-2, after data sample collection is completed, the images are aligned and their non-overlapping parts are removed; aligning the images in step 1-2 further includes matching key points of the images and, on that basis, repeatedly iterating over random subsets;
wherein matching the key points of the images proceeds as follows:
Step 1-2a, searching all image positions in a preset scale space, and extracting key points, including corner points, edge points, bright points in dark areas and dark points in bright areas, through convolution operations; the scale space L(x, y, σ) is calculated as:
L(x,y,σ)=G(x,y,σ)·C(x,y)
where C(x, y) denotes the center coordinates of the key point, G(x, y, σ) denotes a Gaussian kernel function, and σ is the scale-space factor, which takes a fixed value;
wherein the Gaussian kernel function is expressed as follows:
G(x, y, σ) = (1 / (2πσ²)) · exp(-(x² + y²) / (2σ²))
wherein each symbol has the same meaning as above;
Step 1-2b, collecting the gradient magnitude of each key point:
m(x, y) = √[(L(x+1, y) - L(x-1, y))² + (L(x, y+1) - L(x, y-1))²]
Step 1-2c, collecting the orientation distribution of each key point:
θ(x, y) = arctan[(L(x, y+1) - L(x, y-1)) / (L(x+1, y) - L(x-1, y))]
wherein each symbol has the same meaning as above;
Step 1-2d, calculating the neighborhood points k_i of the key point k:
Figure BDA0002709162290000163
where (x_k, y_k) denotes the orientation of the key point, and the other symbols have the same meanings as above.
Step 1-3, after the image alignment operation of step 1-2, the data samples are further divided into a training set, a validation set and a test set.
Step 2, model design. Referring to Fig. 1 and to Fig. 2 in detail, we first propose a Super Night Network (SNN), whose main body is an Encoder-Decoder structure, as shown in Fig. 2. It consists of an Encoder (left side) and a Decoder (right side). Similar to a common classification network, the Encoder downsamples several times, and the feature maps change from large and narrow to small and wide, i.e. spatially smaller with more channels. In each layer there are two 3x3 convolutions, each convolution followed by an activation function (LeakyReLU) and a Switchable Normalization layer, and finally a 2x2 max_pooling operation, i.e. downsampling with stride 2. The number of feature channels is doubled in each downsampling step. This step is repeated three times for the entire Encoder.
Each step in the Decoder consists of upsampling the feature map, here using nearest-neighbor interpolation, followed by a 3x3 convolution that halves the number of feature channels, concatenation with the corresponding feature map (after special processing) from the Encoder, and then two 3x3 convolutions over the concatenated feature map, again each convolution followed by an activation function (LeakyReLU) and a Switchable Normalization layer. In the last layer, only one 3x3 convolution layer is used, and the processed image is output through pixel_shuffle.
In order to obtain more information from the RAW data, a skip-connection structure is used, and a Residual Dense block is proposed and placed on the skip connection. The Residual Dense block is composed of 3 Dense blocks, each Dense block has 5 convolution layers inside, each convolution layer is followed by an activation function (LeakyReLU) and a Switchable Normalization layer, and at the same time each layer receives the output feature maps of all the previous convolution layers, the operation being concatenation. In addition, to obtain more effective information, we add a channel attention module after each Dense block. It is composed of an average pooling layer, two 3x3 convolution layers and the nonlinear transformations ReLU and Sigmoid, and its connection is shown in Fig. 3.
Step 3, model training. Based on the above model and data set, fast training of the model is achieved with distributed training, and the process takes only 2.5 hours. In the process of training the super night scene network model in step 3, the training sample set is divided into a low-resolution training set and high-resolution image blocks;
the low-resolution training set is obtained as follows: first, the high-resolution images are downsampled N times to obtain different low-resolution images; the obtained low-resolution images are then augmented, and each low-resolution image is sampled with overlap to obtain a group of overlapping low-resolution image blocks, which is used as the low-resolution training set;
the high-resolution image blocks are obtained as follows:
the high-resolution images corresponding to the N downsampling operations are sampled with overlap, and the resulting group of correspondingly overlapping high-resolution image blocks is used as the high-resolution label images; N is a positive integer;
the augmentation applied to the obtained low-resolution images consists of rotating them by 90, 180 and 270 degrees, so that low-resolution images at different angles are obtained;
a training convolutional network is then constructed:
first, the low-resolution image LR is taken as input and shallow features are extracted through a convolution layer; deep features of the image are then learned through a number of stacked CACBs; finally, the extracted shallow and deep features are fused, and a high-resolution image is obtained by upsampling through sub-pixel convolution;
the CACB module is composed of four fused convolution layers, and one quarter of the features of each fused convolution layer is reserved for the final feature fusion; the structural details of the fused convolution layers in this module differ between a training phase and a deployment phase;
the loss function used during training is:
L_total = 0.5*L_1 + 0.05*L_SSIM + 0.1*L_VGG + L_adv
where L_1 is the mean absolute error, L_SSIM the structural similarity loss, L_VGG the perceptual loss, and L_adv the adversarial loss;
Figure BDA0002709162290000171
Figure BDA0002709162290000172
where F(·) is the feature map output at layer 34 of a VGG19 network pre-trained on ImageNet, G(I)_{i,j,k} is the picture generated by the generator, C_{i,j,k} is the corresponding original picture with the bokeh effect, and D(·) is the output of the discriminator.
Step 4, outputting the result.
Before the result is output, the image is rendered through a preset bokeh model, which is constructed as follows:
Figure BDA0002709162290000181
where I_bokeh denotes the finally obtained image, I_org denotes the original image, multiplication between matrices is taken element by element, B_i(·) is the i-th order blur function, and W_i denotes the feature weight matrix value of the i-th layer data image; the i-th order blur function B_i(·) is obtained by applying a shallow blur neural network for i iterations, which is expressed as:
Figure BDA0002709162290000185
The loss function l adopts a combination of a reconstruction term and the structural similarity SSIM, and the model is optimized by back-propagating the error value; the reconstruction term l_1 is specifically:
Figure BDA0002709162290000186
where I_bokeh denotes the image with the bokeh effect generated by the model and the second term denotes the original image with the actual bokeh effect; the structural similarity between the generated image I_bokeh and the actual image is as follows:
Figure BDA00027091622900001810
where α, β and γ are preset constants, and the three factors denote, respectively, the luminance relationship, the contrast relationship and the structural relationship between the generated image I_bokeh and the actual image.
In Fig. 4, the left side is a night scene picture taken by a Redmi 8, and the right side is a night scene picture obtained by our network (both produced from the RAW data of the Redmi 8). Our result is richer in detail than the Redmi output, with softer colors that better match human visual perception.
In conclusion, the above algorithm flow can effectively meet the night-scene shooting needs of low-end mobile phones at a lower cost. For consumers who purchase low-end phones, this super night scene technology costs less money, while it can also be used on high-end phones. In addition, the algorithm greatly reduces the hardware requirements of the super night scene function; combined with an Airia NPU chip, the cost is further reduced and the cost-performance ratio of the mobile phone improves.
As noted above, while the present invention has been shown and described with reference to certain preferred embodiments, it is not to be construed as limited thereto. Various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. An image night scene processing method based on a convolutional neural network is characterized by comprising the following steps:
step 1, collecting a plurality of groups of RAW format data samples;
step 2, designing a super night scene network model;
step 3, training the super night scene network model in the step 2;
and step 4, outputting a result.
2. The convolutional neural network-based image night scene processing method as claimed in claim 1, wherein the step 1 further comprises:
step 1-1, shooting different scenes through a plurality of preset image acquisition devices with different models to obtain a plurality of RAW format data samples, taking the RAW format data samples acquired by the image acquisition devices with different models in the same scene as a group of mother samples, distinguishing the mother samples into different sub-samples according to the models of the image acquisition devices, and marking each sample;
step 1-2, after data sample collection is completed, aligning the images, and removing non-overlapping parts of the images;
and 1-3, after the image alignment operation of the step 1-2, further dividing the data sample into a training set, a verification set and a test set.
3. The convolutional neural network-based image night scene processing method as claimed in claim 2, wherein the aligning of the images in step 1-2 further comprises matching key points of the images, and iterating repeatedly a plurality of times on the basis of the key points;
wherein the keypoints of the matching images are further as follows:
step 1-2a, searching all image positions in a preset scale space, and extracting key points including corner points, edge points, bright points in dark areas and dark points in bright areas through convolution operations; the scale space L(x, y, σ) is calculated as:
L(x, y, σ) = G(x, y, σ)·C(x, y)
where C(x, y) represents the midpoint coordinate of the key point, G(x, y, σ) represents a Gaussian kernel function, and σ is a scale space factor taking a fixed value;
wherein the gaussian kernel function is represented as follows:
G(x, y, σ) = (1/(2πσ²)) · exp(−(x² + y²)/(2σ²))
wherein each symbol has the same meaning as above;
step 1-2b, collecting gradient magnitude values of the key points:
m(x, y) = sqrt( (L(x+1, y) − L(x−1, y))² + (L(x, y+1) − L(x, y−1))² )
step 1-2c, collecting the direction distribution of key points:
θ(x, y) = arctan( (L(x, y+1) − L(x, y−1)) / (L(x+1, y) − L(x−1, y)) )
wherein each symbol has the same meaning as above;
step 1-2d, calculating the neighborhood points k_i of the key point k:
Figure FDA0002709162280000023
where (x_k, y_k) represents the orientation of the key point, and the other symbols have the same meanings as above.
4. The convolutional neural network-based image night scene processing method as claimed in claim 1, wherein the step 2 further comprises:
step 2-1, establishing an SNN super night scene network model, wherein the SNN super night scene network model comprises at least one Encoder network and at least one Decoder network, and each network comprises a plurality of layers; the Encoder network performs several downsampling steps, each layer contains at least two 3x3 convolutions, each convolution is followed by an activation function and a Switchable Normalization layer, and finally a 2x2 max_pooling operation with stride 2 is added for downsampling; the whole Encoder stage is repeated three times;
each step in the Decoder network comprises upsampling the feature map, a 3x3 convolution that halves the number of feature channels, concatenation with the corresponding feature map from the Encoder network, and then two 3x3 convolutions applied to the concatenated feature map, each convolution followed by an activation function and a Switchable Normalization layer; a 3x3 convolution layer is used in the last layer, and the processed image is finally output through pixel_shuffle;
and step 2-2, proposing a Residual Dense block and placing it on a skip-connection, wherein the Residual Dense block consists of 3 Dense blocks, each Dense block contains 5 convolution layers, each convolution layer is followed by an activation function and a Switchable Normalization layer, and each layer receives the output feature maps of all preceding convolution layers.
5. The image night scene processing method based on the convolutional neural network as claimed in claim 1, wherein in the process of training the super night scene network model in step 3, the training sample set is divided into a low-resolution training set and a high-resolution image block;
the acquisition mode of the low-resolution training set is as follows: firstly, carrying out N times of downsampling on a high-resolution image to obtain different low-resolution images; expanding the obtained low-resolution images, performing overlapping sampling on each obtained low-resolution image to obtain a group of overlapped low-resolution image blocks, and taking the group of overlapped low-resolution image blocks as a low-resolution training set;
the high resolution image block acquisition mode is as follows:
overlapping and sampling the high-resolution image corresponding to the N times of downsampling operation, and then taking a group of correspondingly overlapped high-resolution image blocks as a high-resolution label image; n is a positive integer;
the expansion mode of expanding the obtained low-resolution images is that the low-resolution images are subjected to rotation transformation of 90 degrees, 180 degrees and 270 degrees, so that low-resolution images of different angles are obtained;
a training convolutional network is then constructed:
firstly, taking a low-resolution image LR as an input, extracting shallow layer features through a convolution layer, then learning deep layer features of the image through a plurality of stacked CACBs, finally fusing the extracted shallow layer features and the deep layer features, and obtaining a high-resolution image through up-sampling in a sub-pixel convolution mode;
wherein the CACB module is composed of four fused convolutional layers, and one quarter of the features of each fused convolutional layer is reserved for the final feature fusion; the structure of the fused convolutional layer used in this module differs between the training phase and the deployment phase;
the loss function used during the training process is:
L_total = 0.5*L_1 + 0.05*L_SSIM + 0.1*L_VGG + L_adv
where L_1 is the mean absolute error, L_SSIM is the structural similarity loss, L_VGG is the perceptual loss, and L_adv is the adversarial loss;
L_VGG = Σ_{i,j,k} ( F(G(I))_{i,j,k} − F(C)_{i,j,k} )²

L_adv = −Σ log D(G(I))

where F(·) is the feature map output at layer 34 of a VGG19 network pre-trained on ImageNet, G(I)_{i,j,k} is the picture generated by the generator, C_{i,j,k} is the corresponding original picture with the bokeh effect, and D(·) is the output of the discriminator.
6. The convolutional neural network-based image night scene processing method as claimed in claim 1, wherein step 4 further comprises acquiring an image by an image acquisition sensor, and outputting a night scene image after the image is finally modified using the super night scene network model obtained by training in step 3; bokeh rendering is performed before the night scene image is output, wherein the bokeh rendering model can be specifically constructed as follows:
I_bokeh = Σ_i W_i ⊙ B_i(I_org)

where I_bokeh represents the finally obtained image, I_org represents the original image, ⊙ represents element-wise multiplication of matrices, B_i(·) is the i-th order blur function, and W_i represents the characteristic weight matrix of the i-th layer of the data image; the i-th order blur function B_i(·) is obtained by applying a shallow blur neural network b(·) for i iterations, which is expressed as:

B_i(I_org) = b(B_{i−1}(I_org)), with B_0(I_org) = I_org
The loss function l adopts the combination of a reconstruction term and the structural similarity SSIM, and the model is optimized through back-propagation of the error value; the reconstruction term l_1 is specifically:

l_1 = ‖ I_bokeh − Î_bokeh ‖_1

where I_bokeh represents the image with the bokeh effect generated by the model, Î_bokeh represents the original image with the actual bokeh effect, and SSIM(I_bokeh, Î_bokeh) represents the structural similarity between the generated image I_bokeh and the actual image Î_bokeh:

SSIM(I_bokeh, Î_bokeh) = [l(I_bokeh, Î_bokeh)]^α · [c(I_bokeh, Î_bokeh)]^β · [s(I_bokeh, Î_bokeh)]^γ

where α, β and γ are preset constants, l(I_bokeh, Î_bokeh) represents the luminance relationship between the generated image I_bokeh and the actual image Î_bokeh, c(I_bokeh, Î_bokeh) represents the contrast relationship between them, and s(I_bokeh, Î_bokeh) represents the structural relationship between them.
7. An image night scene processing method based on a convolutional neural network is characterized by comprising the following modules:
a first module for collecting a plurality of sets of RAW format data samples;
the second module is used for establishing a super night scene network model;
a third module for training the super night scene network model;
and the fourth module is used for performing bokeh rendering on the night scene image before output.
8. The method according to claim 7, wherein the first module further obtains a plurality of RAW format data samples by shooting different scenes by a plurality of predetermined image acquisition devices of different models, takes the RAW format data samples acquired by the image acquisition devices of different models in the same scene as a set of mother samples, divides the mother samples into different subsamples according to the models of the image acquisition devices, and marks each sample;
after the data sample is acquired, aligning the images, and removing the non-overlapping parts of the images; aligning the images includes matching key points of the images and iterating repeatedly over multiple random subsets on that basis;
wherein the keypoints of the matching images are further as follows:
searching all image positions in a preset scale space, and extracting key points including corner points, edge points, bright points in dark areas and dark points in bright areas through convolution operations; the scale space L(x, y, σ) is calculated as:
L(x, y, σ) = G(x, y, σ)·C(x, y)
where C(x, y) represents the midpoint coordinate of the key point, G(x, y, σ) represents a Gaussian kernel function, and σ is a scale space factor taking a fixed value;
wherein the gaussian kernel function is represented as follows:
G(x, y, σ) = (1/(2πσ²)) · exp(−(x² + y²)/(2σ²))
wherein each symbol has the same meaning as above;
collecting gradient magnitude values of the key points:
m(x, y) = sqrt( (L(x+1, y) − L(x−1, y))² + (L(x, y+1) − L(x, y−1))² )
collecting the direction distribution of key points:
θ(x, y) = arctan( (L(x, y+1) − L(x, y−1)) / (L(x+1, y) − L(x−1, y)) )
wherein each symbol has the same meaning as above;
calculating the neighborhood points k_i of the key point k:
Figure FDA0002709162280000054
where (x_k, y_k) represents the orientation of the key point, and the other symbols have the same meanings as above;
dividing the data sample into a training set, a verification set and a test set;
the second module is further used for establishing an SNN super night scene network model, which comprises at least one Encoder network and at least one Decoder network, wherein each network comprises a plurality of layers; the Encoder network performs several downsampling steps, each layer contains at least two 3x3 convolutions, each convolution is followed by an activation function and a Switchable Normalization layer, and finally a 2x2 max_pooling operation with stride 2 is added for downsampling; the whole Encoder stage is repeated three times;
each step in the Decoder network comprises upsampling the feature map, a 3x3 convolution that halves the number of feature channels, concatenation with the corresponding feature map from the Encoder network, and then two 3x3 convolutions applied to the concatenated feature map, each convolution followed by an activation function and a Switchable Normalization layer; a 3x3 convolution layer is used in the last layer, and the processed image is finally output through pixel_shuffle;
proposing a Residual Dense block and placing it on a skip-connection, wherein the Residual Dense block consists of 3 Dense blocks, each Dense block contains 5 convolution layers, each convolution layer is followed by an activation function and a Switchable Normalization layer, and each layer receives the output feature maps of all preceding convolution layers;
the third module is further used for dividing the training sample set into a low-resolution training set and a high-resolution image block;
the acquisition mode of the low-resolution training set is as follows: firstly, carrying out N times of downsampling on a high-resolution image to obtain different low-resolution images; expanding the obtained low-resolution images, performing overlapping sampling on each obtained low-resolution image to obtain a group of overlapped low-resolution image blocks, and taking the group of overlapped low-resolution image blocks as a low-resolution training set;
the high resolution image block acquisition mode is as follows:
overlapping and sampling the high-resolution image corresponding to the N times of downsampling operation, and then taking a group of correspondingly overlapped high-resolution image blocks as a high-resolution label image; n is a positive integer;
the expansion mode of expanding the obtained low-resolution images is that the low-resolution images are subjected to rotation transformation of 90 degrees, 180 degrees and 270 degrees, so that low-resolution images of different angles are obtained;
a training convolutional network is then constructed:
firstly, taking a low-resolution image LR as an input, extracting shallow layer features through a convolution layer, then learning deep layer features of the image through a plurality of stacked CACBs, finally fusing the extracted shallow layer features and the deep layer features, and obtaining a high-resolution image through up-sampling in a sub-pixel convolution mode;
wherein the CACB module is composed of four fused convolutional layers, and one quarter of the features of each fused convolutional layer is reserved for the final feature fusion; the structure of the fused convolutional layer used in this module differs between the training phase and the deployment phase;
the loss function used during the training process is:
L_total = 0.5*L_1 + 0.05*L_SSIM + 0.1*L_VGG + L_adv
where L_1 is the mean absolute error, L_SSIM is the structural similarity loss, L_VGG is the perceptual loss, and L_adv is the adversarial loss;
L_VGG = Σ_{i,j,k} ( F(G(I))_{i,j,k} − F(C)_{i,j,k} )²

L_adv = −Σ log D(G(I))

where F(·) is the feature map output at layer 34 of a VGG19 network pre-trained on ImageNet, G(I)_{i,j,k} is the picture generated by the generator, C_{i,j,k} is the corresponding original picture with the bokeh effect, and D(·) is the output of the discriminator;
the fourth module is further configured to establish a bokeh rendering model, which can be specifically constructed as follows:
I_bokeh = Σ_i W_i ⊙ B_i(I_org)

where I_bokeh represents the finally obtained image, I_org represents the original image, ⊙ represents element-wise multiplication of matrices, B_i(·) is the i-th order blur function, and W_i represents the characteristic weight matrix of the i-th layer of the data image; the i-th order blur function B_i(·) is obtained by applying a shallow blur neural network b(·) for i iterations, which is expressed as:

B_i(I_org) = b(B_{i−1}(I_org)), with B_0(I_org) = I_org
The loss function l adopts the combination of a reconstruction term and the structural similarity SSIM, and the model is optimized through back-propagation of the error value; the reconstruction term l_1 is specifically:

l_1 = ‖ I_bokeh − Î_bokeh ‖_1

where I_bokeh represents the image with the bokeh effect generated by the model, Î_bokeh represents the original image with the actual bokeh effect, and SSIM(I_bokeh, Î_bokeh) represents the structural similarity between the generated image I_bokeh and the actual image Î_bokeh:

SSIM(I_bokeh, Î_bokeh) = [l(I_bokeh, Î_bokeh)]^α · [c(I_bokeh, Î_bokeh)]^β · [s(I_bokeh, Î_bokeh)]^γ

where α, β and γ are preset constants, l(I_bokeh, Î_bokeh) represents the luminance relationship between the generated image I_bokeh and the actual image Î_bokeh, c(I_bokeh, Î_bokeh) represents the contrast relationship between them, and s(I_bokeh, Î_bokeh) represents the structural relationship between them.
9. A computing module comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method according to any of claims 1 to 6 are implemented when the computer program is executed by the processor.
10. A storage medium readable by a computing module, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.
CN202011049682.7A 2020-09-29 2020-09-29 Convolutional neural network-based image night scene processing method, computing module for operating method and readable storage medium Active CN112150363B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011049682.7A CN112150363B (en) 2020-09-29 2020-09-29 Convolutional neural network-based image night scene processing method, computing module for operating method and readable storage medium

Publications (2)

Publication Number Publication Date
CN112150363A true CN112150363A (en) 2020-12-29
CN112150363B CN112150363B (en) 2023-07-07

Family

ID=73894508

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011049682.7A Active CN112150363B (en) 2020-09-29 2020-09-29 Convolutional neural network-based image night scene processing method, computing module for operating method and readable storage medium

Country Status (1)

Country Link
CN (1) CN112150363B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112528979A (en) * 2021-02-10 2021-03-19 成都信息工程大学 Transformer substation inspection robot obstacle distinguishing method and system
WO2022222652A1 (en) * 2021-04-23 2022-10-27 Oppo广东移动通信有限公司 Image processing method and apparatus, storage medium, device, and model training method

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007173985A (en) * 2005-12-19 2007-07-05 Canon Inc Imaging apparatus, imaging method, program, and storage medium
US20140308020A1 (en) * 2013-04-16 2014-10-16 Michael F. Seeger Marital aid
CN106709977A (en) * 2016-11-16 2017-05-24 北京航空航天大学 Scene night view map-based automatic light source arrangement method
CN107016403A (en) * 2017-02-23 2017-08-04 中国水利水电科学研究院 A kind of method that completed region of the city threshold value is extracted based on nighttime light data
US20200034648A1 (en) * 2018-07-27 2020-01-30 Boe Technology Group Co., Ltd. Method and apparatus for segmenting sky area, and convolutional neural network
CN109068058A (en) * 2018-08-22 2018-12-21 Oppo广东移动通信有限公司 Filming control method, device and electronic equipment under super night scene mode

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
刘夙凯: "Application of computer graphics and image technology in the field of fine arts", Modern Electronics Technique, no. 21, pages 66-68 *
杨云; 张海宇; 朱宇; 张艳宁: "Single-image super-resolution reconstruction with class-information generative adversarial networks", Journal of Image and Graphics, no. 12, pages 1777-1788 *

Also Published As

Publication number Publication date
CN112150363B (en) 2023-07-07

Similar Documents

Publication Publication Date Title
Tan et al. DeepDemosaicking: Adaptive image demosaicking via multiple deep fully convolutional networks
RU2706891C1 (en) Method of generating a common loss function for training a convolutional neural network for converting an image into an image with drawn parts and a system for converting an image into an image with drawn parts
CN111476737A (en) Image processing method, intelligent device and computer readable storage medium
Afifi et al. Cie xyz net: Unprocessing images for low-level computer vision tasks
CN112529776B (en) Training method of image processing model, image processing method and device
WO2021063119A1 (en) Method and apparatus for image processing, terminal
US20220207651A1 (en) Method and apparatus for image processing
CN112949636B (en) License plate super-resolution recognition method, system and computer readable medium
CN114418853B (en) Image super-resolution optimization method, medium and equipment based on similar image retrieval
Xu et al. Joint demosaicing and super-resolution (JDSR): Network design and perceptual optimization
CN112150363B (en) Convolutional neural network-based image night scene processing method, computing module for operating method and readable storage medium
Guan et al. Srdgan: learning the noise prior for super resolution with dual generative adversarial networks
Zhang et al. Deep motion blur removal using noisy/blurry image pairs
CN114820581B (en) Axisymmetric optical imaging parallel simulation method and device
Li et al. Cross-patch graph convolutional network for image denoising
Rasheed et al. LSR: Lightening super-resolution deep network for low-light image enhancement
Meng et al. Gia-net: Global information aware network for low-light imaging
CN113379606B (en) Face super-resolution method based on pre-training generation model
Wan et al. Purifying low-light images via near-infrared enlightened image
Mehta et al. Gated multi-resolution transfer network for burst restoration and enhancement
CN108401104B (en) Dual-focus camera digital zooming method based on frequency band repair and super-resolution
CN113628134A (en) Image noise reduction method and device, electronic equipment and storage medium
Kınlı et al. Modeling the lighting in scenes as style for auto white-balance correction
CN113205005B (en) Low-illumination low-resolution face image reconstruction method
Tang et al. BMISP: Bidirectional mapping of image signal processing pipeline

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 203b, building 3, artificial intelligence Industrial Park, 266 Chuangyan Road, Qilin science and Technology Innovation Park, Jiangning District, Nanjing City, Jiangsu Province, 211000

Applicant after: Zhongke Fangcun Zhiwei (Nanjing) Technology Co.,Ltd.

Applicant after: Zhongke Nanjing artificial intelligence Innovation Research Institute

Address before: Room 203b, building 3, artificial intelligence Industrial Park, 266 Chuangyan Road, Qilin science and Technology Innovation Park, Jiangning District, Nanjing City, Jiangsu Province, 211000

Applicant before: Zhongke Fangcun Zhiwei (Nanjing) Technology Co.,Ltd.

Applicant before: NANJING ARTIFICIAL INTELLIGENCE CHIP INNOVATION INSTITUTE, INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES

CB02 Change of applicant information
GR01 Patent grant
GR01 Patent grant