CN110971837B - ConvNet-based dim light image processing method and terminal equipment - Google Patents

ConvNet-based dim light image processing method and terminal equipment Download PDF

Info

Publication number
CN110971837B
Authority
CN
China
Prior art keywords
data
feature
module
channel
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811160234.7A
Other languages
Chinese (zh)
Other versions
CN110971837A (en)
Inventor
廖秋萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
TCL Technology Group Co Ltd
Original Assignee
TCL Technology Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by TCL Technology Group Co Ltd filed Critical TCL Technology Group Co Ltd
Priority to CN201811160234.7A priority Critical patent/CN110971837B/en
Publication of CN110971837A publication Critical patent/CN110971837A/en
Application granted granted Critical
Publication of CN110971837B publication Critical patent/CN110971837B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/70Circuitry for compensating brightness variation in the scene
    • H04N23/76Circuitry for compensating brightness variation in the scene by influencing the image signals

Abstract

The invention relates to the technical field of image processing, and provides a ConvNet-based dim light image processing method and a terminal device. The method comprises the following steps: preprocessing image raw data; inputting the preprocessed raw data into a convolutional neural network model for feature enhancement; and performing channel rearrangement processing on the output data of the convolutional neural network model to generate an image corresponding to the raw data. The invention can reduce the amount of data processed by the convolutional neural network model while ensuring imaging sharpness, thereby reducing power consumption and increasing imaging speed.

Description

ConvNet-based dim light image processing method and terminal equipment
Technical Field
The invention relates to the technical field of image processing, in particular to a dim light image processing method based on ConvNet and terminal equipment.
Background
Fast, sharp imaging from a single monocular image under dim light conditions is a very challenging task. Commonly used solutions are physical solutions and image processing solutions. The physical solutions include opening up the aperture, increasing the exposure time, using a high ISO sensitivity, turning on the flash, and so on; the image processing solutions include conventional multi-step methods, multi-frame methods, and end-to-end deep learning methods. However, these current methods all have disadvantages: increasing the exposure time causes blur due to camera shake; the conventional multi-step method has a complicated pipeline and a poor effect; multi-frame methods suffer from the difficulty of frame alignment under dim light conditions; and although the end-to-end deep learning method achieves a good imaging effect, its power consumption is high and its running time is long, so it cannot be used on mobile devices.
Disclosure of Invention
In view of this, embodiments of the present invention provide a dark light image processing method and a terminal device based on ConvNets, so as to solve the problems of high power consumption and long operation time of the current end-to-end image processing method based on deep learning.
The first aspect of the embodiments of the present invention provides a dark light image processing method based on ConvNets, including:
preprocessing image original data;
inputting the preprocessed original data into a convolutional neural network model for feature enhancement;
and performing channel rearrangement processing on the output data of the convolution network model to generate an image corresponding to the original data.
A second aspect of an embodiment of the present invention provides a dark light image processing apparatus based on ConvNets, including:
the preprocessing unit is used for preprocessing the original image data;
the processing unit is used for inputting the preprocessed original data into a convolutional neural network model for feature enhancement;
and the generating unit is used for carrying out channel rearrangement processing on the output data of the convolution network model to generate an image corresponding to the original data.
A third aspect of the embodiments of the present invention provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the ConvNets-based dim-light image processing method in the first aspect when executing the computer program.
A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the ConvNets-based dim-light image processing method of the first aspect.
Compared with the prior art, the embodiment of the invention has the following beneficial effects: the data is subjected to feature enhancement through the convolutional neural network model, clear imaging under the dim light condition can be realized, the data volume processed by the convolutional neural network model is reduced on the premise of ensuring the imaging definition, the power consumption is reduced, and the imaging speed is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise.
Fig. 1 is a flowchart of an implementation of a dark light image processing method based on ConvNets according to an embodiment of the present invention;
fig. 2 is a flowchart illustrating an implementation of preprocessing raw data in a ConvNets-based dim-light image processing method according to an embodiment of the present invention;
fig. 3 is a flowchart illustrating an implementation of inputting preprocessed raw data into a convolutional neural network model for feature enhancement in a ConvNets-based dim light image processing method according to an embodiment of the present invention;
fig. 4 is a flowchart illustrating an implementation of feature enhancement on feature-coded data in a ConvNets-based dim light image processing method according to an embodiment of the present invention;
fig. 5 is a schematic flow chart of a dark light image processing method based on ConvNets according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a conventional end-to-end convolutional neural network model for imaging under dim light conditions, provided by an embodiment of the present invention;
fig. 7 is a schematic diagram of a convolutional neural network model in a ConvNets-based dim light image processing method according to an embodiment of the present invention;
FIG. 8 is an example of an image of raw data output by a camera under dim light conditions provided by embodiments of the present invention;
FIG. 9 is an example of an image after brightness linear amplification is performed on original data according to an embodiment of the present invention;
fig. 10 is an example of an image obtained by a ConvNets-based dim-light image processing method according to an embodiment of the present invention;
fig. 11 is a schematic diagram of a ConvNets-based dim light image processing apparatus according to an embodiment of the present invention;
fig. 12 is a schematic diagram of a terminal device according to an embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
In order to explain the technical means of the present invention, the following description will be given by way of specific examples.
Fig. 1 is a flowchart of an implementation of a ConvNets-based dim-light image processing method according to an embodiment of the present invention, which is detailed as follows:
in S101, the image raw data is preprocessed.
In this embodiment, raw data sensed by the image sensor may be acquired. The raw data sensed by the image sensor refers to raw data sensed by a photoelectric conversion device in the image sensor. The raw data may be pre-processed to correct for image sensor bias and adjust image brightness before being input to the convolutional neural network.
As an embodiment of the present invention, as shown in fig. 2, S101 may include:
in S201, rearranging the original data to obtain a preset number of channel data; the data of the same channel corresponds to the same color.
In this embodiment, the original data includes data corresponding to a plurality of image colors, and data of a preset number of channels can be obtained by rearranging the original data, and the data of the same channel corresponds to the same color. For example, the preset number is four, and the four channels may be a red channel corresponding to red, a first green channel corresponding to green, a blue channel corresponding to blue, and a second green channel corresponding to green.
In S202, the data of each channel is subjected to the black level removal processing, respectively.
In this embodiment, the black level correction can be realized by performing the black level removal processing on the data of each channel, so that the deviation of the image sensor can be corrected. For example, a preset black level value may be subtracted from the data of each channel.
In S203, the data of each channel subjected to the black level removal processing is multiplied by a preset amplification factor to perform amplification processing, respectively.
In this embodiment, multiplying the data of each channel by a preset magnification performs the ISO brightness adjustment function. Here, ISO refers to sensitivity; the name comes from the International Organization for Standardization (ISO), the body that standardized how sensitivity is quantified.
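As a rough sketch of the preprocessing in S201 to S203 (written here in Python/NumPy; the RGGB Bayer layout, the black level value and the amplification ratio are illustrative assumptions, not values fixed by this embodiment):

import numpy as np

def preprocess_raw(raw, black_level=512.0, ratio=100.0):
    """Pack a Bayer RAW frame of shape (H, W) into 4 half-resolution channels,
    remove the black level, and linearly amplify the brightness."""
    # S201: rearrange into four channels (assuming an RGGB pattern):
    # red, first green, blue, second green, each of size H/2 x W/2.
    r  = raw[0::2, 0::2]
    g1 = raw[0::2, 1::2]
    b  = raw[1::2, 1::2]
    g2 = raw[1::2, 0::2]
    packed = np.stack([r, g1, b, g2], axis=0).astype(np.float32)
    # S202: subtract the preset black level to correct the image sensor bias.
    packed = np.maximum(packed - black_level, 0.0)
    # S203: multiply by the preset amplification factor (ISO-like brightness gain).
    return packed * ratio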
In S102, the preprocessed raw data is input to a convolutional neural network model for feature enhancement.
In this embodiment, feature enhancement may be performed on the preprocessed raw data through a convolutional neural network model, so as to improve the imaging speed under the condition of ensuring the image definition under the dim light condition.
As an embodiment of the present invention, as shown in fig. 3, S102 may include:
in S301, data input to the convolutional neural network model is feature-encoded.
In S302, feature enhancement is performed on the feature-coded data.
In S303, the data encoded by the features and the data enhanced by the features are decoded and fused.
In this embodiment, feature encoding may be performed on the preprocessed original data, then feature enhancement may be performed on the feature-encoded data, and finally, the feature-encoded data and the feature-enhanced data may be decoded and fused, so that feature enhancement on the data may be implemented through a convolutional neural network. As an embodiment of the present invention, S303 may include:
decoding and fusing the feature encoded data and the feature enhanced data through a convolution layer, a deconvolution layer, and a feature fusion layer of a depth separable convolution structure.
In this embodiment, the convolutional layer and the deconvolution layer of the inverse depth separable convolutional structure are used to decode the data. The characteristic fusion layer is used for carrying out characteristic fusion on the data subjected to characteristic coding and the corresponding data in the data subjected to characteristic enhancement.
As an embodiment of the present invention, as shown in fig. 4, S302 may include:
in S401, convolving the feature-coded data by a dot product convolution with a convolution kernel number of 4 × N to obtain first data with a channel number of 4 × N; and the channel number of the data subjected to the feature coding is N, wherein N is an integer greater than 1.
In S402, feature extraction is performed on the first data with the number of channels being 4 × N by deep convolution, so as to obtain second data with the number of channels being 4 × N.
In S403, channel feature fusion is performed on the second data with the number of channels of 4 × N by dot product convolution with the number of convolution kernels of N.
In this embodiment, the number of channels of the feature-coded data is N, and the feature-coded data is convolved by the dot product convolution with the number of convolution kernels of 4 × N to obtain first data with the number of channels of 4 × N; then, carrying out feature extraction on the first data with the channel number of 4 × N through deep convolution to obtain second data with the channel number of 4 × N; and finally, performing channel feature fusion on the second data with the channel number of 4 × N through the dot product convolution with the convolution kernel number of N, thereby realizing feature enhancement on the feature-coded data.
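As a minimal PyTorch sketch of one such feature enhancement layer implementing S401 to S403 (the 3X3 depthwise kernel size follows the module description later in this document; padding and activation choices are assumptions):

import torch.nn as nn

class FeatureEnhanceLayer(nn.Module):
    """Inverse depth separable convolution layer:
    1x1 expand N -> 4N, 3x3 depthwise, 1x1 reduce 4N -> N."""
    def __init__(self, n):
        super().__init__()
        # S401: dot product (1x1) convolution with 4*N kernels raises the channel count to 4*N.
        self.expand = nn.Conv2d(n, 4 * n, kernel_size=1)
        # S402: depthwise convolution extracts features channel by channel.
        self.depthwise = nn.Conv2d(4 * n, 4 * n, kernel_size=3, padding=1, groups=4 * n)
        # S403: dot product (1x1) convolution with N kernels fuses channel features
        # and restores the channel count to N.
        self.reduce = nn.Conv2d(4 * n, n, kernel_size=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.act(self.expand(x))
        x = self.act(self.depthwise(x))
        return self.reduce(x)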
In S103, a channel rearrangement process is performed on the output data of the convolutional network model to generate an image corresponding to the original data.
According to the embodiment of the invention, the data is subjected to characteristic enhancement through the convolutional neural network model, clear imaging under a dim light condition can be realized, the data volume processed by the convolutional neural network model is reduced on the premise of ensuring the imaging definition, the power consumption is reduced, and the imaging speed is improved.
Optionally, S103 may include performing channel rearrangement on the data of each channel output by the convolutional network model to obtain data of the first color channel, the second color channel, and the third color channel.
In this embodiment, there are three color channels: a first color channel, a second color channel, and a third color channel, each color channel corresponding to an image color. For example, if the number of channels of the convolution network model output data is 12, the 1 st, 4 th, 7 th, and 10 th channels may be rearranged into the R channel, the 2 nd, 5 th, 8 th, and 11 th channels may be rearranged into the G channel, and the 3 rd, 6 th, 9 th, and 12 th channels may be rearranged into the B channel. The channel rearrangement formula is as follows:
The channel rearrangement can be written as:

Out(k, 2i-1, 2j-1) = In(k, i, j)
Out(k, 2i-1, 2j) = In(k+3, i, j)
Out(k, 2i, 2j-1) = In(k+6, i, j)
Out(k, 2i, 2j) = In(k+9, i, j)

In() in the formula represents the output data of the convolutional network model, and Out() represents the final RGB image data. Assuming In() is w wide and h high, i takes the values 1, 2, …, w and j takes the values 1, 2, …, h, and Out() is 2w wide and 2h high. k takes the value 1, 2 or 3, where 1 represents the R channel, 2 the G channel and 3 the B channel. When k is 1, line 1 of the formula takes the value at position (i, j) of the 1st (k) channel output by the convolutional network model and assigns it to position (2i-1, 2j-1) of the R channel of the RGB image data; line 2 takes the value at position (i, j) of the 4th (k+3) channel and assigns it to position (2i-1, 2j) of the R channel; line 3 takes the value at position (i, j) of the 7th (k+6) channel and assigns it to position (2i, 2j-1) of the R channel; and line 4 takes the value at position (i, j) of the 10th (k+9) channel and assigns it to position (2i, 2j) of the R channel. The cases where k is 2 or 3 follow by analogy.
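A sketch of this channel rearrangement (equivalent to a depth-to-space operation with factor 2), assuming the interleaved grouping described above in which channels 1, 4, 7, 10 form R, channels 2, 5, 8, 11 form G, and channels 3, 6, 9, 12 form B:

import numpy as np

def rearrange_to_rgb(net_out):
    """net_out: network output of shape (12, h, w); returns an RGB image of shape (3, 2h, 2w)."""
    c, h, w = net_out.shape
    assert c == 12
    rgb = np.zeros((3, 2 * h, 2 * w), dtype=net_out.dtype)
    for k in range(3):                       # 0 = R, 1 = G, 2 = B
        rgb[k, 0::2, 0::2] = net_out[k]      # channel k     -> odd rows, odd columns (1-indexed)
        rgb[k, 0::2, 1::2] = net_out[k + 3]  # channel k + 3 -> odd rows, even columns
        rgb[k, 1::2, 0::2] = net_out[k + 6]  # channel k + 6 -> even rows, odd columns
        rgb[k, 1::2, 1::2] = net_out[k + 9]  # channel k + 9 -> even rows, even columns
    return rgb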
As an embodiment of the invention, the convolutional neural network model comprises a feature coding module, a feature enhancement module and a feature decoding and fusing module. The feature coding module is used for carrying out feature coding on the data input into the convolutional neural network model. The characteristic enhancement module is used for carrying out characteristic enhancement on the data output by the characteristic coding module. The feature decoding and fusing module is used for decoding and fusing the data output by the feature enhancing module and the data output by the feature coding module.
Compared with the traditional end-to-end convolutional neural network model for imaging under the dim light condition, the embodiment provides an improved convolutional neural network model, which comprises a feature enhancement module, wherein the feature enhancement module is used for performing feature enhancement on data input to the feature enhancement module so as to ensure the imaging definition and the imaging speed under the dim light condition.
In this embodiment, the convolutional neural network model includes a feature encoding module, a feature enhancement module, and a feature decoding and fusing module. After the raw data is preprocessed, data of a plurality of channels is obtained. After the data of each channel is input into the convolutional neural network model, the feature coding module first performs feature coding on the data and increases the number of channels, and the data with the increased number of channels is output to the feature enhancement module and to the feature decoding and fusing module. For example, the number of channels can be raised from 4 to 64 by multiple convolutional layers.
The feature enhancement module is used for performing feature enhancement on the data and outputting the feature-enhanced data to the feature decoding and fusion module. The number of channels of the input data of the feature enhancement module is consistent with the number of channels of its output data; for example, the number of channels of the input data of the feature enhancement module is 64.
The feature decoding and fusing module is used for performing feature decoding on the output data of the feature enhancement module and performing feature fusion on the output data of the feature enhancement module and the corresponding data output by the feature coding module. The data can be feature-decoded by multiple deconvolution layers combined with convolution layers, and feature fusion can be performed by a concatenation (series) operation. During feature decoding, the number of channels of the data is reduced, that is, the number of channels of the data output by the feature decoding and fusing module is smaller than the number of channels of its input data. For example, the number of channels of the input data of the feature decoding and fusing module is 64, and the number of channels of the output data is reduced to 12.
Optionally, the feature encoding module includes a convolution layer with a step size of 2 and does not include a max pooling layer.
The convolution layer having a step size of 2 is a convolution layer that is convolved with convolution kernel data having a step size of 2. Compared with the traditional end-to-end convolutional neural network model for imaging under the dim light condition, the embodiment can reduce the data calculation amount and the required memory of the convolutional neural network model by adopting convolution with the step size of 2 in the feature coding module to replace convolution with the step size of 1 and the maximum pooling layer with the step size of 2 in the traditional model.
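As a sketch of one such encoder sublayer (in the same PyTorch style as above; channel counts, padding, activation, and the number of stride-1 convolutions per sublayer are illustrative assumptions):

import torch.nn as nn

class EncoderSublayer(nn.Module):
    """3x3 convolution with step size 1 followed by a 3x3 convolution with step size 2,
    halving the spatial size without any max pooling layer."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1)
        self.down = nn.Conv2d(out_ch, out_ch, kernel_size=3, stride=2, padding=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.act(self.conv(x))
        return self.act(self.down(x))   # output has half the length and width of the input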
Optionally, the feature decoding and fusion module comprises a convolution layer, a deconvolution layer, and a feature fusion layer of an inverse depth separable convolution structure.
Wherein the convolution layer and the deconvolution layer of the inverse depth separable convolution structure are used to decode the data. The feature fusion layer is used for performing feature fusion on the data output by the feature coding module and the corresponding data output by the feature enhancement module, and for performing feature fusion on the data output by the feature coding module and the corresponding data during the processing of the feature decoding and fusion module. Compared with the traditional end-to-end convolutional neural network model for imaging under dim light conditions, this embodiment can further reduce the computation of the convolutional neural network model by adopting a depthwise separable convolution structure in the feature decoding and fusing module to replace the convolution layers of the traditional model, thereby improving the operation speed.
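As a sketch of one decoding-and-fusion sublayer under these assumptions (a 2X2 deconvolution with step size 2 for upsampling, concatenation as the feature fusion layer, and a depthwise separable convolution for refining the fused features; channel counts are illustrative):

import torch
import torch.nn as nn

class DecoderSublayer(nn.Module):
    """Deconvolution + concatenation-based feature fusion + depthwise separable convolution."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        # 2x2 deconvolution with step size 2 doubles the spatial size.
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        fused = out_ch + skip_ch
        # Depthwise separable convolution: per-channel 3x3 followed by a 1x1 convolution.
        self.depthwise = nn.Conv2d(fused, fused, kernel_size=3, padding=1, groups=fused)
        self.pointwise = nn.Conv2d(fused, out_ch, kernel_size=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x, skip):
        x = self.up(x)
        # Feature fusion layer: concatenate with the corresponding encoder (skip) features.
        x = torch.cat([x, skip], dim=1)
        x = self.act(self.depthwise(x))
        return self.pointwise(x)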
As an embodiment of the present invention, the feature enhancement module includes at least one layer, each layer including a first sublayer, a second sublayer, and a third sublayer; the channel number of the input data of the characteristic enhancement module is N, wherein N is an integer greater than 1. The first sublayer convolves the input data by dot product convolution with a convolution kernel number of 4 × N, and increases the number of channels to 4 × N. And the second sublayer performs feature extraction on the data of 4 × N channels output by the first sublayer through depth convolution. And the third sublayer performs channel feature fusion on the data of 4 × N channels output by the second sublayer through the dot product convolution with the convolution kernel number of N.
In this embodiment, the feature enhancement module may include a preset number of convolutional layers of the inverse depth separable convolution structure. For example, the feature enhancement module may include, but is not limited to, 5 convolutional layers of the inverse depth separable convolution structure, each of which comprises three sublayers: a first sublayer, a second sublayer, and a third sublayer. The 5 convolutional layers of the inverse depth separable convolution structure perform feature enhancement on the data in sequence, with the output data of the previous convolutional layer serving as the input data of the next convolutional layer. The following description takes the three sublayers in one such convolutional layer as an example.
The three sublayers process the data in sequence. After the data with the number of channels N is input to the convolutional layer, the data with the number of channels N is convolved by the first sublayer by adopting a PW (point convolution) with the number of convolution kernels of 4 × N, the number of channels of the data is increased from N to 4 × N, and the data with the number of channels increased is output to the second sublayer. For example, the number of channels of the input data of the first sublayer is 64, the data is convolved by 256 convolution kernels of 1 × 1, and the number of channels of the output data of the first sublayer is 256. The first sublayer promotes the number of channels of data through dot product convolution, can obtain richer features, and improves the feature denoising capability and the color recovery capability.
The second sublayer adopts a depth convolution (DW) to respectively extract the features of the data of 4 × N channels, and outputs the data to the third sublayer. For example, the number of channels of the input data of the second sublayer is 256, the second sublayer convolves the data by 256 convolution kernels of 3 × 3, and the number of channels of the output data of the second sublayer is 256.
And the third sublayer performs channel feature fusion on the data of 4 × N channels output by the second sublayer by adopting dot product convolution with the convolution kernel number of N. For example, the number of channels of the input data of the third sublayer is 256, the data is convolved by 64 convolution kernels of 1 × 1, and the number of channels of the output data of the third sublayer is 64. The third sublayer restores the channel number through dot product convolution and performs channel feature fusion. The second sublayer and the third sublayer together accomplish efficient feature processing.
In the embodiment, the first sublayer, the second sublayer and the third sublayer of the inverse depth separable convolution structure can improve the characteristic denoising capability and the color recovery capability, perform efficient characteristic processing on characteristics, and further improve the performance by serially connecting the convolution layers of the multilayer inverse depth separable convolution structure, thereby ensuring the imaging definition of the convolution neural network model on imaging under the dark light condition.
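As a usage sketch, the feature enhancement module can then be assembled by connecting five such layers in series; this reuses the FeatureEnhanceLayer sketch given earlier with N = 64, matching the example channel counts above (the spatial size below is only illustrative):

import torch
import torch.nn as nn

# Five inverse depth separable convolution layers connected in series (N = 64).
enhance_module = nn.Sequential(*[FeatureEnhanceLayer(64) for _ in range(5)])

x = torch.randn(1, 64, 32, 64)              # example feature map with 64 channels
assert enhance_module(x).shape == x.shape   # channel count and spatial size are unchanged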
Fig. 5 is a schematic flow chart according to an embodiment of the present invention. The wire frame S1 is the schematic diagram of S101 in fig. 1, where the length of the raw data sensed by the image sensor is H, the width is W, and the number of channels is 1; the length of the data obtained by channel rearrangement of the original data is H/2, the width is W/2, and the number of channels is 4. Within the wire frame S2 is a schematic diagram of S102 in fig. 1. Within the wire frame S3 is a schematic diagram of S103 in fig. 1.
Fig. 6 is a schematic diagram of a conventional end-to-end convolutional neural network model for imaging under dim light conditions according to an embodiment of the present invention. The data processing of the feature coding module in the conventional model is shown in wire frame 61; the input of the module is the preprocessed raw data. Each row of the module is a sublayer, 4 sublayers in total, and each sublayer has 3 small layers. The small layers are processed by convolution with a kernel size of 3X3 and a step size of 1, and the sublayers are connected by max pooling with a step size of 2. After the stride-2 max pooling of each sublayer, the length and width of the output data are halved, so after the 4 sublayers the length and width of the final output of the module are 1/8 of those of the module's input. Wire frame 62 shows the data processing of the feature extraction module in the conventional model; its input is the output of the feature coding module after a convolution with a kernel size of 3X3 and a step size of 2, so the length and width of the data in this module are 1/16 of those of the input of the feature coding module. The module has only one sublayer, which has 3 small layers, each processed by convolution with a kernel size of 3X3 and a step size of 1. Wire frame 63 shows the data processing of the feature decoding and fusion module in the conventional model; its input is the output of the feature extraction module after upsampling with a step size of 2. After the stride-2 upsampling, the length and width of the data are doubled, so the length and width of the module's input are restored to 1/8 of those of the input of the coding module. Each row of the module is a sublayer, 4 sublayers in total, and each sublayer has 3 small layers. The module performs two functions, fusion and decoding. The fusion is performed by the first small layer of each sublayer, which concatenates the output data of the corresponding sublayer of the feature coding module with the output data, after deconvolution, of the sublayer below. The small layers are processed by convolution with a kernel size of 3X3 and a step size of 1, and the sublayers are connected by deconvolution with a kernel size of 2X2 and a step size of 2. Each stride-2 deconvolution doubles the length and width of the data, so after the 4 sublayers the length and width of the module's output are 8 times those of its input; through this decoding process, the length and width of the output of the decoding module are restored to be consistent with those of the input of the encoding module. The numbers in the wire frames indicate the number of channels of the data processed by each convolutional layer.
Fig. 7 is a schematic diagram of an end-to-end convolutional neural network model according to an embodiment of the present invention. The data processing of the feature coding module is shown in wire frame 71; the input data of the module is the preprocessed raw data. Each row of the module is a sublayer, 4 sublayers in total, and sublayer 1 has 2 small layers. The small layers are processed by convolution with a kernel size of 3X3 and a step size of 1, and the sublayers are connected by convolution with a kernel size of 3X3 and a step size of 2. After each stride-2 convolution, the length and width of the output data are halved, so after the 4 sublayers the length and width of the final output of the module are 1/8 of those of its input. The data processing of the feature enhancement module is shown in wire frame 72; its input is the output of the feature coding module after a convolution with a kernel size of 3X3 and a step size of 2, so the length and width of the data in this module are 1/16 of those of the input of the feature coding module. The module is formed by connecting 5 layers of inverse depth separable convolutions in series, and the length and width of its input and output data remain unchanged. The specific structure and data processing of the feature enhancement module have been described above and are not repeated here. The data processing of the feature decoding and fusion module is shown in wire frame 73; its input is the output of the feature enhancement module after upsampling with a step size of 2. After the stride-2 upsampling, the length and width of the data are doubled, so the length and width of the module's input are restored to 1/8 of those of the input of the coding module. The module contains 4 sublayers, each with 2 small layers, and performs two functions, fusion and decoding. The fusion is performed by the first small layer of each sublayer, which concatenates the output data of the corresponding sublayer of the feature coding module with the output data, after deconvolution, of the sublayer below. From the bottom up, the small layers of the 1st and 2nd sublayers are processed by convolution with a kernel size of 3X3 and a step size of 1, the small layers of the 3rd and 4th sublayers are processed by depthwise separable convolution, and the sublayers are connected by deconvolution with a kernel size of 2X2 and a step size of 2. Each stride-2 deconvolution doubles the length and width of the data, so after the 4 sublayers the length and width of the module's output are 8 times those of its input; through this decoding process, the length and width of the output of the feature decoding and fusion module are consistent with those of the input of the feature coding module. The numbers in the wire frames indicate the number of channels of the data processed by each convolutional layer.
By comparing fig. 6 and fig. 7, it can be seen that the convolutional neural network model proposed in this embodiment is improved as follows compared with the conventional neural network model: 1) part of the convolution layers are deleted; 2) the number of convolution kernels is reduced; 3) convolution with a step size of 2 replaces convolution with a step size of 1 followed by a max pooling layer with a step size of 2; 4) a depthwise separable convolution structure is adopted to further reduce the amount of calculation; 5) the feature extraction module of the traditional model is replaced by the feature enhancement module, which extracts features efficiently so as to reduce the precision loss caused by the structural changes of 1) to 4) and retain the enhancement and restoration capability. Through these improvements, this embodiment realizes fast and clear imaging under a dim light environment with low power consumption and high speed, can be applied to mobile devices such as mobile phones, solves the problem that mobile devices have difficulty producing fast, clear images in real time under dim light, improves the imaging speed of mobile devices under dim light, and improves the user experience.
Fig. 8 to 10 are graphs showing experimental verification effects provided by the embodiment of the present invention. FIG. 8 is the raw data from the camera under dim light conditions; FIG. 9 is an image of the raw data after linear brightness amplification; fig. 10 is an image obtained by the image processing method according to the embodiment of the present invention. Table 1 shows the memory and running-time results for an input of size 512 × 512 × 4 in the experimental verification. If the whole image is processed, the total running time is 1705 ms (including the image stitching time), and the memory occupation is 144.2 MB.
Table 1: experimental verification data (memory usage and running time; provided as an image in the original publication)
As can be seen from fig. 8 to 10, the image processing method provided by the present embodiment can generate a clear image under dark light conditions. The combination of table 1 and fig. 8 to 10 shows that the image processing method provided by the present embodiment has low power consumption, high speed, and clear generated image.
According to the embodiment of the invention, by providing in the convolutional neural network model a feature enhancement module that performs feature enhancement on the channel data, a convolutional neural network model with fewer convolutional layers and fewer channels can achieve clear imaging under dim light conditions; the amount of data processed by the convolutional neural network model is reduced on the premise of ensuring the imaging precision, the power consumption is reduced, and the imaging speed is increased.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
Corresponding to the above-described dark-light image processing method based on ConvNets, fig. 11 shows a schematic diagram of a dark-light image processing apparatus based on ConvNets according to an embodiment of the present invention. For convenience of explanation, only the portions related to the present embodiment are shown.
Referring to fig. 11, the apparatus includes a preprocessing unit 111, a processing unit 112, and a generating unit 113.
A preprocessing unit 111 for preprocessing the image raw data;
the processing unit 112 is configured to input the preprocessed raw data into a convolutional neural network model for feature enhancement;
and a generating unit 113, configured to perform channel rearrangement processing on the output data of the convolutional network model, and generate an image corresponding to the original data.
Optionally, the preprocessing unit 111 is configured to:
rearranging the original data to obtain channel data with a preset number; the data of the same channel corresponds to the same color;
respectively carrying out black level removing processing on the data of each channel;
and multiplying the data of each channel subjected to the black level removal processing by a preset amplification factor for amplification processing.
Optionally, the convolutional neural network model further includes a feature coding module, a feature enhancing module, and a feature decoding and fusing module;
the characteristic coding module is used for carrying out characteristic coding on the data input into the convolutional neural network model;
the characteristic enhancement module is used for enhancing the characteristics of the data output by the characteristic coding module;
the feature decoding and fusing module is used for decoding and fusing the data output by the feature enhancing module and the data output by the feature coding module.
Optionally, the feature encoding module includes a convolution layer with a step size of 2 and does not include a max pooling layer.
Optionally, the feature decoding and fusion module includes a convolution layer, a deconvolution layer, and a feature fusion layer of a depth separable convolution structure.
Optionally, the feature enhancement module comprises at least one layer, each layer comprising a first sublayer, a second sublayer and a third sublayer; the number of channels of input data of the feature enhancement module is N, wherein N is an integer greater than 1;
the first sublayer convolves the input data through dot product convolution with the convolution kernel number of 4 × N, and increases the channel number to 4 × N;
the second sublayer performs feature extraction on the data of 4 × N channels output by the first sublayer through depth convolution;
and the third sublayer performs channel feature fusion on the data of 4 × N channels output by the second sublayer through the dot product convolution with the convolution kernel number of N.
Optionally, the generating unit 113 is configured to:
and performing channel rearrangement on the data of each channel output by the convolution network model to obtain data of a first color channel, a second color channel and a third color channel.
According to the embodiment of the invention, the characteristic enhancement module for enhancing the characteristics of the data is arranged in the convolutional neural network model, so that the convolutional neural network model with a small number of convolutional layers and channels can realize clear imaging under the dark light condition, the data volume processed by the convolutional neural network model is reduced on the premise of ensuring the imaging definition, the power consumption is reduced, and the imaging speed is increased.
Fig. 12 is a schematic diagram of a terminal device according to an embodiment of the present invention. As shown in fig. 12, the terminal device 12 of this embodiment includes: a processor 120, a memory 121, and a computer program 122, such as a program, stored in the memory 121 and executable on the processor 120. The processor 120, when executing the computer program 122, implements the steps in the various method embodiments described above, such as the steps 101 to 103 shown in fig. 1. Alternatively, the processor 120, when executing the computer program 122, implements the functions of each module/unit in each device embodiment described above, for example, the functions of the units 111 to 113 shown in fig. 11.
Illustratively, the computer program 122 may be partitioned into one or more modules/units that are stored in the memory 121 and executed by the processor 120 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the computer program 122 in the terminal device 12.
The terminal device 12 may be a desktop computer, a notebook, a palm computer, a cloud server, or other computing devices. The terminal device may include, but is not limited to, a processor 120, a memory 121. Those skilled in the art will appreciate that fig. 12 is merely an example of a terminal device 12 and does not constitute a limitation of terminal device 12 and may include more or fewer components than shown, or some components may be combined, or different components, for example, the terminal device may also include input output devices, network access devices, buses, displays, etc.
The Processor 120 may be a Central Processing Unit (CPU), another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, etc. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The storage 121 may be an internal storage unit of the terminal device 12, such as a hard disk or a memory of the terminal device 12. The memory 121 may also be an external storage device of the terminal device 12, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the terminal device 12. Further, the memory 121 may also include both an internal storage unit and an external storage device of the terminal device 12. The memory 121 is used to store the computer program and other programs and data required by the terminal device. The memory 121 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, and the like. It should be noted that the computer readable medium may contain other components which may be suitably increased or decreased as required by legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer readable media which may not include electrical carrier signals and telecommunications signals in accordance with legislation and patent practice.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims (13)

1. A dark light image processing method based on ConvNet is characterized by comprising the following steps:
preprocessing image original data;
inputting the preprocessed original data into a convolutional neural network model for feature enhancement;
performing channel rearrangement processing on the output data of the convolutional neural network model to generate an image corresponding to the original data;
wherein the raw image data is preprocessed to obtain data of a plurality of channels, and the data of each channel is input into the convolutional neural network model; the data is feature-coded by a feature coding module, the number of channels of the data output by the feature coding module being greater than the number of channels of the data input into the feature coding module; the data of each channel output by the feature coding module is input into a feature enhancement module, the data is feature-enhanced by the feature enhancement module, and the feature-enhanced data is input into a feature decoding and fusing module; the feature decoding and fusing module is used for performing feature decoding on the feature-enhanced data and performing feature fusion on the output data of the feature enhancement module and the corresponding data output by the feature coding module, the number of channels of the data output by the feature decoding and fusing module being less than the number of channels of the data input into the feature decoding and fusing module; and the convolutional neural network model comprises the feature coding module, the feature enhancement module and the feature decoding and fusing module.
2. The ConvNets-based dim light image processing method according to claim 1, wherein said preprocessing said raw data comprises:
rearranging the original data to obtain channel data with a preset number; the data of the same channel corresponds to the same color;
respectively carrying out black level removing processing on the data of each channel;
and multiplying the data of each channel subjected to the black level removal processing by a preset amplification factor for amplification processing.
3. The ConvNets-based dim light image processing method according to claim 1, wherein said inputting said preprocessed raw data into a convolutional neural network model for feature enhancement comprises:
performing feature coding on data input into the convolutional neural network model;
performing feature enhancement on the data subjected to feature coding;
and decoding and fusing the data subjected to the feature coding and the data subjected to the feature enhancement.
4. The ConvNets-based dim light image processing method according to claim 3, wherein said decoding and fusing said feature encoded data and said feature enhanced data comprises:
decoding and fusing the feature encoded data and the feature enhanced data through a convolution layer, a deconvolution layer, and a feature fusion layer of a depth separable convolution structure.
5. The ConvNets-based dim light image processing method according to claim 3, wherein said feature enhancing the feature encoded data comprises:
performing convolution on the data subjected to the feature coding through dot product convolution with the convolution kernel number of 4 × N to obtain first data with the channel number of 4 × N; the channel number of the data subjected to the feature coding is N, wherein N is an integer greater than 1;
performing feature extraction on the first data with the channel number of 4 × N through deep convolution to obtain second data with the channel number of 4 × N;
and performing channel feature fusion on the second data with the channel number of 4 × N through the dot product convolution with the convolution kernel number of N.
6. The ConvNets-based dim light image processing method according to any one of claims 1 to 5, wherein said performing channel rearrangement on the output data of said convolutional neural network model to generate an image corresponding to said raw data comprises:
and performing channel rearrangement on the data of each channel output by the convolutional neural network model to obtain data of a first color channel, a second color channel and a third color channel.
7. A ConvNets-based dim light image processing apparatus, comprising:
the preprocessing unit is used for preprocessing the original image data;
the processing unit is used for inputting the preprocessed original data into a convolutional neural network model for feature enhancement;
the generating unit is used for carrying out channel rearrangement processing on the output data of the convolutional neural network model to generate an image corresponding to the original data;
wherein the raw image data is preprocessed to obtain data of a plurality of channels, and the data of each channel is input into the convolutional neural network model; the data is feature-coded by a feature coding module, the number of channels of the data output by the feature coding module being greater than the number of channels of the data input into the feature coding module; the data of each channel output by the feature coding module is input into a feature enhancement module, the data is feature-enhanced by the feature enhancement module, and the feature-enhanced data is input into a feature decoding and fusing module; the feature decoding and fusing module is used for performing feature decoding on the feature-enhanced data and performing feature fusion on the output data of the feature enhancement module and the corresponding data output by the feature coding module, the number of channels of the data output by the feature decoding and fusing module being less than the number of channels of the data input into the feature decoding and fusing module; and the convolutional neural network model comprises the feature coding module, the feature enhancement module and the feature decoding and fusing module.
8. The ConvNets-based dim light image processing apparatus according to claim 7, wherein said convolutional neural network model comprises a feature encoding module, a feature enhancement module and a feature decoding and fusing module;
the characteristic coding module is used for carrying out characteristic coding on the data input into the convolutional neural network model;
the characteristic enhancement module is used for enhancing the characteristics of the data output by the characteristic coding module;
the feature decoding and fusing module is used for decoding and fusing the data output by the feature enhancing module and the data output by the feature coding module.
9. The ConvNets-based dim light image processing apparatus according to claim 8, wherein said feature encoding module comprises a convolution layer with a stride of 2 and does not comprise a max pooling layer.
10. The ConvNets-based dim light image processing apparatus according to claim 8, wherein said feature decoding and fusing module comprises a convolution layer with a depthwise separable convolution structure, a deconvolution layer and a feature fusion layer.
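Read as a depthwise-separable convolution layer, a deconvolution layer and a feature fusion layer, one decoding-and-fusing step could look as follows; the kernel sizes, the concatenation-based fusion and the layer order are assumptions:

```python
import torch
import torch.nn as nn

class DecodeFuseBlock(nn.Module):
    """One decoding-and-fusing step under one reading of claim 10: a
    deconvolution layer upsamples the enhanced features, a feature fusion layer
    concatenates them with the matching encoder features, and a depthwise-
    separable convolution layer mixes the result. Channel widths are illustrative."""

    def __init__(self, in_ch: int, skip_ch: int, out_ch: int):
        super().__init__()
        self.deconv = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        fused = out_ch + skip_ch
        self.sep_conv = nn.Sequential(
            nn.Conv2d(fused, fused, 3, padding=1, groups=fused),  # depthwise
            nn.Conv2d(fused, out_ch, 1),                          # pointwise
        )

    def forward(self, x: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        up = self.deconv(x)                        # deconvolution layer
        fused = torch.cat([up, skip], dim=1)       # feature fusion layer
        return self.sep_conv(fused)                # depthwise-separable conv layer
```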
11. The ConvNets-based dim light image processing apparatus according to any one of claims 8 to 10, wherein the feature enhancement module comprises at least one layer, each layer comprising a first sublayer, a second sublayer and a third sublayer; the number of channels of input data of the feature enhancement module is N, wherein N is an integer greater than 1;
the first sublayer convolves the input data through pointwise convolution with 4 × N convolution kernels, increasing the number of channels to 4 × N;
the second sublayer performs feature extraction on the data of 4 × N channels output by the first sublayer through depthwise convolution;
and the third sublayer performs channel feature fusion on the data of 4 × N channels output by the second sublayer through pointwise convolution with N convolution kernels.
12. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 6 when executing the computer program.
13. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.
CN201811160234.7A 2018-09-30 2018-09-30 ConvNet-based dim light image processing method and terminal equipment Active CN110971837B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811160234.7A CN110971837B (en) 2018-09-30 2018-09-30 ConvNet-based dim light image processing method and terminal equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811160234.7A CN110971837B (en) 2018-09-30 2018-09-30 ConvNet-based dim light image processing method and terminal equipment

Publications (2)

Publication Number Publication Date
CN110971837A CN110971837A (en) 2020-04-07
CN110971837B (en) 2021-07-27

Family

ID=70029045

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811160234.7A Active CN110971837B (en) 2018-09-30 2018-09-30 ConvNet-based dim light image processing method and terminal equipment

Country Status (1)

Country Link
CN (1) CN110971837B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113658050A (en) * 2020-05-12 2021-11-16 武汉Tcl集团工业研究院有限公司 Image denoising method, denoising device, mobile terminal and storage medium
CN112053363B (en) * 2020-08-19 2023-12-15 苏州超云生命智能产业研究院有限公司 Retina blood vessel segmentation method, retina blood vessel segmentation device and model construction method

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105678727A (en) * 2016-01-12 2016-06-15 四川大学 Infrared and visible light image real-time fusion system based on heterogeneous multi-core architecture
CN106910161A (en) * 2017-01-24 2017-06-30 华南理工大学 A kind of single image super resolution ratio reconstruction method based on depth convolutional neural networks
CN106934404A (en) * 2017-03-10 2017-07-07 深圳市瀚晖威视科技有限公司 A kind of image flame identifying system based on CNN convolutional neural networks
CN107491726A (en) * 2017-07-04 2017-12-19 重庆邮电大学 A kind of real-time expression recognition method based on multi-channel parallel convolutional neural networks
CN107491771A (en) * 2017-09-21 2017-12-19 百度在线网络技术(北京)有限公司 Method for detecting human face and device
CN107729987A (en) * 2017-09-19 2018-02-23 东华大学 The automatic describing method of night vision image based on depth convolution loop neutral net
CN107886474A (en) * 2017-11-22 2018-04-06 北京达佳互联信息技术有限公司 Image processing method, device and server
CN108288035A (en) * 2018-01-11 2018-07-17 华南理工大学 The human motion recognition method of multichannel image Fusion Features based on deep learning

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9465995B2 (en) * 2013-10-23 2016-10-11 Gracenote, Inc. Identifying video content via color-based fingerprint matching
CN106600577B (en) * 2016-11-10 2019-10-18 华南理工大学 A kind of method for cell count based on depth deconvolution neural network
US10489908B2 (en) * 2017-02-22 2019-11-26 Siemens Healthcare Gmbh Deep convolutional encoder-decoder for prostate cancer detection and classification
CN108259997B (en) * 2018-04-02 2019-08-23 腾讯科技(深圳)有限公司 Image correlation process method and device, intelligent terminal, server, storage medium
CN108596090B (en) * 2018-04-24 2019-08-27 北京达佳互联信息技术有限公司 Facial image critical point detection method, apparatus, computer equipment and storage medium

Also Published As

Publication number Publication date
CN110971837A (en) 2020-04-07

Similar Documents

Publication Publication Date Title
CN111194458B (en) Image signal processor for processing images
CN109087269B (en) Weak light image enhancement method and device
CN110310229B (en) Image processing method, image processing apparatus, terminal device, and readable storage medium
WO2019186407A1 (en) Systems and methods for generative ensemble networks
CN109325928A (en) A kind of image rebuilding method, device and equipment
CN112602088B (en) Method, system and computer readable medium for improving quality of low light images
CN110971837B (en) ConvNet-based dim light image processing method and terminal equipment
CN112997479A (en) Method, system and computer readable medium for processing images across a phase jump connection
CN113939845A (en) Method, system and computer readable medium for improving image color quality
CN110913219A (en) Video frame prediction method and device and terminal equipment
CN111353956B (en) Image restoration method and device, computer equipment and storage medium
CN110913230A (en) Video frame prediction method and device and terminal equipment
CN113012068B (en) Image denoising method, image denoising device, electronic equipment and computer-readable storage medium
CN111383188B (en) Image processing method, system and terminal equipment
CN113052768B (en) Method, terminal and computer readable storage medium for processing image
US11823352B2 (en) Processing video frames via convolutional neural network using previous frame statistics
CN113298740A (en) Image enhancement method and device, terminal equipment and storage medium
CN111383299B (en) Image processing method and device and computer readable storage medium
CN111861940A (en) Image toning enhancement method based on condition continuous adjustment
CN111754412A (en) Method and device for constructing data pairs and terminal equipment
CN111383171B (en) Picture processing method, system and terminal equipment
CN111383289A (en) Image processing method, image processing device, terminal equipment and computer readable storage medium
CN112801866B (en) Image reconstruction model generation method, image reconstruction method and related equipment
CN110572652B (en) Static image processing method and device
CN112087556B (en) Dark light imaging method and device, readable storage medium and terminal equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 516006 TCL science and technology building, No. 17, Huifeng Third Road, Zhongkai high tech Zone, Huizhou City, Guangdong Province

Applicant after: TCL Technology Group Co.,Ltd.

Address before: 516006 Guangdong province Huizhou Zhongkai hi tech Development Zone No. nineteen District

Applicant before: TCL RESEARCH AMERICA Inc.

GR01 Patent grant