CN115984133A - Image enhancement method, vehicle snapshot method, device and medium

Info

Publication number: CN115984133A
Application number: CN202211709136.0A
Authority: CN (China)
Prior art keywords: image, network, student, teacher, student network
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 吴家新, 王诗韵, 满志朋
Current Assignee: Suzhou Keda Technology Co Ltd
Original Assignee: Suzhou Keda Technology Co Ltd
Application filed by Suzhou Keda Technology Co Ltd

Landscapes

  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to an image enhancement method, a vehicle snapshot method, a device, and a medium, belonging to the technical field of image processing. The method comprises the following steps: preprocessing original image data acquired by an image acquisition assembly; and inputting the preprocessed image data into an image enhancement model to obtain brightness-enhanced image data. Down-sampling the image input to the student network improves the computational efficiency of the image enhancement network, while the image input to the teacher network is not down-sampled, so both the teacher network's image enhancement performance and the student network's computation speed are guaranteed. The network complexity of the student network is lower than that of the teacher network, and the teacher network distills the learned knowledge to the student network, so the student network's image enhancement performance is guaranteed while its computation speed is guaranteed. In addition, the intermediate feature map output by the teacher network is shared with the student network, which improves the training efficiency of the student network.

Description

Image enhancement method, vehicle snapshot method, device and medium
Technical Field
The present application relates to an image enhancement method, a vehicle snapshot method, a device and a medium, and belongs to the technical field of image processing.
Background
Target detection based on deep learning has been applied in many real-world scenarios, such as pedestrian recognition, automated driving, and image segmentation. However, a target detection model can only detect targets in an image captured when the brightness of the shooting scene meets a preset requirement; therefore, brightness enhancement is required for images captured in low-illumination scenes.
The traditional brightness enhancement approach uses a low-illumination image enhancement network built on a deep neural network to enhance the brightness of a low-illumination image and obtain a brightness-enhanced image.
However, to improve the brightness enhancement performance of the low-illumination image enhancement network, the model complexity of the network is generally high, which leads to low computational efficiency of the model and hence slow, high-latency image enhancement.
Disclosure of Invention
The present application provides an image enhancement method, a vehicle snapshot method, a device, and a medium. On one hand, down-sampling the image input to the student network improves the computational efficiency of the student network. On the other hand, the image input to the teacher network is not down-sampled and only the down-sampled image is input to the student network, so the image enhancement performance of the teacher network and the computation speed of the student network are both guaranteed. Meanwhile, the network complexity of the student network is lower than that of the teacher network, and the teacher network distills the learned knowledge to the student network, so the image enhancement performance can be guaranteed while the computation speed of the student network is guaranteed; both the efficiency and the effect of image enhancement are thus improved. In addition, in the knowledge distillation process, the intermediate feature map output by the teacher network is shared with the corresponding network layer of the student network, so that the student network uses the intermediate feature map to accelerate convergence, improving the training efficiency of the student network. The present application provides the following technical solutions:
in a first aspect, a method for enhancing an image is provided, the method comprising:
acquiring original image data acquired by an image acquisition assembly;
preprocessing the original image data to obtain preprocessed image data, wherein the preprocessed image data is matched with an input layer of an image enhancement model;
inputting the preprocessed image data into the image enhancement model to obtain brightness-enhanced image data, wherein the image enhancement model is obtained by distilling knowledge from a teacher network to a student network; the image input to the student network in the knowledge distillation process is obtained by down-sampling the image input to the teacher network, and the intermediate feature map output by the teacher network in the knowledge distillation process is shared with the corresponding network layer of the student network; the student network is trained using the intermediate feature map and the down-sampled sample image, and the network complexity of the student network is lower than that of the teacher network.
Optionally, the training process of the image enhancement model includes:
acquiring training data, wherein the training data comprises the sample image and a label image corresponding to the sample image, and the brightness of the label image is greater than that of the sample image;
preprocessing the sample image to enable the preprocessed sample image to meet the input requirement of a teacher network;
inputting the preprocessed sample image into the teacher network to obtain a soft label image and an intermediate feature map;
sharing the intermediate feature map with the corresponding network layer of a student network to obtain a soft label predicted image and a first hard label predicted image output by the student network;
training the teacher network based on differences between the soft label images and the soft label predicted images;
performing down-sampling on the preprocessed sample image to obtain a down-sampled sample image;
inputting the down-sampled sample image into the student network to obtain a second hard label predicted image;
training the student network based on the difference between the first hard-label predicted image and the label image, the difference between the second hard-label predicted image and the label image, and the difference between the soft-label image and the soft-label predicted image to obtain the image enhancement model.
Optionally, the teacher network includes at least two first feature extraction layers and an output layer connected in sequence, and the student network includes at least two second feature extraction layers and an output layer connected in sequence; the number of second feature extraction layers is equal to the number of first feature extraction layers, the model complexity of the second feature extraction layers is lower than that of the first feature extraction layers, and the dimension of the feature map output by each first feature extraction layer is the same as that of the feature map output by the corresponding second feature extraction layer; inputting the preprocessed sample image into the teacher network to obtain the soft label image and the intermediate feature map includes:
inputting the preprocessed sample image into the teacher network to obtain the intermediate feature map output by the i-th first feature extraction layer of the teacher network and the soft label image output by the output layer of the teacher network, where i is a positive integer;
correspondingly, sharing the intermediate feature map with the corresponding network layer of the student network to obtain the soft label predicted image and the first hard label predicted image output by the student network includes:
inputting the intermediate feature map, as the output result of the i-th second feature extraction layer in the student network, into the (i+1)-th second feature extraction layer to obtain the soft label predicted image and the first hard label predicted image.
Optionally, training the student network based on the difference between the first hard label predicted image and the label image, the difference between the second hard label predicted image and the label image, and the difference between the soft label image and the soft label predicted image to obtain the image enhancement model includes:
inputting the soft label image and the soft label predicted image into a first loss function to obtain a teacher loss value;
inputting the label image and the first hard label predicted image into a second loss function to obtain a first loss value;
inputting the label image and the second hard label predicted image into the second loss function to obtain a second loss value;
determining a weighted sum of the first loss value and the second loss value to obtain a student loss value;
updating network parameters in the student network based on the teacher loss value and the student loss values to train the student network.
Optionally, the sample image is in Bayer format, and preprocessing the sample image includes:
extracting a region of interest from the sample image;
performing channel separation on the region of interest to obtain a separated sample image;
and adjusting the brightness of the separated sample image based on a preset brightness adjusting parameter to obtain the preprocessed sample image.
Optionally, the preprocessing the raw image data includes:
extracting a region of interest in the raw image data;
performing down-sampling processing on the region of interest;
performing channel separation on the image data after the down sampling;
and adjusting the brightness of the image data after channel separation based on a preset brightness adjusting parameter to obtain the preprocessed image data.
Optionally, the raw image data includes multiple groups of Raw data stored in Bayer format, and the down-sampling of the region of interest includes:
extracting one group of Raw data every n rows and n columns to obtain multiple groups of Raw data, where n is a positive integer.
In a second aspect, a vehicle snapshot method is provided, the method comprising:
acquiring original image data acquired by an image acquisition assembly for a vehicle;
obtaining brightness-enhanced image data based on the original image data and a pre-trained image enhancement model, wherein the image enhancement model is obtained by distilling knowledge from a teacher network to a student network; the image input to the student network in the knowledge distillation process is obtained by down-sampling the image input to the teacher network, and the intermediate feature map output by the teacher network in the knowledge distillation process is shared with the corresponding network layer of the student network; the student network is trained using the intermediate feature map and the down-sampled sample image, and the network complexity of the student network is lower than that of the teacher network;
inputting the image data with enhanced brightness into a pre-trained vehicle detection network to obtain a vehicle detection result;
and under the condition that the vehicle detection result indicates that the vehicle is detected, controlling the light-emitting assembly to emit light and controlling the image acquisition assembly to acquire the image again to obtain a snapshot image of the vehicle.
In a third aspect, an electronic device is provided, the device comprising a processor and a memory; the memory stores a program that is loaded and executed by the processor to implement the image enhancement method or the vehicle snapshot method provided in the above aspects.
In a fourth aspect, a computer-readable storage medium is provided, in which a program is stored; when executed by a processor, the program implements the image enhancement method or the vehicle snapshot method provided in the above aspects.
The beneficial effects of the present application include at least the following. The method acquires original image data collected by an image acquisition assembly, preprocesses the original image data to obtain preprocessed image data, and inputs the preprocessed image data into an image enhancement model to obtain brightness-enhanced image data. Since the larger the image input to the student network, the longer the student network takes to extract features, down-sampling the image input to the student network improves the computational efficiency of the student network. In the conventional knowledge distillation method, the image input to the student network has the same size as the image input to the teacher network, and inputting the down-sampled sample image into the teacher network would impair the teacher network's image enhancement performance. In the present application, when the image enhancement model is trained, the image input to the teacher network is not down-sampled and only the down-sampled image is input to the student network, so the image enhancement performance of the teacher network and the computation speed of the student network are both guaranteed. Meanwhile, the network complexity of the student network is lower than that of the teacher network, and the teacher network distills the learned knowledge to the student network, so the image enhancement performance can be guaranteed while the computation speed of the student network is guaranteed; both the efficiency and the effect of image enhancement are thus ensured. However, because the size of the image input to the teacher network differs from that of the image input to the student network, the student network may be difficult to converge during knowledge distillation. Therefore, in the knowledge distillation process, the intermediate feature map output by the teacher network is shared with the corresponding network layer of the student network, so that the student network uses the intermediate feature map to accelerate convergence, improving the training efficiency of the student network (the image enhancement network).
In addition, by training the teacher network and the student network simultaneously, the teacher network does not need to be separately trained in advance, which improves model training efficiency.
In addition, the student loss value is determined as the weighted sum of the first loss value and the second loss value, and the network parameters of the student network are updated based on the student loss value and the teacher loss value. Compared with updating the network parameters separately based on the first loss value with the teacher loss value and based on the second loss value with the teacher loss value, the network parameters are updated only once per preprocessed sample image, which reduces the number of parameter updates the student network performs; and because the student loss value combines the first and second loss values, the accuracy of updating the network parameters with the student loss value and the teacher loss value is not reduced.
In addition, by first extracting a region of interest from the original image data and then down-sampling the region of interest, the amount of computation of the image enhancement model can be reduced on the one hand, and on the other hand it can be ensured that not too much image data of the region of interest is lost, thereby guaranteeing the image enhancement effect.
In addition, adjusting the brightness of the image based on a preset brightness adjustment parameter before it is input into the image enhancement model makes the brightness of the images input into the model substantially consistent, so the brightness of the images output by the model is also substantially consistent, ensuring the stability of the model's output.
In addition, the light-emitting assembly flashes once when a vehicle is detected in the brightness-enhanced image data. On one hand, this reduces the energy consumption of the light-emitting assembly, greatly prolongs its service life, and reduces the light pollution caused by keeping the light-emitting assembly constantly lit in dark conditions; on the other hand, the vehicle snapshot rate in a dark environment can be guaranteed.
The foregoing is only an overview of the technical solutions of the present application. To make the technical means of the present application clearer and implementable according to the content of the description, a detailed description is given below with reference to preferred embodiments of the present application and the accompanying drawings.
Drawings
FIG. 1 is a flow chart of an image enhancement method provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of down-sampling and channel separation of raw image data according to an embodiment of the present application;
FIG. 3 is a flow chart of a method for training an image enhancement model provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of down-sampling a pre-processed sample image according to an embodiment of the present application;
FIG. 5 is a schematic illustration of a knowledge distillation process provided by one embodiment of the present application;
FIG. 6 is a flow chart of a vehicle snapshot method provided by an embodiment of the present application;
FIG. 7 is a flow chart of a vehicle snapshot method provided by another embodiment of the present application;
FIG. 8 is a schematic illustration of a vehicle snapshot process provided by one embodiment of the present application;
FIG. 9 is a block diagram of an image enhancement apparatus provided by an embodiment of the present application;
FIG. 10 is a block diagram of a vehicle snapshot device provided by one embodiment of the present application;
FIG. 11 is a block diagram of an electronic device provided by an embodiment of the present application.
Detailed Description
The following detailed description of the present application will be made with reference to the accompanying drawings and examples. The following examples are intended to illustrate the present application but are not intended to limit the scope of the present application.
First, several terms referred to in the present application will be described.
Knowledge distillation: a common method of model compression. Unlike pruning and quantization, knowledge distillation trains a small, lightweight model using the supervision information of a larger model with better performance, in order to achieve better performance and precision. The large model is generally called the Teacher (teacher network) and the small model the Student (student network). The supervision information output by the teacher network is called knowledge, and the process by which the student network learns to migrate the supervision information from the teacher network is called distillation.
In a typical distillation setup, the student network has far fewer parameters and, compared with the teacher network, cannot learn the hidden latent relationships in the data set well on its own. The teacher network applies a softmax to its output vector (the logits), which smooths the labels; for example, the digit 1 might be output as 0.6 (the prediction for 1) and 0.4 (the prediction for 0), and this smoothed label, which contains more information than the bare label 1, is then input into the student network. The purpose of distillation is to let the student network learn the generalization ability of the teacher network; in theory, the result is better than that of a student network fitted to the training data alone.
In the traditional knowledge distillation process, the image input to the teacher network is the same as the image input to the student network. In an image enhancement scene, however, the teacher network cares more about image enhancement performance, while the student network cares more about real-time computation. To guarantee the image enhancement performance of the teacher network, the size of the image input to it cannot be too small; in that case, if the same image were input to the student network, the large image size would reduce the real-time performance of the student network's computation even though the student network's complexity is low.
Based on the technical problem, in the application, the image input to the student network is obtained by down-sampling the image input to the teacher network. That is, the size of the image input to the teacher network and the size of the image input to the student network do not coincide.
However, because the size of the image input to the teacher network and the size of the image input to the student network are inconsistent, the student network may be difficult to converge during knowledge distillation. Therefore, in the knowledge distillation process, the intermediate feature map output by the teacher network is shared with the corresponding network layer of the student network, so that the student network uses the intermediate feature map to accelerate convergence. See the following embodiments for details.
Soft label: the output of the softmax layer obtained when data is passed through the teacher network. Compared with a hard label (the ground truth), a soft label has higher entropy and smaller gradient variance.

$$q_i = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)}$$

The above formula is the softmax function with a temperature parameter T; when T = 1 it is the standard softmax, and the larger the value of T, the smoother the resulting distribution and the more information it carries. When the network is trained, the hard label predicted images (hard predictions) of the student network are obtained with T = 1, while the soft label images (soft labels) of the teacher network and the soft label predicted images (soft predictions) of the student network are obtained with T = t (t > 1).
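As a minimal sketch of this formula (the function name and the example logits below are illustrative only, not part of the application):

```python
import numpy as np

def softmax_with_temperature(logits: np.ndarray, T: float = 1.0) -> np.ndarray:
    # T = 1: standard softmax (hard predictions); T > 1: smoother soft labels
    z = logits / T
    z = z - z.max()                # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([4.0, 1.0, 0.5])
hard = softmax_with_temperature(logits, T=1.0)  # ~[0.93, 0.05, 0.03] - peaked
soft = softmax_with_temperature(logits, T=3.0)  # ~[0.60, 0.22, 0.19] - smoother
```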
Bayer format: analysis of human color perception shows that the human eye is more sensitive to green, so in an image in Bayer format the number of green pixels is generally the sum of the numbers of red and blue pixels. Typically, a Bayer array consists of 1/2 green, 1/4 red and 1/4 blue pixels.
Optionally, the execution subject of each embodiment in the present application is described as an electronic device by way of example. The electronic device is a terminal or a server; the terminal may be a mobile phone, a computer, a tablet computer, a scanner, an electronic eye, a monitoring camera, or the like. This embodiment does not limit the type of the electronic device.
Fig. 1 is a flowchart of an image enhancement method provided in an embodiment of the present application, the method at least includes the following steps:
step 101, acquiring original image data acquired by an image acquisition assembly.
Optionally, the working environment of the image acquisition assembly is a low-brightness environment, that is, an environment whose illumination intensity is below a brightness threshold, in which the acquired image generally cannot be recognized by the target detection model. In this case, the original image data is a low-brightness image.
Raw image data refers to raw data stored in Bayer format. Unlike image formats such as JPG, raw data stored in Bayer format has not undergone format conversion, so the probability of information loss is low; using raw image data for image enhancement therefore helps guarantee the enhancement effect.
Step 102, preprocessing the original image data to obtain preprocessed image data, where the preprocessed image data matches the input layer of a pre-trained image enhancement model.
The preprocessed image data matching the input layer of the image enhancement model means that the image size of the preprocessed image data is the same as the image size accepted by the input layer. For example: if the input layer of the image enhancement model accepts images of size 750 × 250, the image size of the preprocessed image data should be 750 × 250.
In one example, the electronic device only down-samples the raw image data according to the image size allowed by the input layer, resulting in pre-processed image data.
In another example, the electronic device extracts a Region of Interest (ROI) in the raw image data; carrying out down-sampling processing on the region of interest; performing channel separation on the down-sampled image data; and adjusting the brightness of the image data after channel separation based on a preset brightness adjusting parameter to obtain the preprocessed image data.
Since the post-processing process after image enhancement may only be interested in the target, in this example, by extracting the region of interest first and then performing downsampling processing on the region of interest, on one hand, the amount of calculation of the image enhancement model can be reduced, and on the other hand, it can be ensured that image data of the region of interest is not lost too much, thereby ensuring the image enhancement effect.
Such as: from raw image data with a resolution of 4096 × 2176, an effective ROI of about 3000 × 1000 can typically be obtained in a checkpoint or traffic-enforcement scenario. To further reduce the amount of computation, the 3000 × 1000 ROI needs to be down-sampled.
The way of extracting the region of interest in the raw image data includes: an area of a preset size located at the target position is extracted. The target position may be a middle position of the original image area, or may also be another position of the original image area, the target position and the preset size may be set by a user, or may also be fixedly set in the electronic device, and the implementation manner of the target position and the preset size is not limited in this embodiment.
The Raw image data comprises multiple groups of Raw data stored in Bayer format, and down-sampling the region of interest includes: extracting one group of Raw data every n rows and n columns to obtain multiple groups of Raw data.
Due to the Bayer pattern constraint, the value of n is a positive odd number, and the value of n is pre-stored in the electronic device. Since the brightness enhancement effect deteriorates as n increases, this embodiment is described with n = 1.
Performing channel separation on the down-sampled image data includes: separating the down-sampled image data into channels according to pixel type. Taking the extraction and channel-separation process with n = 1 as an example, as shown in fig. 2, assume the image data of the region of interest is grouped into RGGB groups of 4 values each (see the data outlined by the dashed line in fig. 2); there are 3000 × 1000 / 4 = 750000 groups in total. As shown in fig. 2, only the groups in odd rows and odd columns are taken, so 750000 / 4 = 187500 groups are obtained. The four RGGB pixel values of each group are then split into four channels according to the four RGGB pixel types, giving the form shown in the last diagram of fig. 2. At this point, data of size 750 × 250 × 4 is obtained, with the R, G, G and B data separated into individual channels.
Because the bit depth at which the image acquisition assembly stores each value may not be computer-readable, after channel separation the bit depth of the pixel data can be converted to a preset, computer-readable bit depth. For example: the image acquisition assembly stores each value with 12 bits, while the computer-readable preset bit depths are 8 bits and 16 bits. To reduce the amount of data processed by the model, this embodiment takes the preset bit depth as 8 bits; in that case the pixel data can be shifted right by 4 bits to convert it to the preset bit depth.
In other embodiments, the computer-readable bit depth may also be 16 bits, in which case the pixel data only needs to be padded to 16 bits; this embodiment does not limit the method of converting the bit depth of the pixel data.
Adjusting the brightness of the channel-separated image data based on a preset brightness adjustment parameter to obtain the preprocessed image data includes: acquiring the brightness adjustment parameter and adjusting the brightness of the channel-separated image data to the brightness level it indicates.
The brightness adjustment of the channel-separated image data can be expressed by the following function: B = clip(A × scale, 0, 255), where A represents a pixel value of the channel-separated image data, scale is the brightness adjustment parameter, 0 is the minimum and 255 the maximum pixel value after brightness adjustment, and B is the pixel value obtained after brightness enhancement according to scale. The brightness-adjusted pixel value is limited to the range formed by the minimum and maximum values. The value of scale is set by the user; illustratively, its value range is [1.0, 5.0]. In other embodiments the value range may be different, and this embodiment does not limit it.
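Putting step 102 together, the preprocessing chain (ROI extraction, down-sampling with n = 1, RGGB channel separation, 12-bit to 8-bit conversion, and brightness adjustment) can be sketched as follows. This is a minimal illustration: the RGGB layout, the centered ROI coordinates, and all names are assumptions made for the sketch, not mandated by the application.

```python
import numpy as np

def preprocess_raw(raw: np.ndarray, scale: float = 2.0) -> np.ndarray:
    """Sketch of the preprocessing above, assuming a 2-D uint16 array holding
    12-bit values in an RGGB Bayer mosaic (the layout is an assumption)."""
    # 1. extract a centered region of interest (illustrative size: 3000 x 1000)
    h, w = raw.shape
    roi = raw[(h - 1000) // 2:(h + 1000) // 2, (w - 3000) // 2:(w + 3000) // 2]

    # 2. down-sample with n = 1: keep only the RGGB groups in odd rows and odd
    #    columns of groups (every other 2x2 Bayer cell in both directions) and
    #    split the four RGGB values of each kept group into four channels
    ds = np.stack([roi[0::4, 0::4],   # R
                   roi[0::4, 1::4],   # G1
                   roi[1::4, 0::4],   # G2
                   roi[1::4, 1::4]],  # B
                  axis=-1)            # -> 250 x 750 x 4 for a 3000 x 1000 ROI

    # 3. 12-bit -> 8-bit (right shift by 4), then brightness adjustment
    #    B = clip(A * scale, 0, 255), with a user-set scale in [1.0, 5.0]
    img = (ds >> 4).astype(np.float32)
    return np.clip(img * scale, 0, 255).astype(np.uint8)
```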
Step 103, inputting the preprocessed image data into the image enhancement model to obtain brightness-enhanced image data.
In this embodiment, the image enhancement model is obtained by distilling knowledge from a teacher network to a student network; the image input to the student network during knowledge distillation is obtained by down-sampling the image input to the teacher network, the intermediate feature map output by the teacher network during knowledge distillation is shared with the corresponding network layer of the student network, and the student network is trained using the intermediate feature map and the down-sampled sample image.
Specifically, fig. 3 is a flowchart of a training method of an image enhancement model provided in an embodiment of the present application, where the method includes at least the following steps:
step 31, training data is obtained, where the training data includes a sample image and a label image corresponding to the sample image.
Wherein the brightness of the label image is greater than the brightness of the sample image.
In one example, the label image and the sample image are captured of the same target using different illumination intensities; alternatively, the sample image and the label image are captured of the same target under the same illumination with different exposure durations, the exposure duration of the label image being greater than that of the sample image. The target may be a vehicle, a person, or another object to be detected and recognized later; this embodiment does not limit the type of the target.
Step 32, preprocessing the sample image so that the preprocessed sample image meets the input requirement of the teacher network.
In one example, the sample image is in Bayer format, and preprocessing the sample image includes: extracting a region of interest from the sample image; performing channel separation on the region of interest to obtain a separated sample image; and adjusting the brightness of the separated sample image based on a preset brightness adjustment parameter to obtain the preprocessed sample image.
The processes of extracting the region of interest, separating the channels, and adjusting the brightness refer to the related description in step 102, which is not repeated herein.
Step 33, inputting the preprocessed sample image into the teacher network to obtain a soft label image and an intermediate feature map.
In this embodiment, the teacher network includes at least two first feature extraction layers and an output layer connected in sequence, and the student network includes at least two second feature extraction layers and an output layer connected in sequence; the number of second feature extraction layers is equal to the number of first feature extraction layers, the model complexity of the second feature extraction layers is lower than that of the first feature extraction layers, and the dimension of the feature map output by each first feature extraction layer is the same as that of the feature map output by the corresponding second feature extraction layer.
Correspondingly, inputting the preprocessed sample image into the teacher network to obtain the soft label image and the intermediate feature map includes: inputting the preprocessed sample image into the teacher network to obtain the intermediate feature map output by the i-th first feature extraction layer of the teacher network and the soft label image output by the output layer of the teacher network, where i is a positive integer.
When the teacher network is constructed, the concern is the image enhancement performance of the network model rather than its size. Suppose the teacher network includes three first feature extraction layers Ta, Tb and Tc. The deep-learning structure used in a first feature extraction layer may be multilayer convolution, deconvolution, and/or dilated convolution; the structures of different first feature extraction layers may be the same or different, and this embodiment does not limit the implementation of the first feature extraction layers.
The value of i is preset in the electronic device, the ith layer may be 1 layer or at least two layers, and the value of i is smaller than the number of the first feature extraction layers. Such as: in the case that the first feature extraction layer includes three layers, i may be 1, or may also be 2, or may also be 1 and 2, and the value of i is not limited in this embodiment.
When the student network is constructed, the concern is rather the time consumption of the network model. Therefore, each second feature extraction layer in the student network uses convolution layers with a simple structure, for example ordinary 3 × 3 or 5 × 5 convolutions, and the number of convolution layers in each second feature extraction layer does not exceed a preset number of layers (for example 5 layers or 4 layers; this embodiment does not limit the value). The number of second feature extraction layers in the student network is equal to the number of first feature extraction layers in the teacher network. Taking a teacher network with three first feature extraction layers Ta, Tb and Tc as an example, the student network also includes three second feature extraction layers Sa, Sb and Sc, and it must be ensured that the output of Ta matches the dimension of the output of Sa, the output of Tb matches the dimension of the output of Sb, and the output of Tc matches the dimension of the output of Sc.
The preset number of layers is less than or equal to the number of network layers in each first feature extraction layer of the teacher network; in other words, it does not exceed the depth of the feature extraction layers in the neural network built for image enhancement performance.
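Purely to make the relative structure concrete, the PyTorch-style sketch below builds a teacher with three heavier stages Ta, Tb, Tc and a student with three lighter stages Sa, Sb, Sc. All channel counts, kernel sizes, and depths are assumptions; the application only requires that matching stages output feature maps of the same dimension (with fully convolutional stages, matching channel counts also let a full-size teacher feature map be fed into the student's next stage).

```python
import torch.nn as nn

def conv_block(in_ch: int, out_ch: int, n_convs: int, k: int = 3) -> nn.Sequential:
    # n_convs stacked k x k convolutions with ReLU (illustrative structure only)
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, k, padding=k // 2),
                   nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

# teacher: three heavier first feature extraction layers plus an output layer
teacher = nn.ModuleDict({
    "Ta": conv_block(4, 64, n_convs=8),
    "Tb": conv_block(64, 64, n_convs=8),
    "Tc": conv_block(64, 64, n_convs=8),
    "out": nn.Conv2d(64, 4, 3, padding=1),
})

# student: the same number of stages, but each uses only a few plain 3x3
# convolutions; channel counts match the teacher so the output of Ta has the
# same dimension as the output of Sa, and likewise for Tb/Sb and Tc/Sc
student = nn.ModuleDict({
    "Sa": conv_block(4, 64, n_convs=2),
    "Sb": conv_block(64, 64, n_convs=2),
    "Sc": conv_block(64, 64, n_convs=2),
    "out": nn.Conv2d(64, 4, 3, padding=1),
})
```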
Step 34, sharing the intermediate feature map with the corresponding network layer of the student network to obtain the soft label predicted image and the first hard label predicted image output by the student network.
Specifically, sharing the intermediate feature map with the corresponding network layer of the student network to obtain the soft label predicted image and the first hard label predicted image output by the student network includes: inputting the intermediate feature map, as the output result of the i-th second feature extraction layer in the student network, into the (i+1)-th second feature extraction layer to obtain the soft label predicted image and the first hard label predicted image.
In the present application, the input image sizes of the teacher network and the student network are inconsistent, which makes it difficult to train the student network directly. Therefore, in this embodiment, when training the student network, the intermediate feature map extracted from the teacher network is used to accelerate the convergence of the student network.
Step 35, training the teacher network based on the difference between the soft label image and the soft label predicted image.
Specifically, the soft label image and the soft label predicted image are input into a first loss function to obtain a teacher loss value, and the network parameters of the teacher network are updated based on the teacher loss value to train the teacher network.
The first loss function may be cross entropy or extended softmax, and this embodiment does not limit an implementation manner of the first loss function.
In other embodiments, the teacher network may also be a pre-trained teacher network, in which case step 36 is performed directly after step 34, without the need to train the teacher network.
Step 36, down-sampling the preprocessed sample image to obtain a down-sampled sample image.
The larger the image input to the student network, the longer the student network takes to extract features. Therefore, to improve the computational efficiency of the student network, the preprocessed sample image needs to be down-sampled. In the conventional knowledge distillation method, the image input to the student network has the same size as the image input to the teacher network, and inputting the down-sampled sample image into the teacher network would impair the teacher network's image enhancement performance. For this reason, this embodiment creatively proposes to input the preprocessed sample image into the teacher network without changing its size and to input the down-sampled sample image into the student network, i.e., to input images of different sizes into the teacher and student networks during knowledge distillation. This guarantees both the image enhancement performance of the teacher network and the computation speed of the student network; the teacher network then distills the learned knowledge to the student network, so the student network's image enhancement performance is guaranteed while its computation speed is guaranteed.
Down-sampling the preprocessed sample image may reduce its image size to 1/4 or 1/16 of the original size; this embodiment does not limit the down-sampling method. For example: the preprocessed sample image DataT is 1500 × 500 × 4, i.e., 4-channel data with an image size of 1500 × 500; the down-sampled sample image DataS obtained from DataT is 750 × 250 × 4, i.e., 4-channel data with an image size of 750 × 250, so the data volume of DataS is only 1/4 of that of DataT.
In one example, down-sampling the preprocessed sample image includes: extracting pixel data from each pixel channel of the preprocessed sample image at intervals of a preset number of rows and columns to obtain the down-sampled sample image. For example: referring to fig. 4, one pixel value is extracted from the odd rows and odd columns of each pixel channel of the preprocessed sample image DataT, giving the down-sampled data DataS for each pixel channel.
Due to the Bayer pattern constraint, the preset row and column intervals are equal and take positive odd values. On this basis, this embodiment is described with the preset row and column interval taken as 1.
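With an interval of 1, this is plain strided slicing; a minimal sketch assuming DataT is stored as height × width × channels:

```python
import numpy as np

data_t = np.zeros((500, 1500, 4), dtype=np.uint8)  # DataT: 1500 x 500, 4 channels
data_s = data_t[0::2, 0::2, :]                     # keep odd rows/columns per channel
assert data_s.shape == (250, 750, 4)               # DataS: 750 x 250, 1/4 of the data
```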
Step 37, inputting the down-sampled sample image into the student network to obtain a second hard label predicted image.
From steps 34 and 37, the student network performs two calculations for each preprocessed sample image.
Such as: refer to the model training process shown in fig. 5, where the first feature extraction layers are Ta, Tb and Tc, the second feature extraction layers are Sa, Sb and Sc, and i = 1. As can be seen from fig. 5, after one calculation is performed with the preprocessed sample image DataT input to the teacher network, the value F_Ta output by Ta is directly assigned to F_Sa, the output of Sa, and one calculation is then performed through Sb and Sc of the student network. Afterwards, the down-sampled sample image DataS is input into the student network, and the student network calculates once more; that is, the teacher network calculates once and the student network calculates twice.
Fig. 5 illustrates the case where F_Ta, the output of Ta, is shared as the output of Sa. In another embodiment, F_Ta may be shared as the output of Sa and F_Tb, the output of Tb, shared as the output of Sb, with one calculation performed by feeding F_Ta into Sb of the student network and F_Tb into Sc of the student network; alternatively, only F_Tb is shared as the output of Sb, and one calculation is performed by feeding F_Tb into Sc of the student network. This embodiment does not limit the way the intermediate feature map is shared.
Optionally, step 37 may be executed after step 34, or before step 34, in which case the intermediate feature map needs to be cached for a certain period; this embodiment does not limit the execution order of steps 34 and 37.
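A minimal sketch of the one teacher calculation and two student calculations of fig. 5, assuming the hypothetical teacher/student modules sketched earlier and i = 1; detaching the shared feature map is an assumption (the application does not specify whether the student's losses should update the teacher):

```python
def distillation_forward(teacher, student, data_t, data_s):
    # teacher pass: one calculation on the full-size preprocessed sample DataT
    f_ta = teacher["Ta"](data_t)                    # intermediate feature map F_Ta
    soft_label = teacher["out"](teacher["Tc"](teacher["Tb"](f_ta)))

    # student pass 1: F_Sa := F_Ta, then continue through Sb and Sc; this pass
    # yields the soft label predicted image and first hard label predicted image
    first_pred = student["out"](student["Sc"](student["Sb"](f_ta.detach())))

    # student pass 2: the down-sampled sample DataS runs through the whole student
    second_pred = student["out"](student["Sc"](student["Sb"](student["Sa"](data_s))))
    return soft_label, first_pred, second_pred
```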
Step 38, training the student network based on the difference between the first hard label predicted image and the label image, the difference between the second hard label predicted image and the label image, and the difference between the soft label image and the soft label predicted image, to obtain the image enhancement model.
In one example, following the above steps, the student network performs two calculations for the same preprocessed sample image, and the losses of the two calculations can be weighted to determine the student loss value of the student network. Specifically, training the student network based on the difference between the first hard label predicted image and the label image, the difference between the second hard label predicted image and the label image, and the difference between the soft label image and the soft label predicted image to obtain the image enhancement model includes: inputting the soft label image and the soft label predicted image into a first loss function to obtain a teacher loss value; inputting the label image and the first hard label predicted image into a second loss function to obtain a first loss value; inputting the label image and the second hard label predicted image into the second loss function to obtain a second loss value; determining the weighted sum of the first loss value and the second loss value to obtain the student loss value; and updating the network parameters of the student network based on the teacher loss value and the student loss value to train the student network.
The second loss function may be an L2 norm, an L1 norm, or a cross entropy, and the like, and the implementation manner of the second loss function is not limited in this embodiment.
The weights of the first loss value and the second loss value are stored in the electronic device in advance. For example: if the weight of the first loss value is 0.5 and the weight of the second loss value is 0.5, the weighted sum of the first loss value and the second loss value, i.e. the student loss value, can be expressed by the following formula:
Loss_FnS=Loss_s1*0.5+Loss_s2*0.5;
where Loss_FnS represents the student loss value, Loss_s1 represents the first loss value, and Loss_s2 represents the second loss value.
In practical implementation, the weights of the first loss value and the second loss value may be other values, and the value of the weight is not limited in this embodiment.
Optionally, updating the network parameters of the student network based on the teacher loss value and the student loss value includes: determining the weighted sum of the teacher loss value and the student loss value to obtain a total loss value; and performing gradient back-propagation on the student network according to the total loss value to update the student network's parameters.
At this time, the total loss value can be represented by the following formula:
Loss=(1-a)*Loss_FnT+a*Loss_FnS;
where Loss represents the total loss value, Loss_FnT represents the teacher loss value, Loss_FnS represents the student loss value, and a represents the loss weight.
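Combining the two formulas, a sketch of the loss computation follows. The concrete loss functions below (MSE for the first loss function, L1 for the second) are just one instantiation allowed by the description, and the weights 0.5 and a = 0.5 are the illustrative values given above:

```python
import torch.nn.functional as F

def total_loss(soft_label, soft_pred, first_pred, second_pred, label, a=0.5):
    loss_fnt = F.mse_loss(soft_pred, soft_label)  # teacher loss value Loss_FnT
    loss_s1 = F.l1_loss(first_pred, label)        # first loss value Loss_s1
    loss_s2 = F.l1_loss(second_pred, label)       # second loss value Loss_s2
    loss_fns = loss_s1 * 0.5 + loss_s2 * 0.5      # student loss value Loss_FnS
    return (1 - a) * loss_fnt + a * loss_fns      # total Loss for back-propagation
```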
In another example, the student network is trained once after each calculated loss value. Specifically, training the student network based on the difference between the first hard label predicted image and the label image, the difference between the second hard label predicted image and the label image, and the difference between the soft label image and the soft label predicted image to obtain the image enhancement model includes: inputting the soft label image and the soft label predicted image into a first loss function to obtain a teacher loss value; inputting the label image and the first hard label predicted image into a second loss function to obtain a first loss value; updating the network parameters of the student network based on the teacher loss value and the first loss value to train the student network; inputting the label image and the second hard label predicted image into the second loss function to obtain a second loss value; and updating the network parameters of the student network based on the teacher loss value and the second loss value to train the student network.
For updating the network parameters of the student network based on the teacher loss value and the first loss value, or based on the teacher loss value and the second loss value, see the description above of updating the network parameters based on the teacher loss value and the student loss value; the only difference is that the student loss value is replaced by the first or second loss value, which is not repeated in this embodiment.
In summary, the image enhancement method provided by this embodiment acquires original image data collected by an image acquisition assembly, preprocesses the original image data to obtain preprocessed image data, and inputs the preprocessed image data into an image enhancement model to obtain brightness-enhanced image data. Since the larger the image input to the student network, the longer the student network takes to extract features, down-sampling the image input to the student network improves the computational efficiency of the student network. Moreover, in the conventional knowledge distillation method the image input to the student network has the same size as the image input to the teacher network, and inputting the down-sampled sample image into the teacher network would impair the teacher network's image enhancement performance. In the present application the image input to the teacher network is not down-sampled and only the down-sampled image is input to the student network, so the image enhancement performance of the teacher network and the computation speed of the student network are both guaranteed; at the same time, the network complexity of the student network is lower than that of the teacher network and the teacher network distills the learned knowledge to the student network, so the student network's image enhancement performance is guaranteed while its computation speed is guaranteed, and both the efficiency and the effect of image enhancement are ensured. Meanwhile, because the size of the image input to the teacher network differs from that of the image input to the student network, the student network may be difficult to converge during knowledge distillation; therefore, in the knowledge distillation process, the intermediate feature map output by the teacher network is shared with the corresponding network layer of the student network, so that the student network uses the intermediate feature map to accelerate convergence, improving its training efficiency.
In addition, by training the teacher network and the student network simultaneously, the teacher network does not need to be separately trained in advance, which improves model training efficiency.
In addition, the student loss value is determined as the weighted sum of the first loss value and the second loss value, and the network parameters of the student network are updated based on the student loss value and the teacher loss value. Compared with updating the network parameters separately based on the first loss value with the teacher loss value and based on the second loss value with the teacher loss value, the network parameters are updated only once per preprocessed sample image, which reduces the number of parameter updates the student network performs; and because the student loss value combines the first and second loss values, the accuracy of updating the network parameters with the student loss value and the teacher loss value is not reduced.
In addition, by first extracting a region of interest from the original image data and then down-sampling the region of interest, the amount of computation of the image enhancement model can be reduced on the one hand, and on the other hand it can be ensured that not too much image data of the region of interest is lost, thereby guaranteeing the image enhancement effect.
In addition, adjusting the brightness of the image based on a preset brightness adjustment parameter before it is input into the image enhancement model makes the brightness of the images input into the model substantially consistent, so the brightness of the images output by the model is also substantially consistent, ensuring the stability of the model's output.
Optionally, the image enhancement method may be used in a vehicle snapshot scene, or in other target detection scenes; the following description takes the vehicle snapshot scene as an example. In this embodiment, the vehicle snapshot scene includes an image acquisition assembly and a light-emitting assembly, both of which are communicatively connected to the electronic device. The electronic device may be implemented in the same device as the image acquisition assembly and/or the light-emitting assembly, or as a separate device. Fig. 6 is a flowchart of a vehicle snapshot method according to an embodiment of the present application; the method includes at least the following steps:
step 601, acquiring original image data acquired by an image acquisition assembly for a vehicle.
For a detailed description of this step, see step 101; the difference is that the working environment here is a vehicle snapshot scene, such as a checkpoint or a parking lot. The details are not repeated here.
Step 602, obtaining image data with enhanced brightness based on the original image data and the pre-trained image enhancement model.
In one example, the electronic device obtains brightness enhanced image data based on the manner in steps 102 and 103.
Such as: the image acquisition assembly outputs Raw data of the current environment approximately every 40 ms, i.e., 25 frames per second. The electronic device acquires the latest Raw data, preprocesses it according to the embodiment shown in fig. 1, and inputs the preprocessed image data into the image enhancement model to obtain brightness-enhanced image data.
In another example, if the original image data is adapted to the input layer of the image enhancement model, the electronic device directly inputs the original image data into the pre-trained image enhancement model to obtain the image data with enhanced brightness.
Step 603, inputting the image data with enhanced brightness into a vehicle detection network trained in advance to obtain a vehicle detection result.
The vehicle detection network is built on a target detection model and trained with vehicle images and their corresponding vehicle labels; it is suitable for detecting vehicles in images.
The vehicle detection result indicates whether the brightness-enhanced image data contains a vehicle and, when a vehicle is included, also indicates the position of the vehicle in the brightness-enhanced image data.
Step 604, when the vehicle detection result indicates that a vehicle is detected, controlling the light-emitting assembly to emit light and controlling the image acquisition assembly to acquire an image again, to obtain a snapshot image of the vehicle.
Optionally, the electronic device may further post-process the snapshot image. Post-processing refers to image processing applied to the snapshot image, including but not limited to license plate recognition and/or image enhancement; this embodiment does not limit the post-processing manner.
Different from the conventional always-on fill-light scheme, in this embodiment the light-emitting assembly does not emit light (or is not even started) when no vehicle is detected; it emits light only when the vehicle detection result indicates that a vehicle is detected, and it is turned off after the image acquisition assembly has acquired the image. The overall effect is that the light-emitting assembly flashes once each time a vehicle is detected. On the one hand, this reduces the energy consumption of the light-emitting assembly, greatly prolongs its service life, and reduces the light pollution caused by a light that stays on in dark conditions; on the other hand, it ensures the snapshot rate of vehicles in a dark environment.
To understand the vehicle snapshot method proposed in the present application more clearly, the following explains it using the image enhancement mode shown in Fig. 5 as an example. Referring to Fig. 7, the method includes at least the following steps; a code sketch of this loop is given after the list:
Step 71, acquiring the Raw data output by the image acquisition assembly;
Step 72, preprocessing the Raw data to obtain preprocessed image data;
Step 73, inputting the preprocessed image data into the image enhancement model to obtain brightness-enhanced image data;
Step 74, performing vehicle detection on the brightness-enhanced image data;
Step 75, determining whether a vehicle exists based on the vehicle detection result; if so, executing step 76; if not, returning to step 72;
Step 76, controlling the light-emitting assembly to flash once;
Step 77, acquiring the snapshot image captured while the light-emitting assembly emits light;
Step 78, post-processing the snapshot image; optionally, post-processing includes, but is not limited to, license plate recognition and/or image enhancement, and this embodiment does not limit the post-processing manner;
Step 79, outputting the snapshot image and the post-processing result.
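Putting steps 71–79 together, the control flow might read as in the sketch below; `sensor`, `flash`, `enhancer`, `detector`, and `postprocess` are hypothetical handles standing in for the image acquisition assembly, the light-emitting assembly, the image enhancement model, the vehicle detection network, and the post-processing stage, and `preprocess_raw` refers to the preprocessing sketch given earlier.

```python
def snapshot_loop(sensor, flash, enhancer, detector, postprocess):
    """Assumed control flow for the loop of Fig. 7 (steps 71-79)."""
    raw = sensor.grab_latest_raw()              # step 71: newest Raw frame
    while True:
        x = preprocess_raw(raw)                 # step 72: preprocessing
        bright = enhancer(x)                    # step 73: brightness enhancement
        vehicles = detector(bright)             # step 74: vehicle detection
        if not vehicles:                        # step 75: no vehicle yet
            raw = sensor.grab_latest_raw()      # assumed: fetch a fresh frame
            continue                            # and return to step 72
        flash.trigger_once()                    # step 76: flash once
        shot = sensor.grab_latest_raw()         # step 77: snapshot under flash
        result = postprocess(shot)              # step 78: optional post-processing
        return shot, result                     # step 79: output both
```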
Such as: in the absence of light, the original image data captured by the image acquisition assembly is as shown in A in Fig. 8; the image is extremely dark and the vehicle information cannot be seen normally. After processing by the image enhancement model, an image with normal vehicle-body brightness, shown in B in Fig. 8, is obtained, which meets the image quality requirement of the vehicle detection algorithm. After a vehicle is detected by the detection algorithm, the light-emitting assembly flashes once, yielding the snapshot image shown in C in Fig. 8; post-processing the snapshot image yields the image shown in D in Fig. 8, in which the vehicle body and the license plate can be seen clearly, which improves the accuracy of the post-processing.
In summary, the vehicle snapshot method provided by this embodiment acquires the original image data collected by the image acquisition assembly for the vehicle; obtains brightness-enhanced image data based on the original image data and a pre-trained image enhancement model; inputs the brightness-enhanced image data into a pre-trained vehicle detection network to obtain a vehicle detection result; and, when the vehicle detection result indicates that a vehicle is detected, controls the light-emitting assembly to emit light and the image acquisition assembly to acquire an image again, obtaining a snapshot image of the vehicle. This solves the problem that an image enhancement network of higher model complexity, trained directly on the training data, has low computational efficiency: using the trained student network as the image enhancement network improves computational efficiency without degrading image enhancement performance. Meanwhile, the light-emitting assembly flashes once only when a vehicle is detected in the brightness-enhanced image data, which on the one hand reduces the energy consumption of the light-emitting assembly, greatly prolongs its service life, and avoids the light pollution of an always-on light; on the other hand, it ensures the snapshot rate of vehicles in a dark environment.
Fig. 9 is a block diagram of an image enhancement apparatus according to an embodiment of the present application. The device at least comprises the following modules: a data acquisition module 910, a pre-processing module 920, and an image enhancement module 930.
A data obtaining module 910, configured to obtain original image data collected by the image collecting assembly;
a preprocessing module 920, configured to preprocess the original image data to obtain preprocessed image data, where the preprocessed image data is adapted to an input layer of a pre-trained image enhancement model;
an image enhancement module 930, configured to input the preprocessed image data into the image enhancement model to obtain brightness-enhanced image data; the image enhancement model is obtained by distilling knowledge from a teacher network to a student network; during knowledge distillation, the image input to the student network is obtained by down-sampling the image input to the teacher network, and the intermediate feature map output by the teacher network is shared to the corresponding network layer of the student network; the student network is trained using the intermediate feature map and the down-sampled sample image, and the network complexity of the student network is lower than that of the teacher network.
For relevant details reference is made to the above-described method embodiments.
It should be noted that: in the image enhancement device based on the image enhancement model provided in the above embodiment, the division into the above functional modules is only an example; in practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the image enhancement device provided in the above embodiment and the image enhancement method based on the image enhancement model belong to the same concept; the specific implementation process is detailed in the method embodiment and is not repeated here.
Fig. 10 is a block diagram of a vehicle capture device according to an embodiment of the present application. The device at least comprises the following modules: a data acquisition module 1010, an image enhancement module 1020, a vehicle detection module 1030, and a vehicle snapshot module 1040.
The data acquisition module 1010 is used for acquiring original image data acquired by the image acquisition assembly on the vehicle;
an image enhancement module 1020, configured to obtain brightness-enhanced image data based on the original image data and a pre-trained image enhancement model; the image enhancement model is obtained by distilling knowledge from a teacher network to a student network; during knowledge distillation, the image input to the student network is obtained by down-sampling the image input to the teacher network, and the intermediate feature map output by the teacher network is shared to the corresponding network layer of the student network; the student network is trained using the intermediate feature map and the down-sampled sample image, and the network complexity of the student network is lower than that of the teacher network;
the vehicle detection module 1030 is configured to input the image data with the enhanced brightness into a vehicle detection network trained in advance to obtain a vehicle detection result;
and the vehicle snapshot module 1040 is configured to, when the vehicle detection result indicates that a vehicle is detected, control the light-emitting assembly to emit light and control the image acquisition assembly to acquire an image again, so as to obtain a snapshot image of the vehicle.
Reference is made to the above-described method embodiments for relevant details.
It should be noted that: in the vehicle snapshot device provided in the above embodiment, the division into the above functional modules is only an example; in practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the vehicle snapshot device may be divided into different functional modules to complete all or part of the functions described above. In addition, the vehicle snapshot device provided in the above embodiment and the vehicle snapshot method embodiment belong to the same concept; the specific implementation process is detailed in the method embodiment and is not repeated here.
Fig. 11 is a block diagram of an electronic device provided by an embodiment of the application. The device comprises at least a processor 1101 and a memory 1102.
Processor 1101 may include one or more processing cores, such as a 4-core or 8-core processor. The processor 1101 may be implemented in at least one hardware form of a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), or a PLA (Programmable Logic Array). The processor 1101 may also include a main processor and a coprocessor: the main processor is a processor for processing data in the awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, the processor 1101 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content that the display screen needs to display. In some embodiments, the processor 1101 may further include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
Memory 1102 may include one or more computer-readable storage media, which may be non-transitory. Memory 1102 can also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 1102 is used to store at least one instruction for execution by processor 1101 to implement the image enhancement or vehicle capture methods provided by method embodiments herein.
In some embodiments, the electronic device may further include: a peripheral interface and at least one peripheral. The processor 1101, memory 1102 and peripheral interface may be connected by bus or signal lines. Each peripheral may be connected to the peripheral interface via a bus, signal line, or circuit board. Illustratively, peripheral devices include, but are not limited to: radio frequency circuit, touch display screen, audio circuit, power supply, etc.
Of course, the electronic device may include fewer or more components, which is not limited by the embodiment.
Optionally, the present application further provides a computer-readable storage medium in which a program is stored; the program is loaded and executed by a processor to implement the image enhancement or vehicle snapshot method of the above method embodiments.
Optionally, the present application further provides a computer program product, which includes a computer-readable storage medium in which a program is stored; the program is loaded and executed by a processor to implement the image enhancement or vehicle snapshot method of the above method embodiments.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features contains no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application; their description is relatively specific and detailed, but it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, and all of these fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. An image enhancement method, characterized in that the method comprises:
acquiring original image data acquired by an image acquisition assembly;
preprocessing the original image data to obtain preprocessed image data, wherein the preprocessed image data is matched with an input layer of an image enhancement model;
inputting the preprocessed image data into the image enhancement model to obtain image data with enhanced brightness, wherein the image enhancement model is obtained by distilling knowledge of a teacher network to a student network, the image input to the student network in the knowledge distillation process is obtained by down-sampling the image input to the teacher network, and an intermediate characteristic diagram output by the teacher network in the knowledge distillation process can be shared to a network layer corresponding to the student network; and the student network is trained by using the intermediate feature map and the down-sampled sample image, and the network complexity of the student network is lower than that of the teacher network.
2. The method of claim 1, wherein the training process of the image enhancement model comprises:
acquiring training data, wherein the training data comprises the sample image and a label image corresponding to the sample image, and the brightness of the label image is greater than that of the sample image;
preprocessing the sample image to enable the preprocessed sample image to meet the input requirement of a teacher network;
inputting the preprocessed sample image into the teacher network to obtain a soft label image and an intermediate characteristic diagram;
sharing the intermediate feature map to a network layer corresponding to a student network to obtain a soft tag prediction image and a first hard tag prediction image output by the student network;
training the teacher network based on differences between the soft label images and the soft label predicted images;
performing down-sampling on the preprocessed sample image to obtain a down-sampled sample image;
inputting the down-sampled sample image into the student network to obtain a second hard tag prediction image;
training the student network based on the difference between the first hard-label predicted image and the label image, the difference between the second hard-label predicted image and the label image, and the difference between the soft-label image and the soft-label predicted image to obtain the image enhancement model.
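A minimal sketch of one training step under claim 2 follows. The teacher returning its intermediate feature map alongside the soft label image, the student's `forward_from` entry point (resuming from an injected feature; a sketch of this wiring follows claim 3), and the use of L1 differences are all assumptions about how such a step could be wired, not the application's reference implementation; resizing the down-sampled student output to match the full-resolution label image is likewise an assumed detail.

```python
import torch
import torch.nn.functional as F

def distill_step(teacher, student, sample, label, opt_t, opt_s, n=2):
    """One assumed distillation step following claim 2."""
    # Teacher forward on the preprocessed full-resolution sample:
    # soft label image plus the intermediate feature map of layer i.
    soft_label, feat = teacher(sample)
    # Share the feature map with the student (claim 3): resume from layer
    # i+1 to get the soft-label prediction and first hard-label prediction.
    soft_pred, hard_pred_1 = student.forward_from(feat.detach())
    # Student forward on the down-sampled sample: second hard prediction,
    # resized here so it can be compared against the label (assumed).
    small = sample[..., ::n, ::n]
    hard_pred_2 = F.interpolate(student(small), size=label.shape[-2:])
    # Train the teacher on the soft-label / soft-prediction difference.
    t_loss = F.l1_loss(soft_pred.detach(), soft_label)
    opt_t.zero_grad()
    t_loss.backward()
    opt_t.step()
    # Train the student on the three differences (claim 4 weights the
    # two hard-label terms; an unweighted sum is shown here).
    s_loss = (F.l1_loss(hard_pred_1, label) + F.l1_loss(hard_pred_2, label)
              + F.l1_loss(soft_pred, soft_label.detach()))
    opt_s.zero_grad()
    s_loss.backward()
    opt_s.step()
    return t_loss.item(), s_loss.item()
```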
3. The method of claim 2, wherein the teacher network comprises at least two first feature extraction layers and output layers connected in series, and the student network comprises at least two second feature extraction layers and output layers connected in series; the number of the second feature extraction layers is equal to that of the first feature extraction layers, the model complexity of the second feature extraction layers is lower than that of the first feature extraction layers, and the dimension of the feature graph output by each layer of the first feature extraction layers is the same as that of the feature graph output by the corresponding layer of the second feature extraction layers; inputting the preprocessed sample image into a teacher network to obtain a soft label image and an intermediate characteristic diagram, wherein the soft label image and the intermediate characteristic diagram comprise:
inputting the preprocessed sample image into a teacher network to obtain an intermediate feature map output by an ith layer first feature extraction layer of the teacher network and a soft label image output by an output layer of the teacher network; wherein i is a positive integer;
correspondingly, the sharing the intermediate feature map to a corresponding network layer of a student network to obtain a soft-tag predicted image and a first hard-tag predicted image output by the student network includes:
and taking the intermediate feature map as an output result of an ith layer of second feature extraction layer in the student network to input into an (i + 1) th layer of second feature extraction layer to obtain the soft tag predicted image and the first hard tag predicted image.
4. The method of claim 2, wherein the training the student network based on the difference between the first hard-label predicted image and the label image, the difference between the second hard-label predicted image and the label image, and the difference between the soft-label image and the soft-label predicted image to obtain the image enhancement model comprises:
inputting the soft label image and the soft label prediction image into a first loss function to obtain a teacher loss value;
inputting the tag image and the first hard tag predicted image into a second loss function to obtain a first loss value;
inputting the tag image and the second hard tag predicted image into the second loss function to obtain a second loss value;
determining a weighted sum of the first loss value and the second loss value to obtain a student loss value;
updating network parameters in the student network based on the teacher loss value and the student loss values to train the student network.
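As one possible reading of claim 4, the sketch below uses L1 for both the first and the second loss function and an illustrative 0.5/0.5 weighting of the two hard-label losses; the claim fixes neither the loss functions nor the weights.

```python
import torch.nn.functional as F

def claim4_losses(soft_label, soft_pred, label, hard_pred_1, hard_pred_2,
                  w1=0.5, w2=0.5):
    """Assumed instantiation of claim 4 (loss choices and weights assumed)."""
    teacher_loss = F.l1_loss(soft_pred, soft_label)   # first loss function
    loss_1 = F.l1_loss(hard_pred_1, label)            # second loss function
    loss_2 = F.l1_loss(hard_pred_2, label)            # second loss function again
    student_loss = w1 * loss_1 + w2 * loss_2          # weighted sum
    return teacher_loss, student_loss                 # both drive the student update
```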
5. The method of claim 2, wherein the sample image is in bayer format, and wherein pre-processing the sample image comprises:
extracting a region of interest in the sample image;
carrying out channel separation on the region of interest to obtain a separated sample image;
and adjusting the brightness of the separated sample image based on a preset brightness adjusting parameter to obtain the preprocessed sample image.
6. The method of claim 1, wherein the pre-processing the raw image data comprises:
extracting a region of interest in the raw image data;
performing down-sampling processing on the region of interest;
performing channel separation on the down-sampled image data;
and adjusting the brightness of the image data after channel separation based on a preset brightness adjusting parameter to obtain the preprocessed image data.
7. The method of claim 6, wherein the Raw image data comprises a plurality of sets of Raw data stored in a bayer format, and wherein the downsampling the region of interest comprises:
extracting a group of Raw data every n rows and n columns to obtain a plurality of groups of Raw data; and n is a positive integer.
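Claim 7's down-sampling, extracting one group of Raw data every n rows and n columns, might look like the sketch below; treating a group as one 2x2 Bayer cell is an assumption consistent with the Bayer-format storage recited in the claim.

```python
import numpy as np

def downsample_bayer(raw, n=2):
    """Assumed claim-7 down-sampling: keep one 2x2 Bayer cell (one group
    of Raw data) out of every n x n cells, preserving the Bayer layout."""
    h, w = raw.shape
    cells = raw.reshape(h // 2, 2, w // 2, 2)   # group pixels into 2x2 cells
    kept = cells[::n, :, ::n, :]                # one group every n rows/columns
    return kept.reshape(kept.shape[0] * 2, kept.shape[2] * 2)
```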
8. A vehicle snap-shot method, characterized in that the method comprises:
acquiring original image data acquired by an image acquisition assembly for a vehicle;
obtaining image data with enhanced brightness based on the original image data and a pre-trained image enhancement model; the image enhancement model is obtained by distilling knowledge of a teacher network to a student network, the image input to the student network in the knowledge distillation process is obtained by down-sampling the image input to the teacher network, and an intermediate characteristic diagram output by the teacher network in the knowledge distillation process can be shared to a network layer corresponding to the student network; the student network is trained by using the intermediate feature map and the down-sampled sample image, and the network complexity of the student network is lower than that of the teacher network;
inputting the image data with enhanced brightness into a pre-trained vehicle detection network to obtain a vehicle detection result;
and under the condition that the vehicle detection result indicates that the vehicle is detected, controlling the light-emitting assembly to emit light and controlling the image acquisition assembly to acquire the image again to obtain a snapshot image of the vehicle.
9. An electronic device, wherein the device comprises a processor and a memory; the memory has stored therein a program that is loaded and executed by the processor to implement the image enhancement method of any one of claims 1 to 7 or to implement the vehicle capture method of claim 8.
10. A computer-readable storage medium, characterized in that the storage medium has stored therein a program which, when being executed by a processor, is adapted to carry out the image enhancement method according to any one of claims 1 to 7, or the vehicle snap-shot method according to claim 8.