CN112084908A - Image processing method and system and storage medium

Image processing method and system and storage medium

Info

Publication number
CN112084908A
CN112084908A
Authority
CN
China
Prior art keywords
image sequence
resolution image
sequence
image
resolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010884492.0A
Other languages
Chinese (zh)
Inventor
周鹏
范明
张三林
杭宸
郭继舜
张志德
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Automobile Group Co Ltd
Original Assignee
Guangzhou Automobile Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Automobile Group Co Ltd
Priority to CN202010884492.0A
Publication of CN112084908A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks

Abstract

The invention relates to an image processing method and system, and a storage medium. The method comprises: acquiring a low-resolution image sequence of the environment in front of the vehicle at the current moment; acquiring the first high-resolution image sequence output by a neural network model at the previous moment, and sampling and linearly processing it to obtain a second high-resolution image sequence; and inputting the low-resolution image sequence and the second high-resolution image sequence into the neural network model for image fusion, so as to output the first high-resolution image sequence at the current moment. By implementing the invention, super-resolution enhancement can be applied to the images acquired by the vehicle's camera device, so that small distant obstacles can be detected more reliably from the images during high-speed driving.

Description

Image processing method and system and storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to an image processing method, an image processing system, and a storage medium.
Background
Automatic driving is currently a focus of industrial research, and the feasibility and safety of automatic driving algorithms and their engineering applications are being studied intensively both in China and abroad. In automatic driving scenarios, small distant obstacles inevitably appear, and the accuracy and real-time performance with which they are detected during high-speed driving affect the safety of driving decisions. An important factor in improving detection accuracy while meeting real-time requirements is image acquisition and processing: the resolution of the original images needs to be enhanced so that small distant obstacles can be detected more reliably.
Disclosure of Invention
The invention aims to provide an image processing method, an image processing system, and a storage medium that realize super-resolution enhancement of the images acquired by a vehicle camera device, so that small distant obstacles can be detected more reliably from the images during high-speed driving.
To achieve the above object, according to a first aspect, an embodiment of the present invention provides an image processing method, including:
acquiring a low-resolution image sequence of the front environment of the vehicle at the current moment; acquiring a first high-resolution image sequence output by a neural network model at the previous moment, and sampling and linearly processing the first high-resolution image sequence to obtain a second high-resolution image sequence;
and inputting the low-resolution image sequence and the second high-resolution image sequence into the neural network model for image fusion to output a first high-resolution image sequence at the current moment.
Preferably, the sampling and linear processing of the first high-resolution image sequence to obtain a second high-resolution image sequence includes:
sampling the first high-resolution image sequence by max pooling to obtain an image sequence of the same size as the low-resolution image sequence, and linearly processing the sampled image sequence to obtain the second high-resolution image sequence; wherein the linear processing is given by the following formula:

Ifb = α * Ids (1)

wherein α is a hyper-parameter obtained by pre-training the neural network model, Ifb denotes an image in the second high-resolution image sequence, and Ids denotes the corresponding image in the sampled image sequence.
Preferably, the low-resolution image sequence, the first high-resolution image sequence, and the second high-resolution image sequence each comprise 3 consecutive frames of images.
Preferably, the hyper-parameter α is obtained by pre-training in the following way:
acquiring low-resolution image sequence samples and high-resolution image sequence samples for training, and training the neural network model on these samples; wherein the low-resolution image sequence samples and the high-resolution image sequence samples are captured at the same acquisition moments;
wherein, the loss function of the neural network model in the training process is as follows:
L=μ*L1(k,k)+γ*L1(k,k-1)+ρ*L1(k,k+1)
wherein L1(k, k) represents the difference between the output image of the neural network model at time k and the corresponding image in the high-resolution image sequence sample at time k; L1(k, k-1) is the difference between the output image of the neural network model at time k and the corresponding image in the high-resolution image sequence sample at time k-1; L1(k, k+1) is the difference between the output image of the neural network model at time k and the corresponding image in the high-resolution image sequence sample at time k+1; and μ, γ and ρ are weight parameters;
performing iterative calculation to determine the loss value L̄ at which the loss function converges, and determining the hyper-parameter α based on the loss value L̄.
Preferably, the determining of the hyper-parameter α based on the loss value L̄ comprises:

α = 0.5 * (LH - L̄) / (LH - LL)

wherein LH is the initial loss value of the loss function and LL is the preset minimum loss value of the loss function.
Preferably, the distance between the output image sequence of the neural network model and the corresponding images in the high-resolution image sequence sample is calculated according to the following formula:

L1 = (1/n) * Σ(k=1..n) Σ(i=1..m) Σ(j=1..m) | Îhr(i, j) - Ihr(i, j) |

wherein L1 is the image difference, Îhr is an output image of the neural network model, Ihr(i, j) is the corresponding image in the high-resolution image sequence sample, m is the dimension of the image, n is the number of frames in the high-resolution image sequence sample, and (i, j) indexes a pixel in the image.
According to a second aspect, an embodiment of the present invention provides an image processing system, including:
the system comprises an image acquisition unit, configured to: acquire a low-resolution image sequence of the environment in front of the vehicle at the current moment; and acquire the first high-resolution image sequence output by a neural network model at the previous moment, and sample and linearly process the first high-resolution image sequence to obtain a second high-resolution image sequence;
and the image fusion unit is used for inputting the low-resolution image sequence and the second high-resolution image sequence into the neural network model for image fusion and outputting the first high-resolution image sequence at the current moment.
Preferably, the image acquiring unit specifically includes:
the image acquisition unit comprises a first image acquisition unit and a second image acquisition unit, wherein the first image acquisition unit is configured to acquire a low-resolution image sequence of the environment in front of the vehicle at the current moment;
the second image acquisition unit is configured to acquire the first high-resolution image sequence output by a neural network model at the previous moment, to sample the first high-resolution image sequence by max pooling to obtain an image sequence of the same size as the low-resolution image sequence, and to linearly process the sampled image sequence to obtain the second high-resolution image sequence; wherein the linear processing is given by formula (1):

Ifb = α * Ids

wherein α is a hyper-parameter obtained by pre-training the neural network model, Ifb denotes an image in the second high-resolution image sequence, and Ids denotes the corresponding image in the sampled image sequence.
Preferably, the low-resolution image sequence, the first high-resolution image sequence, and the second high-resolution image sequence each comprise 3 consecutive frames of images.
According to a third aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the steps of the image processing method according to the first aspect.
The embodiments of the invention provide an image processing method and system, and a storage medium. When the vehicle is driving at high speed, the low-resolution image sequence captured by the vehicle's long-focus camera is input into the neural network model; the first high-resolution image sequence output by the neural network model at the previous moment is sampled and linearly processed to obtain a second high-resolution image sequence; and the neural network model performs image fusion on the low-resolution image sequence and the second high-resolution image sequence, outputting the first high-resolution image sequence at the current moment. Thus, only the low-resolution image sequence at the current moment needs to be supplied to obtain the enhanced first high-resolution image sequence, which can then be used for subsequent recognition of small distant objects; this helps reduce the difficulty of small-obstacle recognition and improves the accuracy and real-time performance of detecting small distant obstacles in high-speed automatic driving.
Additional features and advantages of the invention will be set forth in the detailed description which follows.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of an image processing method according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of neural network model training in an embodiment of the present invention.
FIG. 3 is a flowchart illustrating training of a neural network model according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of a shooting process of the long-focus camera and the short-focus camera in the vehicle driving process in the embodiment of the invention.
Fig. 5 is a schematic diagram of a sequence of images captured during operation of a vehicle according to an embodiment of the present invention.
Fig. 6 is a flowchart of testing a trained neural network model according to an embodiment of the present invention.
FIG. 7 is a block diagram of an image processing system according to another embodiment of the present invention.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In addition, numerous specific details are set forth in the following specific examples in order to better illustrate the invention. It will be understood by those skilled in the art that the present invention may be practiced without some of these specific details. In some instances, well known means have not been described in detail so as not to obscure the present invention.
Referring to fig. 1, an embodiment of the invention provides an image processing method, including the following steps S1 to S2:
step S1, acquiring a low-resolution image sequence of the environment in front of the vehicle at the current moment; and acquiring a first high-resolution image sequence output by the neural network model at the previous moment, and sampling and linearly processing the first high-resolution image sequence to obtain a first high-resolution image sequence.
Specifically, the low-resolution image sequence, the first high-resolution image sequence and the first high-resolution image sequence comprise a plurality of continuous frames of images, the sampling of the first high-resolution image sequence aims at obtaining an image sequence with the same size as the low-resolution image sequence, and the linear processing aims at performing transformation processing on the images so as to facilitate subsequent image fusion to obtain a desired high-resolution image.
Step S2: inputting the low-resolution image sequence and the second high-resolution image sequence into the neural network model for image fusion, and outputting the first high-resolution image sequence at the current moment.
Specifically, image fusion in this step means using image processing, computer technology and the like to extract, to the greatest extent, the useful information carried in each channel of image data collected from multiple sources about the same target, and finally synthesizing a high-quality image, so as to raise the utilization of the image information, improve the accuracy and reliability of computer interpretation, and increase the resolution of the original images. This embodiment fuses the high-resolution image output of the neural network model at the previous moment with the low-resolution image sequence input at the current moment; the specific manner of image fusion may be chosen freely and is not limited in this embodiment. The neural network model can be trained in advance. Because the output of the neural network model is an enhancement of the original images, consecutive sub-images in the training input sequence are adjacent in time, and the inputs at the previous and the following moments are respectively the network-enhanced image and the actually captured enhanced image, so the inputs and outputs of the network are necessarily intrinsically related. To make full use of this relation, the output of the neural network model at the previous moment is fed back into its input at the current moment.
In summary, when the vehicle is in a high-speed driving state, the low-resolution image sequence captured by the vehicle's long-focus camera is input into the neural network model; the first high-resolution image sequence output by the model at the previous moment is sampled and linearly processed to obtain a second high-resolution image sequence; and the model performs image fusion on the low-resolution image sequence and the second high-resolution image sequence to output the first high-resolution image sequence at the current moment. Only the low-resolution image sequence at the current moment therefore needs to be supplied to obtain the enhanced first high-resolution image sequence, which can be used for subsequent recognition of small distant objects, reducing the difficulty of small-obstacle recognition and improving the accuracy and real-time performance of detecting small distant obstacles in high-speed automatic driving.
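To make the feedback loop concrete, the following sketch (in Python, using PyTorch) shows one inference step as just described. It is illustrative only: the model `sr_net`, the function name `enhance_step` and the scale factor are assumptions, since the patent does not fix an implementation.

```python
import torch
import torch.nn.functional as F

def enhance_step(sr_net, lr_seq, prev_hr_seq, alpha, scale):
    """One time step of the feedback loop.

    lr_seq:      low-resolution frames at time k, shape (1, 3, n, n)
    prev_hr_seq: first high-resolution sequence output at time k-1,
                 shape (1, 3, n*scale, n*scale)
    """
    # Max-pool the previous high-resolution output down to the
    # low-resolution size, then scale it by the hyper-parameter alpha
    # (formula (1)) to form the second high-resolution (feedback) sequence.
    feedback = alpha * F.max_pool2d(prev_hr_seq, kernel_size=scale)

    # Stack the 3 low-resolution frames and the 3 feedback frames into a
    # 6-channel input and let the network fuse them.
    x = torch.cat([lr_seq, feedback], dim=1)
    return sr_net(x)  # first high-resolution sequence at time k
```

At each subsequent time step, the sequence returned here is fed back in as `prev_hr_seq`.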
Preferably, the sampling and linear processing of the first high-resolution image sequence in step S1 to obtain a second high-resolution image sequence includes:

sampling the first high-resolution image sequence by max pooling to obtain an image sequence of the same size as the low-resolution image sequence, and linearly processing the sampled image sequence to obtain the second high-resolution image sequence, i.e., the feedback information derived from the output of the neural network model at the previous moment; wherein the linear processing is given by the following formula:

Ifb = α * Ids (1)

wherein α is a hyper-parameter obtained by pre-training the neural network model, Ifb denotes an image in the second high-resolution image sequence, and Ids denotes the corresponding image in the sampled image sequence.

In particular, the second high-resolution image sequence, which forms the feedback-information part of the neural network model's input, is obtained by the linear processing described above.
Preferably, the low-resolution image sequence, the first high-resolution image sequence, and the second high-resolution image sequence each comprise 3 consecutive frames of images; that is, the input of the neural network model is set to 6 channels, receiving 6 frames of images in total.
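The patent does not specify the internal architecture of the neural network model; the sketch below shows merely one plausible shape for a 6-channel fusion network (a small convolutional body with PixelShuffle upsampling), to illustrate the 6-channel input convention. All layer sizes are assumptions.

```python
import torch.nn as nn

class FusionSRNet(nn.Module):
    # Illustrative only: the patent does not fix an architecture.
    def __init__(self, scale=2, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(6, channels, 3, padding=1),        # 6-channel input
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),                      # restore HR size
        )

    def forward(self, x):       # x: (batch, 6, n, n)
        return self.body(x)     # -> (batch, 3, n*scale, n*scale)
```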
Preferably, with reference to fig. 2-3, said hyper-parameter α is pre-trained in particular by:
Step A1: acquiring low-resolution image sequence samples and high-resolution image sequence samples for training, and training the neural network model on these samples; wherein the low-resolution image sequence samples and the high-resolution image sequence samples are captured at the same acquisition moments.
Specifically, a long-focus camera and a short-focus camera at the front of the vehicle can be used to obtain the low-resolution and the high-resolution image sequence samples respectively, each sequence comprising several temporally consecutive frames, and the image information of the high-resolution image sequence samples is labeled.
the long-focus camera and the short-focus camera in the driving process of the vehicle can be taken as reference in fig. 4, and fig. 5 is an image sequence taken in the running process of the vehicle, wherein the long-focus camera takes n frames of images in the running process of the vehicle, and data is used
Figure BDA0002655138430000081
Expressing that the dimension is n × m, the short-focus camera also obtains and shoots n frames of images, and the data is used
Figure BDA0002655138430000082
Representing that the dimension is n x m, each frame of image obtained by the long-focus camera and the short-focus camera has m data marked, and each data is used
Figure BDA0002655138430000083
Or
Figure BDA0002655138430000084
The following formula (2).
Figure BDA0002655138430000085
Figure BDA0002655138430000086
Figure BDA0002655138430000087
Figure BDA0002655138430000088
In this embodiment, the labeled high-resolution sequence in the training data set is compared with the network output to establish a loss function, which is then optimized; that is, the images output by the network are compared with the high-resolution image sequence samples obtained in step A1. Specifically, the loss function of the neural network model during training is as follows:
L=μ*L1(k,k)+γ*L1(k,k-1)+ρ*L1(k,k+1) (3)
wherein L1(k, k) represents the difference between the output image of the neural network model at time k and the corresponding image in the high-resolution image sequence sample at time k; L1(k, k-1) is the difference between the output image of the neural network model at time k and the corresponding image in the high-resolution image sequence sample at time k-1; L1(k, k+1) is the difference between the output image of the neural network model at time k and the corresponding image in the high-resolution image sequence sample at time k+1; and μ, γ and ρ are weight parameters;
specifically, the correspondence relationship between the images of the plurality of frames in the two sequences is determined according to the sequence order.
Step A2: performing iterative calculation to determine the loss value L̄ at which the loss function converges, and determining the hyper-parameter α based on the loss value L̄.
Specifically, before the first low-resolution frame of the training data set is input into the network for forward propagation, no corresponding feedback information exists yet, so information of the corresponding form must be synthesized for the feedback input channels. Concretely: the input Ilr is first upsampled by a factor of r (for example, but not limited to, by interpolation) and then downsampled by a factor of r (for example, but not limited to, by max pooling), yielding three feedback channels of dimension n × n for the network input layer. Let the low-resolution input image information of the neural network be Ilr and the corresponding high-resolution label image information be Ihr; after the multi-layer processing of the neural network, the high-resolution image information Îhr is obtained at the output layer. Introducing this feedback information accelerates the convergence of the algorithm, shortens training time, and enriches the information available at the input layer, which in theory simplifies the required neural network hierarchy and improves the real-time performance of the method in engineering applications.
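A sketch of this virtual-feedback initialization, assuming bilinear interpolation for the r-times upsampling and max pooling for the r-times downsampling, both of which the text names as examples rather than requirements.

```python
import torch.nn.functional as F

def init_feedback(lr_seq, r):
    # No previous output exists at the first time step, so synthesize
    # feedback: upsample the low-resolution input by r (bilinear
    # interpolation here), then max-pool it back down by r, giving
    # three n x n feedback channels for the input layer.
    up = F.interpolate(lr_seq, scale_factor=r, mode='bilinear',
                       align_corners=False)
    return F.max_pool2d(up, kernel_size=r)
```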
Preferably, the determining of the hyper-parameter α based on the loss value L̄ comprises:

α = 0.5 * (LH - L̄) / (LH - LL)    (4)

wherein LH is the initial loss value of the loss function and LL is the preset minimum loss value of the loss function, preferably 0.5.
Specifically, α depends on how much the output image differs from the real image: the larger the difference, the smaller the weight it is given. Its value range is set to [0, 0.5], and its variation follows the loss of the network.
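Since the original expression for formula (4) is not preserved in this text, the following sketch encodes one consistent reading of it: a linear mapping of the converged loss onto [0, 0.5] in which a larger remaining loss yields a smaller α. It should be treated as an interpretation, not the patent's verbatim formula.

```python
def compute_alpha(loss_converged, loss_high, loss_low):
    # One reading of formula (4): map the converged loss linearly onto
    # [0, 0.5]; the further the converged loss remains from the preset
    # minimum loss_low, the smaller the feedback weight alpha.
    alpha = 0.5 * (loss_high - loss_converged) / (loss_high - loss_low)
    return min(max(alpha, 0.0), 0.5)  # clamp to the stated range
```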
Preferably, the distance between the output image sequence of the neural network model and the corresponding images in the high-resolution image sequence sample is calculated according to the following formula:

L1 = (1/n) * Σ(k=1..n) Σ(i=1..m) Σ(j=1..m) | Îhr(i, j) - Ihr(i, j) |    (5)

wherein L1 is the image difference, Îhr is an output image of the neural network model, Ihr(i, j) is the corresponding image in the high-resolution image sequence sample, m is the dimension of the image, n is the number of frames in the high-resolution image sequence sample, and (i, j) indexes a pixel in the image.
When the loss function converges, training is finished. After training, the trained neural network model may be tested through the process shown in fig. 6; the test sample data may be acquired anew or taken from the training sample data.
Referring to fig. 7, another embodiment of the invention provides an image processing system, including:
an image acquisition unit 1 for acquiring a low-resolution image sequence of an environment in front of a vehicle at a current time; acquiring a first high-resolution image sequence output by a neural network model at the previous moment, and sampling and linearly processing the first high-resolution image sequence to obtain a second high-resolution image sequence;
and the image fusion unit 2 is used for inputting the low-resolution image sequence and the second high-resolution image sequence into the neural network model for image fusion and outputting a first high-resolution image sequence at the current moment.
Preferably, the image acquiring unit 1 specifically includes:
a first image acquisition unit 11 for acquiring a low-resolution image sequence of an environment ahead of the vehicle at a current time;
the second image obtaining unit 12 is configured to obtain a first high-resolution image sequence output by the neural network model at the previous time, perform sampling on the first high-resolution image sequence under maximum pooling to obtain an image sequence with the same size as the low-resolution image sequence, and perform linear processing on the image sequence obtained by sampling to obtain a second high-resolution image sequence; wherein, the linear processing process is shown as the following formula:
Figure BDA0002655138430000102
wherein alpha is a hyper-parameter and is obtained by pre-training the neural network model;
Figure BDA0002655138430000103
is one image in the second high resolution image sequence;
Figure BDA0002655138430000104
is a sampled one of the sequence of images.
Preferably, the low-resolution image sequence, the first high-resolution image sequence, and the second high-resolution image sequence each comprise 3 consecutive frames of images.
The above-described system embodiments are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
It should be noted that the image processing system according to the foregoing embodiment corresponds to the image processing method according to the foregoing embodiment, and therefore, portions of the image processing system according to the foregoing embodiment that are not described in detail can be obtained by referring to the content of the image processing method according to the foregoing embodiment, and are not described herein again.
Also, the image processing system according to the above-described embodiment, if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer-readable storage medium.
Another embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the steps of the image processing method according to the above-mentioned embodiment.
Specifically, the computer-readable storage medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), electrical carrier wave signals, telecommunications signals, a software distribution medium, and the like.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (10)

1. An image processing method, comprising:
acquiring a low-resolution image sequence of the front environment of the vehicle at the current moment; acquiring a first high-resolution image sequence output by a neural network model at the previous moment, and sampling and linearly processing the first high-resolution image sequence to obtain a second high-resolution image sequence;
and inputting the low-resolution image sequence and the second high-resolution image sequence into the neural network model for image fusion to output a first high-resolution image sequence at the current moment.
2. The image processing method according to claim 1, wherein the sampling and linear processing of the first high-resolution image sequence to obtain a second high-resolution image sequence comprises:
sampling the first high-resolution image sequence by max pooling to obtain an image sequence of the same size as the low-resolution image sequence, and linearly processing the sampled image sequence to obtain the second high-resolution image sequence; wherein the linear processing is given by the following formula:

Ifb = α * Ids

wherein α is a hyper-parameter obtained by pre-training the neural network model, Ifb denotes an image in the second high-resolution image sequence, and Ids denotes the corresponding image in the sampled image sequence.
3. The image processing method according to claim 1, wherein the low-resolution image sequence, the first high-resolution image sequence, and the second high-resolution image sequence each comprise 3 consecutive frames of images.
4. The image processing method according to claim 2, wherein the hyper-parameter α is pre-trained by:
acquiring a low-resolution image sequence sample and a high-resolution image sequence sample for training, and training a neural network model according to the low-resolution image sequence sample and the high-resolution image sequence sample; wherein the low resolution image sequence samples correspond to acquisition instants of the high resolution image sequence samples;
wherein, the loss function of the neural network model in the training process is as follows:
L=μ*L1(k,k)+γ*L1(k,k-1)+ρ*L1(k,k+1)
wherein L1(k, k) represents the difference between the output image of the neural network model at time k and the corresponding image in the high-resolution image sequence sample at time k; L1(k, k-1) is the difference between the output image of the neural network model at time k and the corresponding image in the high-resolution image sequence sample at time k-1; L1(k, k+1) is the difference between the output image of the neural network model at time k and the corresponding image in the high-resolution image sequence sample at time k+1; and μ, γ and ρ are weight parameters;
performing iterative calculation to determine the loss value L̄ at which the loss function converges, and determining the hyper-parameter α based on the loss value L̄.
5. The image processing method according to claim 4, wherein said determining the hyper-parameter α based on the loss value L̄ comprises:

α = 0.5 * (LH - L̄) / (LH - LL)

wherein LH is the initial loss value of the loss function and LL is the preset minimum loss value of the loss function.
6. The image processing method according to claim 3, wherein the distance between the output image sequence of the neural network model and the corresponding images in the high-resolution image sequence sample is calculated according to the following formula:

L1 = (1/n) * Σ(k=1..n) Σ(i=1..m) Σ(j=1..m) | Îhr(i, j) - Ihr(i, j) |

wherein L1 is the image difference, Îhr is an output image of the neural network model, Ihr(i, j) is the corresponding image in the high-resolution image sequence sample, m is the dimension of the image, n is the number of frames in the high-resolution image sequence sample, and (i, j) indexes a pixel in the image.
7. An image processing system, comprising:
the system comprises an image acquisition unit, configured to: acquire a low-resolution image sequence of the environment in front of the vehicle at the current moment; and acquire the first high-resolution image sequence output by a neural network model at the previous moment, and sample and linearly process the first high-resolution image sequence to obtain a second high-resolution image sequence;
and the image fusion unit is used for inputting the low-resolution image sequence and the second high-resolution image sequence into the neural network model for image fusion and outputting the first high-resolution image sequence at the current moment.
8. The image processing system according to claim 7, wherein the image acquisition unit specifically includes:
the image acquisition unit comprises a first image acquisition unit and a second image acquisition unit, wherein the first image acquisition unit is configured to acquire a low-resolution image sequence of the environment in front of the vehicle at the current moment;
the second image acquisition unit is configured to acquire the first high-resolution image sequence output by a neural network model at the previous moment, to sample the first high-resolution image sequence by max pooling to obtain an image sequence of the same size as the low-resolution image sequence, and to linearly process the sampled image sequence to obtain the second high-resolution image sequence; wherein the linear processing is given by the following formula:

Ifb = α * Ids

wherein α is a hyper-parameter obtained by pre-training the neural network model, Ifb denotes an image in the second high-resolution image sequence, and Ids denotes the corresponding image in the sampled image sequence.
9. The image processing system of claim 7, wherein the low-resolution image sequence, the first high-resolution image sequence, and the second high-resolution image sequence each comprise 3 consecutive frames of images.
10. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the steps of the image processing method according to any one of claims 1-6.
CN202010884492.0A 2020-08-28 2020-08-28 Image processing method and system and storage medium Pending CN112084908A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010884492.0A CN112084908A (en) 2020-08-28 2020-08-28 Image processing method and system and storage medium


Publications (1)

Publication Number Publication Date
CN112084908A true CN112084908A (en) 2020-12-15

Family

ID=73728766

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010884492.0A Pending CN112084908A (en) 2020-08-28 2020-08-28 Image processing method and system and storage medium

Country Status (1)

Country Link
CN (1) CN112084908A (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130223734A1 (en) * 2012-02-24 2013-08-29 Oncel Tuzel Upscaling Natural Images
US20180293706A1 (en) * 2017-04-05 2018-10-11 Here Global B.V. Deep convolutional image up-sampling
CN109544457A (en) * 2018-12-04 2019-03-29 电子科技大学 Image super-resolution method, storage medium and terminal based on fine and close link neural network
CN110084203A (en) * 2019-04-29 2019-08-02 北京航空航天大学 Full convolutional network aircraft level detection method based on context relation
CN110689509A (en) * 2019-09-10 2020-01-14 上海大学 Video super-resolution reconstruction method based on cyclic multi-column 3D convolutional network
CN111127325A (en) * 2019-12-31 2020-05-08 珠海大横琴科技发展有限公司 Satellite video super-resolution reconstruction method and system based on cyclic neural network
CN111353938A (en) * 2020-02-29 2020-06-30 杭州电子科技大学 Image super-resolution learning method based on network feedback
CN111369440A (en) * 2020-03-03 2020-07-03 网易(杭州)网络有限公司 Model training method, image super-resolution processing method, device, terminal and storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
A. KAPPELER et al.: "Video Super-Resolution With Convolutional Neural Networks", IEEE Transactions on Computational Imaging, vol. 2, no. 2, pages 109-122 *
D. FUOLI et al.: "Efficient Video Super-Resolution through Recurrent Latent Space Propagation", International Conference on Computer Vision Workshop (ICCVW), pages 3476-3485 *
陈聪颖: "Research and Application of Video Super-Resolution" (基于视频超分辨率的研究与应用), China Master's Theses Full-Text Database: Information Science and Technology, no. 7, pages 1-79 *
骆镇钊: "Research on Super-Resolution Reconstruction Algorithms Based on Multi-Frame Learning" (基于多帧学习的超分辨率重建算法研究), China Master's Theses Full-Text Database: Information Science and Technology, no. 9, pages 1-65 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination