CN113076685A - Training method of image reconstruction model, image reconstruction method and device thereof - Google Patents

Training method of image reconstruction model, image reconstruction method and device thereof Download PDF

Info

Publication number
CN113076685A
Authority
CN
China
Prior art keywords
image
frames
definition
optical flow
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110237930.9A
Other languages
Chinese (zh)
Other versions
CN113076685B (en)
Inventor
陈帅军
徐芳
贾旭
乔振东
杜泽伟
刘健庄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202110237930.9A priority Critical patent/CN113076685B/en
Publication of CN113076685A publication Critical patent/CN113076685A/en
Application granted granted Critical
Publication of CN113076685B publication Critical patent/CN113076685B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 - Computer-aided design [CAD]
    • G06F 30/20 - Design optimisation, verification or simulation
    • G06F 30/27 - Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/084 - Backpropagation, e.g. using gradient descent
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 1/00 - General purpose image data processing
    • G06T 1/20 - Processor architectures; Processor configuration, e.g. pipelining
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 1/00 - General purpose image data processing
    • G06T 1/60 - Memory management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a training method of an image reconstruction model, an image reconstruction method and a device thereof. The training method enables the image reconstruction model to perform re-blurring processing on the reconstructed high-definition images, and the image obtained after the re-blurring processing is then compared with the originally input blurred image. This is equivalent to adding a constraint condition, so that the sequence of high-definition image frames obtained with the trained image reconstruction model is closer to the real data of the originally input blurred image, which improves the reconstruction quality and reconstruction effect. It is equivalent to mining more features from the event data and the blurred image, thereby improving the quality of the reconstruction.

Description

Training method of image reconstruction model, image reconstruction method and device thereof
Technical Field
The application relates to the field of computer vision in artificial intelligence, in particular to a training method of an image reconstruction model, an image reconstruction method and a device thereof.
Background
Motion blur typically occurs in scenes with significant motion during the exposure time, especially in low-light environments with lightweight mobile devices such as mobile phones and on-board cameras. While motion blur causes undesirable image degradation that makes the visual content unintelligible, a motion-blurred image also encodes rich information about the relative motion between the camera and the observed scene. Therefore, recovering (reconstructing) a sequence of sharp frames (photo-sequencing) from a single motion-blurred image is helpful for understanding the dynamics of a scene, and has wide application in image reconstruction, autonomous driving, and video surveillance. A motion-blurred image can be seen as the average of the high-definition frames over the exposure time. Since averaging destroys the temporal order of the frames, recovering a sequence of sharp frames from a single motion-blurred image is a highly ill-posed problem: the sequence to be recovered is not unique, and different sequences of high-definition frames may compose the same motion-blurred image.
To address the non-uniqueness of the sequence to be recovered, an event camera is introduced, which can provide inter-frame variation over time to guide the recovery of the sequence. Event cameras are bio-inspired, event-driven, time-based neuromorphic visual sensors that sense the world on a very different principle from traditional cameras: they measure brightness changes asynchronously and trigger an event once the change exceeds a threshold. The event camera abandons the concepts of exposure time and frames used in traditional intensity cameras and can capture almost continuous motion in a frameless manner (microsecond time resolution), and therefore does not suffer from image blurring. Using an event camera thus greatly assists in recovering sharp frames from blurred images.
In the prior art, a method for reconstructing a video sequence based on a single motion-blurred image introduces event camera data and models the motion blur generation process (i.e., obtains an image reconstruction model) by associating the event data with a sharp image. However, because event data is very noisy and the event-trigger threshold is not constant, noise tends to accumulate during the image recovery process, the performance of the image reconstruction model is difficult to improve, and the quality of the reconstructed high-definition images is low; in addition, the existing image reconstruction model has high computational complexity and low computational efficiency.
Therefore, how to improve the reconstruction quality of the blurred image is an urgent technical problem to be solved.
Disclosure of Invention
The application provides a training method of an image reconstruction model, an image reconstruction method and a device thereof, which can effectively improve the reconstruction quality of blurred images.
In a first aspect, a training method for an image reconstruction model is provided. The training method includes: acquiring a blurred image to be trained and event data to be trained, where the blurred image to be trained is an average value of multiple frames of first high-definition images, and the event data to be trained includes event points in the time period corresponding to the blurred image to be trained; and training an image reconstruction model by using the blurred image to be trained and the event data to be trained to obtain a target image reconstruction model. In the training process, parameters of the image reconstruction model are updated according to constraint conditions to obtain the target image reconstruction model. The constraint conditions include minimizing the difference between the blurred image to be trained and a re-blurred image, where the re-blurred image is obtained by processing multiple frames of second high-definition images, and the multiple frames of second high-definition images are obtained by processing the blurred image to be trained and the event data to be trained with the image reconstruction model.
In the technical solution of the application, the image reconstruction model performs re-blurring processing on the reconstructed high-definition images (the second high-definition images), and the image obtained after the re-blurring processing (the re-blurred image) is then compared with the originally input blurred image (the blurred image to be trained). This is equivalent to adding a constraint condition, so that the sequence of high-definition image frames reconstructed by the trained image reconstruction model is closer to the real data of the originally input blurred image, and the reconstruction quality and reconstruction effect are improved. It is equivalent to mining more features from the event data and the blurred image, thereby improving the quality of the reconstruction. This constraint may be referred to as a self-consistency constraint.
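A minimal sketch of such a self-consistency term is given below, assuming the re-blurred image is formed by simply averaging the reconstructed frames and that the difference is measured with an L1 distance; the function names, tensor layout and choice of distance are illustrative assumptions, not the claimed implementation.

```python
import torch
import torch.nn.functional as F

def self_consistency_loss(blurred, reconstructed_frames):
    """Hypothetical re-blur constraint: average the reconstructed high-definition
    frames into a re-blurred image and compare it with the input blurred image.

    blurred:              tensor of shape (B, C, H, W), the blurred image to be trained
    reconstructed_frames: tensor of shape (B, T, C, H, W), T reconstructed frames
    """
    reblurred = reconstructed_frames.mean(dim=1)   # re-blurring by averaging the frames
    return F.l1_loss(reblurred, blurred)           # difference to be minimized
```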
There may be various ways of performing the re-blurring processing on the multiple frames of second high-definition images; for example, the multiple frames of second high-definition images may be processed directly, or the optical flow estimation result may be combined with the second high-definition images for processing.
Directly processing the multiple frames of second high-definition images may mean averaging the multiple frames of second high-definition images. With reference to the first aspect, in some implementations of the first aspect, the re-blurred image is obtained by averaging the multiple frames of second high-definition images.
Combining the optical flow estimation result and the second high-definition images means processing the optical flow estimation result together with the second high-definition images. With reference to the first aspect, in certain implementations of the first aspect, the re-blurred image is obtained by processing an optical flow estimation result and the multiple frames of second high-definition images, where the optical flow estimation result is obtained by processing the blurred image to be trained and the event data to be trained with an optical flow estimation network, and the optical flow estimation result includes motion information of the blurred image to be trained.
With reference to the first aspect, in some implementations of the first aspect, the re-blurred image is obtained by inserting an interpolation frame between at least two adjacent frames of the multi-frame second high-definition image by using an optical flow estimation result, and then averaging the interpolation frame and the multi-frame second high-definition image.
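The sketch below illustrates this frame-interpolation variant under the same assumptions as the previous sketch; the helper `interpolate_between`, which would synthesize an intermediate frame from two adjacent frames and the estimated flow, is hypothetical and stands in for whatever interpolation method is actually used.

```python
import torch

def reblur_with_interpolation(frames, flows, interpolate_between, n_insert=1):
    """Sketch: insert `n_insert` interpolated frames between adjacent reconstructed
    frames using the optical flow estimation result, then average everything.

    frames: tensor (B, T, C, H, W) of reconstructed second high-definition frames.
    flows:  per-pair optical flow estimates consumed by `interpolate_between`.
    """
    sequence = []
    num_frames = frames.shape[1]
    for i in range(num_frames - 1):
        sequence.append(frames[:, i])
        for j in range(1, n_insert + 1):
            t = j / (n_insert + 1)  # temporal position of the interpolated frame
            sequence.append(interpolate_between(frames[:, i], frames[:, i + 1], flows[i], t))
    sequence.append(frames[:, -1])
    return torch.stack(sequence, dim=1).mean(dim=1)  # average originals and interpolated frames
```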
With reference to the first aspect, in certain implementations of the first aspect, the constraint conditions further include minimizing the difference between the i-th frame of optical flow estimated image in multiple frames of optical flow estimated images and the i-th frame of second high-definition image in the multiple frames of second high-definition images. The multiple frames of optical flow estimated images are obtained by performing warping transformations on the multiple frames of second high-definition images using the optical flow estimation result; specifically, the i-th frame of optical flow estimated image is obtained by performing a warping transformation on the (i+1)-th frame of second high-definition image, where i is an integer greater than or equal to 0 and less than the number of frames of the multiple frames of first high-definition images. This is equivalent to adding a further constraint, which may be referred to as a luminance consistency constraint: the luminance between adjacent frames should be consistent. In this way the reconstructed images keep the information of the original blurred image as much as possible in the time dimension, and the quality and effect of image reconstruction can be further improved. Specifically, motion is continuous and sequential; the prior art does not consider keeping the multiple frames temporally consistent in order, so the definition of its reconstructed high-definition images is still low and their quality is poor.
With reference to the first aspect, in certain implementations of the first aspect, the difference between the i-th frame of optical flow estimated image in the multiple frames of optical flow estimated images and the i-th frame of second high-definition image in the multiple frames of second high-definition images may satisfy the following formula:

$$L_{warp} = \sum_{i=0}^{\tau-1} \left\| I_i(x) - \hat{I}_i(x) \right\|$$

where $L_{warp}$ is the difference, $\tau$ represents the number of frames of the multiple frames of first high-definition images, $I_i(x)$ represents the i-th frame of second high-definition image, and $\hat{I}_i(x)$ represents the i-th frame of optical flow estimated image.
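A sketch of how such a luminance consistency term could be computed with a differentiable backward warp is given below; the bilinear warping via grid sampling and the L1 distance are illustrative assumptions, not necessarily the warping transformation used by the model.

```python
import torch
import torch.nn.functional as F

def backward_warp(image, flow):
    """Warp `image` (B, C, H, W) with a per-pixel backward flow (B, 2, H, W) given in pixels."""
    B, _, H, W = image.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(image.device)   # (2, H, W) pixel grid
    coords = base.unsqueeze(0) + flow                               # displaced sampling positions
    gx = 2.0 * coords[:, 0] / (W - 1) - 1.0                         # normalise x to [-1, 1]
    gy = 2.0 * coords[:, 1] / (H - 1) - 1.0                         # normalise y to [-1, 1]
    grid = torch.stack((gx, gy), dim=-1)                            # (B, H, W, 2)
    return F.grid_sample(image, grid, align_corners=True)

def luminance_consistency_loss(frames, flows):
    """L_warp sketch: compare frame i with frame i+1 warped back to frame i."""
    total, num_frames = 0.0, frames.shape[1]
    for i in range(num_frames - 1):
        estimated_i = backward_warp(frames[:, i + 1], flows[:, i])  # optical flow estimated image i
        total = total + F.l1_loss(frames[:, i], estimated_i)
    return total / (num_frames - 1)
```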
With reference to the first aspect, in certain implementations of the first aspect, the constraint condition further includes minimizing a distance between adjacent pixels in each of the plurality of frames of optical-flow estimated images. This method is equivalent to adding a constraint, which may be referred to as a smoothing constraint, to take into account the distance change between adjacent pixels in a single frame image, that is, pixels in a single frame image do not suddenly change, thereby improving the quality of image reconstruction.
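A common way to express such a smoothing constraint is a total-variation style penalty on neighbouring pixels, sketched below; treating the penalty as an L1 difference of adjacent pixels is an assumption made here for illustration.

```python
def smoothness_loss(image):
    """Smoothing-constraint sketch: penalise differences between horizontally and
    vertically adjacent pixels of a (B, C, H, W) tensor so pixel values do not jump abruptly."""
    dx = (image[:, :, :, 1:] - image[:, :, :, :-1]).abs().mean()
    dy = (image[:, :, 1:, :] - image[:, :, :-1, :]).abs().mean()
    return dx + dy
```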
With reference to the first aspect, in certain implementations of the first aspect, the constraint condition further includes minimizing a difference between the plurality of frames of the second high definition images and the plurality of frames of the first high definition images. This approach is equivalent to adding a constraint, which may be referred to as a reconstruction loss constraint, where the reconstruction loss may be understood as the difference between the output of the reconstruction network and the true value corresponding to the blurred image. The real value may be the first high definition image described above.
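Putting the constraints named so far together, a training objective could look like the sketch below, which reuses the sketch functions defined above; the loss weights are hypothetical, the reconstruction term uses an L1 distance as an assumption, and for brevity the smoothing term is applied to the reconstructed frames rather than to the optical flow estimated images.

```python
import torch.nn.functional as F

def total_training_loss(blurred, frames, flows, first_hd_frames,
                        w_rec=1.0, w_self=1.0, w_warp=0.1, w_smooth=0.01):
    """Sketch of a combined objective; weights and distances are illustrative assumptions."""
    l_rec = F.l1_loss(frames, first_hd_frames)           # reconstruction loss vs. ground-truth frames
    l_self = self_consistency_loss(blurred, frames)      # self-consistency (re-blur) constraint
    l_warp = luminance_consistency_loss(frames, flows)   # luminance consistency between adjacent frames
    l_smooth = smoothness_loss(frames.flatten(0, 1))     # smoothing constraint on each frame
    return w_rec * l_rec + w_self * l_self + w_warp * l_warp + w_smooth * l_smooth
```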
In a second aspect, an image reconstruction method is provided, which includes acquiring a first blurred image and first event data, where the first event data includes event points in a time period corresponding to the first blurred image; inputting the first blurred image and the first event data into a target image reconstruction model to obtain a multi-frame target high-definition image, wherein the target image reconstruction model is obtained by utilizing a training method of any one implementation mode in the first aspect; and outputting one or more frames in the multi-frame target high-definition image.
Since the second aspect mainly reconstructs an image by using the target reconstruction model obtained by the first aspect and any implementation manner thereof, the image reconstruction method can achieve the technical effect of improving the reconstruction quality, and the technical effect of the first aspect is completely applicable to the second aspect and will not be described repeatedly.
After the target image reconstruction model is fixed, the number of frames of high-definition images reconstructed from a single blurred image is fixed, so the first event data can be segmented according to this number of frames. For example, assume that the number of frames of target high-definition images obtained by the target image reconstruction model is 5, that is, the target image reconstruction model reconstructs a blurred image into 5 frames of high-definition images. The first event data may then be divided into 5 segments, and the corresponding time periods of the segments may have the same length. Of course, the time period may also be divided into an integral multiple of 5 segments, such as 10, and the event points of every two adjacent segments may then be combined into one segment, so that the first event data is still divided into 5 segments. The above numerical values are only examples given for ease of understanding and do not constitute a limitation.
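A minimal sketch of this segmentation is shown below, assuming each event point carries a timestamp and that the exposure period is split into equal-length bins; the tuple layout of an event point is an assumption.

```python
def split_event_data(events, t_start, t_end, num_frames):
    """Divide event points into `num_frames` equal-length time bins covering
    the exposure period of the blurred image.

    events: iterable of (timestamp, x, y, polarity) tuples (assumed layout).
    """
    bins = [[] for _ in range(num_frames)]
    bin_length = (t_end - t_start) / num_frames
    for t, x, y, p in events:
        idx = min(int((t - t_start) / bin_length), num_frames - 1)  # clamp events at t_end
        bins[idx].append((t, x, y, p))
    return bins
```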
In a third aspect, a training apparatus for an image reconstruction model is provided, where the training apparatus includes a unit capable of implementing the training method of the first aspect and any one implementation manner thereof. For example, the training device may include an obtaining unit configured to obtain a blurred image to be trained and event data to be trained, and a training unit configured to train an image reconstruction model by using the blurred image to be trained and the event data to be trained, so as to obtain a target image reconstruction model.
In a fourth aspect, an image reconstruction apparatus is provided that includes means capable of implementing the method of the second aspect and any one of its implementations. For example, the apparatus may include an acquisition unit for acquiring a first blurred image and first event data; the processing unit is used for inputting the first blurred image and the first event data into the target image reconstruction model to obtain a multi-frame target high-definition image; and the output unit is used for outputting one or more frames in the multi-frame target high-definition image.
In a fifth aspect, a training apparatus for an image reconstruction model is provided, where the training apparatus includes a processor and a memory, the memory is used to store program instructions, and the processor is used to call the program instructions to implement the training method of the first aspect and any implementation manner thereof.
In a sixth aspect, an image reconstruction apparatus is provided, comprising a processor and a memory, the memory being configured to store program instructions, and the processor being configured to invoke the program instructions to implement the method of the second aspect and any one of its implementations.
In a seventh aspect, a computer-readable storage medium is provided, wherein the computer-readable storage medium stores a program code for execution by a device, and the program code includes instructions for implementing the training method of the first aspect and any one of its implementations or implementing the method of the second aspect and any one of its implementations.
In an eighth aspect, a chip system is provided, where the chip system includes a processor and a data interface, and the processor reads an instruction stored in a memory through the data interface to implement a training method according to the first aspect and any one implementation manner thereof or to implement a method according to the second aspect and any one implementation manner thereof.
A ninth aspect provides a computer program product for causing a computer to perform the training method of the first aspect and any one of its implementations or the method of the second aspect and any one of its implementations when the computer program is executed on the computer.
Drawings
Fig. 1 is a schematic structural diagram of a system architecture according to an embodiment of the present application.
Fig. 2 is a schematic diagram of a convolutional neural network of an embodiment of the present application.
Fig. 3 is a schematic diagram of a chip hardware structure according to an embodiment of the present application.
Fig. 4 is a schematic view of an application scenario according to an embodiment of the present application.
Fig. 5 is a schematic flow chart of a training method of an image reconstruction model according to an embodiment of the present application.
Fig. 6 is a schematic flow chart of a training process of an image reconstruction model according to an embodiment of the present application.
Fig. 7 is a schematic structural diagram of a training architecture of an image reconstruction model according to an embodiment of the present application.
Fig. 8 is a schematic diagram of an image reconstruction method according to an embodiment of the present application.
Fig. 9 is a schematic flowchart of a training process of an image reconstruction model according to an embodiment of the present application.
Fig. 10 is a schematic diagram of an image reconstruction apparatus according to an embodiment of the present application.
Fig. 11 is a schematic diagram of a training apparatus for an image reconstruction model according to an embodiment of the present application.
Fig. 12 is a schematic diagram of a hardware configuration of an image reconstruction apparatus according to an embodiment of the present application.
Fig. 13 is a hardware configuration diagram of a training apparatus for an image reconstruction model according to an embodiment of the present application.
Detailed Description
The technical solution in the present application will be described below with reference to the accompanying drawings.
The image reconstruction scheme provided by the embodiments of the application can be applied to photographing, video recording, smart cities, human-computer interaction, and other scenarios that require image restoration (also referred to as image reconstruction).
It should be understood that the image in the embodiment of the present application may be a still image (or referred to as a still picture) or a dynamic image (or referred to as a dynamic picture), for example, the image in the present application may be a video or a dynamic picture, or the image in the present application may also be a still picture or a photo. For convenience of description, the present application collectively refers to a still image or a moving image as an image in the following embodiments.
Two scenes to which the image reconstruction method of the embodiment of the present application can be specifically applied are briefly described below.
Scene one, automatic driving
With the development of the times and the human pursuit of a high-quality life, automatic driving is gradually becoming the leading trend in automobile development. To drive safely, an autonomous vehicle needs to be equipped with sensors for sensing the scene, among which the visual sensor is responsible for visual sensing and understanding of the scene. However, most of the image data acquired by the visual sensor while the vehicle is travelling at high speed contains motion blur, which is not conducive to understanding the scene and may even lead to traffic accidents. The event camera senses the world on a principle completely different from that of a traditional intensity camera and can sense brightness changes at high temporal resolution. By adopting the image reconstruction scheme of the embodiments of the application, a high-quality high-definition video sequence can be reconstructed (recovered), improving the understanding capability of the control unit in an automatic driving scenario.
Scene two, security monitoring
Similar to scene one, in a security monitoring scenario there may be image blur caused by fast, large-coverage motion, such as fast-moving cars and running pedestrians. Such image blur makes it difficult for human eyes to judge the image content, especially in scenarios such as examining individual frames of surveillance footage. Therefore, by using the event camera information together with the image reconstruction scheme of the embodiments of the application, a higher-quality high-definition video sequence of the monitored scene can be reconstructed, improving the image quality of the monitored scene.
The embodiments of the present application relate to a large number of related applications of neural networks, and in order to better understand the scheme of the embodiments of the present application, the following first introduces related terms and concepts of neural networks that may be related to the embodiments of the present application.
(1) Neural network
The neural network may be composed of neural units. A neural unit may be an arithmetic unit that takes $x_s$ and an intercept of 1 as inputs, and its output may be expressed as equation (1):

$$h_{W,b}(x) = f(W^T x) = f\left(\sum_{s=1}^{n} W_s x_s + b\right) \qquad (1)$$

where $s = 1, 2, \ldots, n$, $n$ is a natural number greater than 1, $W_s$ is the weight of $x_s$, and $b$ is the bias of the neural unit. $f$ is the activation function of the neural unit, which introduces a nonlinear characteristic into the neural network to convert an input signal of the neural unit into an output signal. The output signal of the activation function may serve as the input of the next convolutional layer, and the activation function may be a sigmoid function. A neural network is a network formed by connecting many such single neural units together, that is, the output of one neural unit may be the input of another neural unit. The input of each neural unit may be connected to a local receptive field of the previous layer to extract features of the local receptive field, and the local receptive field may be a region composed of several neural units.
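As a concrete illustration of equation (1), the sketch below evaluates a single neural unit in plain Python/NumPy; the tanh activation and the example numbers are arbitrary choices.

```python
import numpy as np

def neural_unit(x, w, b, activation=np.tanh):
    """One neural unit: weighted sum of the inputs plus a bias, passed through an activation."""
    return activation(np.dot(w, x) + b)

# usage: three inputs x_s, three weights W_s, and a bias b
output = neural_unit(x=np.array([0.5, -1.0, 2.0]),
                     w=np.array([0.1, 0.3, -0.2]),
                     b=0.05)
```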
(2) Convolutional neural network
A Convolutional Neural Network (CNN) is a deep neural network with a convolutional structure. The convolutional neural network comprises a feature extractor consisting of convolutional layers and sub-sampling layers, which can be regarded as a filter. The convolutional layer is a neuron layer for performing convolutional processing on an input signal in a convolutional neural network. In the convolutional layer of the convolutional neural network, one neuron may be connected to only a part of the neighbor neurons. In a convolutional layer, there are usually several characteristic planes, and each characteristic plane may be composed of several neural units arranged in a rectangular shape. The neural units of the same feature plane share weights, where the shared weights are convolution kernels. Sharing weights may be understood as the way image information is extracted is location independent. The convolution kernel can be initialized in the form of a matrix of random size, and can obtain reasonable weight through learning in the training process of the convolutional neural network. In addition, sharing weights brings the immediate benefit of reducing connections between layers of the convolutional neural network, while reducing the risk of overfitting.
(3) Deep neural network
Deep neural networks (DNNs), also called multi-layer neural networks, can be understood as neural networks with multiple hidden layers. According to the positions of the different layers, the layers inside a DNN can be divided into three categories: input layer, hidden layers, and output layer. Generally, the first layer is the input layer, the last layer is the output layer, and the layers in between are hidden layers. The layers are fully connected, that is, any neuron of the i-th layer is connected to every neuron of the (i+1)-th layer.
Although a DNN appears complex, the work of each layer is actually not complex; it is simply the following linear relational expression: $\vec{y} = \alpha(W\vec{x} + \vec{b})$, where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset vector, $W$ is the weight matrix (also called coefficients), and $\alpha()$ is the activation function. Each layer merely performs this simple operation on the input vector $\vec{x}$ to obtain the output vector $\vec{y}$. Because a DNN has many layers, the numbers of coefficients $W$ and offset vectors $\vec{b}$ are also large. These parameters are defined in the DNN as follows, taking the coefficient $W$ as an example: assume that in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as $W^3_{24}$, where the superscript 3 represents the layer in which the coefficient $W$ is located, and the subscripts correspond to the output index 2 of the third layer and the input index 4 of the second layer.
In summary, the coefficient from the k-th neuron at layer L-1 to the j-th neuron at layer L is defined as $W^L_{jk}$. Note that the input layer has no $W$ parameter. In deep neural networks, more hidden layers make the network better able to depict complex situations in the real world. Theoretically, a model with more parameters has higher complexity and larger "capacity", which means that it can accomplish more complex learning tasks. Training the deep neural network is the process of learning the weight matrices, and its final goal is to obtain the weight matrices (formed by the vectors W of many layers) of all layers of the trained deep neural network.
(4) Loss function
In the process of training a deep neural network, it is desirable that the output of the deep neural network be as close as possible to the value that is really expected to be predicted. Therefore, the weight vector of each layer of the neural network can be updated according to the difference between the predicted value of the current network and the really expected target value (of course, an initialization process is usually carried out before the first update, that is, parameters are preset for each layer in the deep neural network). For example, if the predicted value of the network is too high, the weight vectors are adjusted to lower it, and the adjustment continues until the deep neural network can predict the really expected target value or a value very close to it. It is therefore necessary to define in advance "how to compare the difference between the predicted value and the target value"; this is done with loss functions or objective functions, which are important equations for measuring the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) of the loss function indicates a larger difference, so training the deep neural network becomes the process of reducing this loss as much as possible.
(5) Back propagation algorithm
The neural network can adopt a Back Propagation (BP) algorithm to correct the size of parameters in the initial neural network model in the training process, so that the reconstruction error loss of the neural network model is smaller and smaller. Specifically, the error loss is generated by transmitting the input signal forward until the output, and the parameters in the initial neural network model are updated by back-propagating the error loss information, so that the error loss is converged. The back propagation algorithm is a back propagation motion with error loss as a dominant factor, aiming at obtaining the optimal parameters of the neural network model, such as a weight matrix.
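The sketch below shows back propagation and gradient descent on a single scalar parameter using PyTorch autograd; the toy target and learning rate are arbitrary.

```python
import torch

# Minimal back-propagation sketch: compute a loss, propagate the error backwards,
# and update the parameter by gradient descent so the loss converges.
w = torch.tensor([1.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)
for _ in range(20):
    loss = (w * 3.0 - 6.0).pow(2).mean()   # error between prediction and target
    optimizer.zero_grad()
    loss.backward()                        # back-propagate the error loss
    optimizer.step()                       # adjust w to reduce the loss
```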
Fig. 1 is a schematic structural diagram of a system architecture according to an embodiment of the present application. In fig. 1, a data acquisition device 160 is used to acquire training data. For the image reconstruction method according to the embodiment of the present application, the training data may include a blurred image and event data corresponding to the blurred image.
After the training data is collected, data collection device 160 stores the training data in database 130, and training device 120 trains target model/rule 101 based on the training data maintained in database 130.
The following describes that the training device 120 obtains the target model/rule 101 based on the training data, and the training device 120 processes the input image until the definition of the image output by the training device 120 is smaller than a certain threshold, thereby completing the training of the target model/rule 101.
The target model/rule 101 can be used for implementing the image reconstruction method of the embodiment of the application, that is, the image to be processed is input into the target model/rule 101 after being subjected to relevant preprocessing, and then the output image after image reconstruction can be obtained. The target model/rule 101 in the embodiment of the present application may be specifically an image reconstruction apparatus in the embodiment of the present application. It should be noted that, in practical applications, the training data maintained in the database 130 may not necessarily all come from the acquisition of the data acquisition device 160, and may also be received from other devices. It should be noted that, the training device 120 does not necessarily perform the training of the target model/rule 101 based on the training data maintained by the database 130, and may also obtain the training data from the cloud end or other places for performing the model training.
The target model/rule 101 obtained by training according to the training device 120 may be applied to different systems or devices, for example, the execution device 110 shown in fig. 1, where the execution device 110 may be a terminal, such as a mobile phone terminal, a tablet computer, a notebook computer, an Augmented Reality (AR)/Virtual Reality (VR), a vehicle-mounted terminal, or a server or a cloud device. In fig. 1, the execution device 110 configures an input/output (I/O) interface 112 for data interaction with an external device, and a user may input data to the I/O interface 112 through the client device 140, where the input data may include: the image to be processed is input by the client device.
The preprocessing module 113 and the preprocessing module 114 are configured to perform preprocessing according to input data (such as an image to be processed) received by the I/O interface 112, and in this embodiment of the application, the preprocessing module 113 and the preprocessing module 114 may not be provided (or only one of the preprocessing modules may be provided), and the computing module 111 is directly used to process the input data.
In the process of preprocessing the input data by the execution device 110 or performing the relevant processing such as calculation by the calculation module 111 of the execution device 110, the execution device 110 may call the data, the code, and the like in the data storage system 150 for the corresponding processing, and may store the data, the instruction, and the like obtained by the corresponding processing into the data storage system 150.
Finally, the I/O interface 112 returns the processing result, such as the output image reconstructed from the image obtained as described above, to the client device 140, thereby providing it to the user.
It should be noted that the training device 120 may generate corresponding target models/rules 101 for different targets or different tasks based on different training data, and the corresponding target models/rules 101 may be used to achieve the targets or complete the tasks, so as to provide the user with the required results.
In the case shown in fig. 1, the user may manually give the input data, which may be operated through an interface provided by the I/O interface 112. Alternatively, the client device 140 may automatically send the input data to the I/O interface 112, and if the client device 140 is required to automatically send the input data to obtain authorization from the user, the user may set the corresponding permissions in the client device 140. The user can view the result output by the execution device 110 at the client device 140, and the specific presentation form can be display, sound, action, and the like. The client device 140 may also serve as a data collection terminal, collecting input data of the input I/O interface 112 and output results of the output I/O interface 112 as new sample data, as shown, and storing the new sample data in the database 130. Of course, the input data inputted to the I/O interface 112 and the output result outputted from the I/O interface 112 as shown in the figure may be directly stored in the database 130 as new sample data by the I/O interface 112 without being collected by the client device 140.
It should be noted that fig. 1 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the position relationship between the devices, modules, and the like shown in the diagram does not constitute any limitation, for example, in fig. 1, the data storage system 150 is an external memory with respect to the execution device 110, and in other cases, the data storage system 150 may also be disposed in the execution device 110.
As shown in fig. 1, a target model/rule 101 is obtained by training according to a training device 120, where the target model/rule 101 may be an image reconstruction apparatus in this application in this embodiment, and specifically, the image reconstruction apparatus provided in this embodiment may include one or more neural networks, where the one or more neural networks may include CNN, Deep Convolutional Neural Networks (DCNN), and the like.
Since CNN is a very common neural network, the structure of CNN will be described in detail below with reference to fig. 2. As described in the introduction of the basic concept, the convolutional neural network is a deep neural network with a convolutional structure, and is a deep learning (deep learning) architecture, and the deep learning architecture refers to learning at multiple levels of abstraction through a machine learning algorithm. As a deep learning architecture, CNN is a feed-forward artificial neural network in which individual neurons can respond to images input thereto.
As shown in fig. 2, Convolutional Neural Network (CNN)200 may include an input layer 210, a convolutional/pooling layer 220 (where pooling is optional), and a neural network layer 230. The relevant contents of these layers are described in detail below.
Convolutional layer/pooling layer 220:
Convolutional layer:
the convolutional layer/pooling layer 220 shown in fig. 2 may include layers such as example 221 and 226, for example: in one implementation, 221 is a convolutional layer, 222 is a pooling layer, 223 is a convolutional layer, 224 is a pooling layer, 225 is a convolutional layer, and 226 is a pooling layer; in another implementation 221, 222 are convolutional layers, 223 is a pooling layer, 224, 225 are convolutional layers, and 226 is a pooling layer. I.e., the output of a convolutional layer may be used as input to a subsequent pooling layer, or may be used as input to another convolutional layer to continue the convolution operation.
The inner working principle of a convolutional layer will be described below by taking convolutional layer 221 as an example.
Convolutional layer 221 may include many convolution operators, also called kernels. In image processing, a convolution operator acts as a filter that extracts specific information from the input image matrix. A convolution operator may essentially be a weight matrix, which is usually predefined. During the convolution operation on an image, the weight matrix is usually applied pixel by pixel (or two pixels by two pixels, and so on, depending on the value of the stride) along the horizontal direction of the input image, so as to extract specific features from the image. The size of the weight matrix should be related to the size of the image. Note that the depth dimension of the weight matrix is the same as the depth dimension of the input image, and during the convolution operation the weight matrix extends over the entire depth of the input image. Thus, convolving with a single weight matrix produces a convolved output with a single depth dimension, but in most cases a single weight matrix is not used; instead, a plurality of weight matrices of the same size (rows × columns), that is, a plurality of matrices of the same type, are applied. The outputs of the weight matrices are stacked to form the depth dimension of the convolved image, where the dimension can be understood as being determined by the "plurality" described above. Different weight matrices may be used to extract different features of the image: for example, one weight matrix extracts image edge information, another extracts a particular color of the image, and yet another blurs unwanted noise in the image. The plurality of weight matrices have the same size (rows × columns), so the feature maps extracted by them also have the same size, and the extracted feature maps of the same size are combined to form the output of the convolution operation.
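For illustration, the sketch below builds a convolutional layer in PyTorch with several kernels whose depth matches the input image; the specific channel counts and kernel size are arbitrary.

```python
import torch
import torch.nn as nn

# 3 input channels (the kernel depth must equal the input depth), 16 kernels of size 3x3;
# the outputs of the 16 weight matrices are stacked to form the depth of the feature map.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1)
image = torch.randn(1, 3, 64, 64)   # a dummy input image batch
features = conv(image)              # shape (1, 16, 64, 64)
```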
The weight values in these weight matrices need to be obtained through a large amount of training in practical application, and each weight matrix formed by the trained weight values can be used to extract information from the input image, so that the convolutional neural network 200 can make correct prediction.
When convolutional neural network 200 has multiple convolutional layers, the initial convolutional layer (e.g., 221) tends to extract more general features, which may also be referred to as low-level features; as the depth of convolutional neural network 200 increases, the features extracted by the convolutional layers (e.g., 226) further down are more complex, such as features with high levels of semantics, and the more highly semantic features are more suitable for the problem to be solved.
Pooling layer:
Since it is often desirable to reduce the number of training parameters, a pooling layer often needs to be periodically introduced after a convolutional layer. In the layers 221-226 illustrated by 220 in fig. 2, there may be one convolutional layer followed by one pooling layer, or multiple convolutional layers followed by one or more pooling layers. During image processing, the only purpose of the pooling layer is to reduce the spatial size of the image. The pooling layer may include an average pooling operator and/or a maximum pooling operator for sampling the input image to obtain a smaller image. The average pooling operator may compute the pixel values of the image within a certain range to produce an average value as the result of average pooling. The maximum pooling operator may take the pixel with the largest value within a certain range as the result of maximum pooling. In addition, just as the size of the weight matrix used in the convolutional layer should be related to the image size, the operators in the pooling layer should also be related to the image size. The size of the image output after processing by the pooling layer may be smaller than the size of the image input to the pooling layer, and each pixel in the image output by the pooling layer represents the average value or the maximum value of the corresponding sub-region of the input image.
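As a small illustration, the sketch below applies average and maximum pooling with a 2x2 window, halving the spatial size; the tensor shape is arbitrary.

```python
import torch
import torch.nn as nn

pool_avg = nn.AvgPool2d(kernel_size=2)   # average pooling: mean of each 2x2 region
pool_max = nn.MaxPool2d(kernel_size=2)   # max pooling: largest pixel of each 2x2 region
x = torch.randn(1, 16, 64, 64)
print(pool_avg(x).shape, pool_max(x).shape)   # both (1, 16, 32, 32): spatial size halved
```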
The neural network layer 230:
After processing by the convolutional layer/pooling layer 220, the convolutional neural network 200 is still not sufficient to output the required output information. This is because, as described above, the convolutional layer/pooling layer 220 only extracts features and reduces the number of parameters brought by the input image. However, to generate the final output information (the required class information or other relevant information), the convolutional neural network 200 needs the neural network layer 230 to generate one output or a set of outputs whose number equals the number of required classes. Therefore, the neural network layer 230 may include a plurality of hidden layers (231, 232 to 23n shown in fig. 2) and an output layer 240, and the parameters contained in the plurality of hidden layers may be obtained by pre-training on related training data of a specific task type; for example, the task type may include image reconstruction and the like.
After the hidden layers in the neural network layer 230, the last layer of the whole convolutional neural network 200 is the output layer 240. The output layer 240 has a loss function similar to the classification cross-entropy, which is specifically used for calculating the prediction error. Once the forward propagation (i.e., propagation in the direction from 210 to 240 in fig. 2) of the whole convolutional neural network 200 is completed, the backward propagation (i.e., propagation in the direction from 240 to 210 in fig. 2) starts to update the weight values and biases of the aforementioned layers, so as to reduce the loss of the convolutional neural network 200 and the error between the result output by the convolutional neural network 200 through the output layer and the ideal result.
It should be noted that the convolutional neural network 200 shown in fig. 2 is only an example of a convolutional neural network, and in a specific application, the convolutional neural network may also exist in the form of other network models.
In this application, the image reconstruction model may include the convolutional neural network 200 shown in fig. 2, and the image reconstruction model may perform image reconstruction processing on the image to be processed to obtain a reconstructed image.
Fig. 3 is a hardware structure of a chip according to an embodiment of the present application, where the chip includes a neural network processor 30. The chip may be provided in the execution device 110 as shown in fig. 1 to complete the calculation work of the calculation module 111. The chip may also be disposed in the training apparatus 120 as shown in fig. 1 to complete the training work of the training apparatus 120 and output the target model/rule 101. The algorithms for the various layers in the convolutional neural network shown in fig. 2 can all be implemented in a chip as shown in fig. 3.
The neural network processor NPU 30 is mounted as a coprocessor on a main CPU (host CPU), which allocates tasks. The core portion of the NPU is an arithmetic circuit 303, and the controller 304 controls the arithmetic circuit 303 to extract data in a memory (weight memory or input memory) and perform an operation.
In some implementations, the arithmetic circuit 303 includes a plurality of processing units (PEs) therein. In some implementations, the operational circuitry 303 is a two-dimensional systolic array. The arithmetic circuit 303 may also be a one-dimensional systolic array or other electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 303 is a general-purpose matrix processor.
For example, assume that there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit 303 fetches the data corresponding to the matrix B from the weight memory 302 and buffers the data on each PE in the arithmetic circuit 303. The arithmetic circuit 303 takes the matrix a data from the input memory 301 and performs matrix arithmetic with the matrix B, and stores a partial result or a final result of the obtained matrix in an accumulator (accumulator) 308.
The vector calculation unit 307 may further process the output of the operation circuit 303, such as vector multiplication, vector addition, exponential operation, logarithmic operation, magnitude comparison, and the like. For example, the vector calculation unit 307 may be used for network calculation of non-convolution/non-FC layers in a neural network, such as pooling (pooling), batch normalization (batch normalization), local response normalization (local response normalization), and the like.
In some implementations, the vector calculation unit 307 can store the processed output vector to the unified buffer 306. For example, vector calculation unit 307 may apply a non-linear function to the output of operational circuitry 303, such as a vector of accumulated values, to generate the activation values. In some implementations, the vector calculation unit 307 generates a normalized value, a merged value, or both. In some implementations, the vector of processed outputs can be used as activation inputs to the arithmetic circuitry 303, for example, for use in subsequent layers in a neural network.
The unified memory 306 is used to store input data as well as output data.
A storage unit access controller 305 (direct memory access controller, DMAC) is used to transfer input data in the external memory to the input memory 301 and/or the unified memory 306, to store weight data from the external memory into the weight memory 302, and to store data in the unified memory 306 into the external memory.
A Bus Interface Unit (BIU) 310, configured to implement interaction between the main CPU, the DMAC, and the instruction fetch memory 309 through a bus.
An instruction fetch buffer 309, connected to the controller 304, is used to store instructions used by the controller 304;
The controller 304 is configured to call the instructions cached in the instruction fetch buffer 309, so as to control the working process of the operation accelerator.
Generally, the unified memory 306, the input memory 301, the weight memory 302 and the instruction fetch memory 309 are On-Chip (On-Chip) memories, the external memory is a memory outside the NPU, and the external memory may be a double data rate synchronous dynamic random access memory (DDR SDRAM), a High Bandwidth Memory (HBM) or other readable and writable memories.
The operation of each layer in the convolutional neural network shown in fig. 2 may be performed by the operation circuit 303 or the vector calculation unit 307.
The executing device 110 in fig. 1 described above can execute the steps of the image reconstruction method according to the embodiment of the present application, and the CNN model shown in fig. 2 and the chip shown in fig. 3 can also be used to execute the steps of the image reconstruction method according to the embodiment of the present application. The image reconstruction method according to the embodiment of the present application is described in detail below with reference to the drawings.
The image reconstruction method provided by the embodiment of the application can be executed on a server, can also be executed on a cloud terminal, and can also be executed on a terminal device. Taking a terminal device as an example, as shown in fig. 4, the technical solution of the embodiment of the present application may be applied to a terminal device, and the image reconstruction method in the embodiment of the present application may perform image reconstruction processing on an input image to obtain an output image of the input image after image reconstruction. As shown in fig. 4, after the blurred image and the event data are input, at least one frame of the target reconstructed image (target high-definition image) may be output through the processing of the terminal device. The terminal device may be mobile or fixed, for example, the terminal device may be a mobile phone with an image reconstruction function, a Tablet Personal Computer (TPC), a media player, a smart tv, a Laptop Computer (LC), a Personal Digital Assistant (PDA), a Personal Computer (PC), a camera, a camcorder, a smart watch, a Wearable Device (WD), an autonomous vehicle, or the like, which is not limited in the embodiments of the present application.
In the prior art, when image reconstruction is performed, event data is combined with the blurred image and the motion blur generation process is modeled; the EDI (event-based double integral) model is currently the most common. In the prior art, a blurred image is reconstructed into multiple frames of high-definition images only by means of the event data, and if a number of event points exist within the exposure time, a number of high-definition frames corresponding to the number of event points needs to be recovered. This presents a problem: because the event data is very noisy and the event-trigger threshold is not constant, noise accumulation occurs when the EDI model is used. It can be understood that, as the frame images are reconstructed one by one, relatively small errors are accumulated and enlarged, so that the sequence of restored high-definition image frames deviates from the real data of the original blurred image. In addition, the EDI model itself has high complexity and low computational efficiency.
In view of the above problems, an embodiment of the present application provides an image reconstruction method in which the restored high-definition images are subjected to re-blurring processing, and the image obtained after the re-blurring processing is then compared with the originally input blurred image. This is equivalent to adding a constraint condition, so that the sequence of reconstructed high-definition image frames is closer to the real data of the originally input blurred image, and the reconstruction quality and reconstruction effect are improved. It is equivalent to mining more features from the event data and the blurred image, thereby improving the quality of the reconstruction.
Fig. 5 is a schematic flowchart of a training method of an image reconstruction model according to an embodiment of the present disclosure, where the training method may be performed by an apparatus or device capable of performing training of the image reconstruction model, for example, the method may be performed by a server or a cloud device with greater computing power.
501. Acquire a blurred image to be trained and event data to be trained.
The blurred image to be trained is an average value of multiple frames of first high-definition images, and the event data to be trained includes event points in the time period corresponding to the blurred image to be trained.
That is to say, the blurred image to be trained may be obtained by averaging existing continuous multi-frame high-definition images, so as to obtain a frame of synthesized blurred image. Therefore, in the training process, the image reconstruction model can be trained by comparing the reconstructed high-definition image with the real high-definition image.
The time period corresponding to the blurred image to be trained can be determined by the times corresponding to the existing continuous multiple frames of high-definition images (the first high-definition images). This period simulates the exposure time of a camera or video camera during actual shooting; that is, within the period simulating the exposure time, motion (such as the action of the photographer) causes blurring and one frame of blurred image is generated, and this frame of blurred image corresponds to a sequence of high-definition image frames. For example, assuming that 6 continuous high-definition frames between times T0 and T1 are averaged to obtain a blurred image B1, the time period corresponding to the blurred image B1 is T0 to T1, and the 6 continuous high-definition frames are the multiple frames of first high-definition images.
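A minimal sketch of synthesizing such a training pair is shown below, assuming the frames are NumPy arrays of identical shape; the frame count of 6 follows the example above.

```python
import numpy as np

def synthesize_blurred_image(hd_frames):
    """Average consecutive high-definition frames (e.g. 6 frames between T0 and T1)
    pixel-wise to produce one synthetic blurred image for training."""
    return np.mean(np.stack(hd_frames, axis=0), axis=0)
```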
The event point can also be called as an event, and the most basic principle of the event camera is to output an event point after the brightness change of a certain pixel is accumulated to reach a trigger condition (the change reaches a certain degree). An event point can therefore be understood as an expression of an event: at what time (timestamp), which pixel point (pixel coordinate), an increase or decrease in brightness (brightness change) occurs.
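The sketch below spells out this notion of an event point as a small data structure; the field names are illustrative, not a fixed format.

```python
from dataclasses import dataclass

@dataclass
class EventPoint:
    """One event: a brightness change at one pixel at one instant."""
    timestamp: float   # when the change was triggered (microsecond resolution)
    x: int             # pixel column (pixel coordinate)
    y: int             # pixel row (pixel coordinate)
    polarity: int      # +1 for a brightness increase, -1 for a decrease
```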
502. And training the image reconstruction model by using the fuzzy image to be trained and the event data to be trained.
The image reconstruction model can be understood as a reconstruction network, and when the image reconstruction model is trained, the model can be trained by directly utilizing the data, or some other parameters can be introduced to train the model.
In the training process, parameters of the image reconstruction model can be updated according to constraint conditions to obtain a target image reconstruction model. The constraint conditions include that the difference value between the blurred image to be trained and a re-blurred image is minimized; the re-blurred image can be obtained by processing multiple frames of second high-definition images, and the multiple frames of second high-definition images are obtained by processing the blurred image to be trained and the event data to be trained using the image reconstruction model.
That is to say, the reconstructed high-definition images (second high-definition images) output by the image reconstruction model are subjected to re-blurring processing, and the image obtained after the re-blurring processing (the re-blurred image) is then compared with the originally input blurred image (the blurred image to be trained). This is equivalent to adding a constraint condition, so that the high-definition image frame sequence reconstructed by the trained image reconstruction model is closer to the real data of the originally input blurred image, and the reconstruction quality and reconstruction effect are improved. This also corresponds to mining more features from the event data and the blurred image, thereby improving the quality of the reconstruction. This constraint condition may be referred to as a self-consistency constraint.
There may be various methods for performing the re-blurring processing on the multiple frames of second high-definition images (i.e., for acquiring the re-blurred image); for example, the multiple frames of second high-definition images may be processed directly, or the optical flow estimation result may be processed in combination with the second high-definition images.
The direct processing of the multiple frames of second high-definition images may be averaging them, that is, the above-mentioned re-blurred image is obtained by averaging the multiple frames of second high-definition images. In other words, the second high-definition images are averaged directly, without taking the optical flow estimation result into account. A re-blurred image can be obtained in this way, but its accuracy is slightly lower than in the case where the optical flow estimation result is applied.
Combining the optical flow estimation result with the second high-definition images is equivalent to processing the optical flow estimation result together with the second high-definition images, that is, the re-blurred image is obtained by processing the optical flow estimation result and the multiple frames of second high-definition images. The optical flow estimation result is obtained by processing the blurred image to be trained and the event data to be trained using an optical flow estimation network, and it includes the motion information of the blurred image to be trained. For example, the re-blurred image may be obtained by inserting interpolation frames between at least two adjacent frames of the multiple frames of second high-definition images using the optical flow estimation result, and then averaging the interpolation frames together with the multiple frames of second high-definition images. Interpolation frames may be inserted between every two adjacent frames, or only between some of the adjacent frames; the accuracy of the re-blurred image is higher when interpolation frames are inserted between every two adjacent frames. The effect of frame interpolation can be understood as making the interval between two adjacent frames of second high-definition images more refined. This implementation allows the optical flow estimation result to simulate the actual motion trajectory more accurately, so as to obtain a re-blurred image closer to the actual motion trajectory; the number of interpolation frames can be set as required.
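The following is a minimal sketch of the two re-blurring options described above, assuming a hypothetical warp_with_flow helper that displaces a frame by a fraction of the estimated optical flow; the helper name, the parameter n, and the choice of how many interpolation frames to insert are illustrative assumptions.

```python
import numpy as np

def reblur(second_hd_frames, flows=None, warp_with_flow=None, n=1):
    """Re-blur reconstructed frames by averaging them, optionally together
    with interpolation frames generated from the optical flow estimates.

    second_hd_frames: list of reconstructed sharp frames (H x W x C arrays).
    flows: optional list of optical flows between adjacent frames.
    warp_with_flow: optional callable (frame, flow) -> warped frame.
    n - 1: number of interpolation frames inserted between adjacent frames.
    """
    samples = list(second_hd_frames)
    if flows is not None and warp_with_flow is not None and n > 1:
        for i, flow in enumerate(flows[:len(second_hd_frames) - 1]):
            for j in range(1, n):
                # Displace frame i by the fraction j/n of the flow towards frame i+1.
                samples.append(warp_with_flow(second_hd_frames[i], flow * (j / n)))
    return np.mean(np.stack(samples, axis=0), axis=0)
```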
The embodiment of the present application mainly improves the reconstruction quality of the trained model by adding a constraint condition, which is realized by adding the self-consistency constraint; in order to achieve a better effect, other constraint conditions can be further introduced.
Other constraints such as a luminance consistency constraint may be further introduced in some implementations.
Optionally, the constraint conditions further include minimizing the difference value between multiple frames of optical flow estimation images and the multiple frames of second high-definition images, where the multiple frames of optical flow estimation images are obtained by performing a warping transformation on the multiple frames of second high-definition images using the optical flow estimation result. The i-th frame of optical flow estimation image among the multiple frames of optical flow estimation images is obtained by performing a warping transformation on the (i+1)-th frame of second high-definition image among the multiple frames of second high-definition images, and i is an integer greater than or equal to 0 and less than the number of frames of the multiple frames of first high-definition images. This approach is equivalent to adding a further constraint, which may be referred to as a luminance consistency constraint, i.e., the luminance between adjacent frames should be consistent. The reconstructed images therefore further retain the information of the original blurred image as much as possible in the time dimension, so the quality and effect of image reconstruction can be further improved. Specifically, motion is continuous and sequential; the prior art does not consider keeping the multiple frames of images consistent in time sequence, so the definition of the reconstructed high-definition images is still low and their quality is poor.
It should be noted that the difference value between the multiple frames of optical flow estimation images and the multiple frames of second high-definition images may be a difference value between an i-th frame of optical flow estimation images in the multiple frames of optical flow estimation images and an i-th frame of second high-definition images in the multiple frames of second high-definition images, may also be a difference value between an average value of the multiple frames of optical flow estimation images and an average value of the multiple frames of second high-definition images, and may also be a difference value between an accumulated value of the multiple frames of optical flow estimation images and an accumulated value of the multiple frames of second high-definition images.
For another example, constraint conditions that are common in the prior art, such as the reconstruction loss constraint and the smoothing constraint, may also be introduced. The reconstruction loss may be understood as the difference between the output of the reconstruction network and the true value corresponding to the blurred image; the true value may be the first high-definition images described above. The smoothing constraint considers the distance change between adjacent pixels in a single-frame image, that is, the pixels in a single-frame image should not change abruptly, so as to improve the quality of image reconstruction.
Fig. 6 is a schematic flowchart of a training process of an image reconstruction model according to an embodiment of the present application, and fig. 6 mainly introduces actions performed by respective modules in the image reconstruction model in a training process, which can be regarded as a description of another angle of the training method shown in fig. 5.
601. And acquiring a to-be-trained blurred image and event data to be trained, wherein the event data to be trained comprises event points in a time period corresponding to the to-be-trained blurred image.
In some implementations, the blurred image to be trained and the event data to be trained may both come from an event camera. The event camera has two outputs, one being high-definition images and the other being the event data to be trained, so the event camera may be used to collect the high-definition images and the event data to be trained, and the high-definition images may then be synthesized into the blurred image to be trained.
In other implementations, the blurred image to be trained and the event data to be trained may also be obtained in the following manner: high-definition images are acquired with a conventional camera or video camera, and the high-definition images are then input into simulation software to obtain corresponding simulated event data, so that the high-definition images and the event data to be trained are obtained at the same time. The high-definition images are averaged to synthesize the blurred image to be trained. The simulation software can be understood as a virtual event camera.
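As a deliberately simplified stand-in for such simulation software, the sketch below assumes that an event is emitted whenever the accumulated log-brightness change at a pixel exceeds a fixed threshold; real event simulators additionally model noise, refractory periods and per-pixel thresholds, so this is only for intuition.

```python
import numpy as np

def simulate_events(frames, timestamps, threshold=0.2, eps=1e-3):
    """Emit (t, x, y, polarity) tuples from a sequence of grayscale frames in [0, 1]."""
    events = []
    log_ref = np.log(frames[0] + eps)          # reference log-brightness per pixel
    for frame, t in zip(frames[1:], timestamps[1:]):
        log_cur = np.log(frame + eps)
        diff = log_cur - log_ref
        ys, xs = np.where(np.abs(diff) >= threshold)
        for y, x in zip(ys, xs):
            events.append((t, int(x), int(y), 1 if diff[y, x] > 0 else -1))
            log_ref[y, x] = log_cur[y, x]      # reset the reference where an event fired
    return events
```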
In addition, the blurred image to be trained and the event data to be trained may be acquired in real time or may be read from a storage device.
602. And inputting the fuzzy image to be trained and the event data to be trained into an optical flow estimation network to obtain an optical flow estimation result, wherein the optical flow estimation result comprises the motion information of a plurality of frames of first high-definition images corresponding to the fuzzy image to be trained.
Optical flow refers to the apparent motion of the image luminance pattern. Optical flow expresses the change of the image, and since it contains information about the movement of objects, it can be used by an observer to determine their motion. The definition of optical flow can be extended to the optical flow field, which is a two-dimensional instantaneous velocity field formed by all pixel points in an image, where the two-dimensional velocity vector is the projection of the three-dimensional velocity vector of a visible point in the scene onto the imaging surface. Optical flow thus contains not only motion information of the observed object but also rich information about the three-dimensional structure of the scene. In the embodiments of the present application, optical flow is used to mine the motion information of the blurred image. This can be understood from blurred images seen in real life: in one blurred image, for example, an arm rotates by a certain angle, so the imaging of the arm region is a smeared superposition of several arm positions. If the blurred image is separated into several frames of images, the motion trajectory of the arm can be resolved, and the optical flow describes how the relative position of the arm changes between every two adjacent frames of these several frames of images as the motion trajectory evolves.
It should be noted that the first high-definition image refers to a frame sequence of high-definition images corresponding to the blurred image to be trained, and thus may correspond to one or more frames of the first high-definition image. As mentioned above, the blurred image to be trained is the average value of the high definition frames within the exposure time, and it can be considered herein that the first high definition image is the above high definition image used to obtain the blurred image to be trained.
The optical flow estimation network may be a neural network that performs optical flow estimation, for example, an Ev-FlowNet network may be employed.
603. And inputting the fuzzy image to be trained and the event data to be trained into a reconstruction network to obtain a plurality of frames of second high-definition images.
Step 603 is similar to the operation in the prior art, that is, the blurred image is reconstructed using the event data to be trained, so as to obtain multiple frames of high-definition images. It should be understood that step 603 and step 602 may be executed simultaneously or sequentially; the execution order is not limited.
Step 603 may be performed with a reconstruction network, which refers to a neural network that can reconstruct the blurred image into a high-definition image.
604. And obtaining a re-blurred image by utilizing the motion information of the plurality of frames of the first high-definition images and the plurality of frames of the second high-definition images.
Steps 604 and 605 are equivalent to adding a constraint condition on the basis of the prior art, that is, the reconstructed high-definition images are kept consistent with the information of the original blurred image to the maximum extent possible. The constraint acts on the motion information and the reconstructed second high-definition images, that is, whether the motion information has an error or the reconstruction network has an error, the influence of these errors can be minimized through steps 604 and 605. This is the self-consistency constraint described above.
The method may be to process the result of the optical flow estimation (i.e. the motion information of the multiple frames of the first high-definition images) and the multiple frames of the second high-definition images, so as to obtain a re-blurred image; or in the case of obtaining a plurality of frames of the third high-definition images (i.e., in the case of performing step 606 and step 607), the optical flow estimation result and the plurality of frames of the third high-definition images are processed, thereby obtaining a re-blurred image.
605. And minimizing the difference value between the blurred image to be trained and the re-blurred image to obtain a multi-frame target high-definition image.
Optionally, a norm of the difference between the blurred image to be trained and the re-blurred image may be constructed as a loss function, and the target high-definition images may be obtained by minimizing this loss function.
The steps 604 and 605 may be performed by a constraint module (unit).
The constraint may be implemented by setting a loss function.
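A hedged sketch of such a loss function is given below, assuming an L1 norm between the input blurred image and the re-blurred image; the embodiments only specify "a norm", so the choice of L1 here is an assumption made for illustration.

```python
import torch

def self_consistency_loss(blurred, reblurred):
    """L_self: a norm of the difference between the input blurred image
    and the re-blurred image obtained from the reconstructed frames."""
    return torch.mean(torch.abs(blurred - reblurred))  # L1 norm, assumed for illustration
```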
According to the training method of FIG. 6, the obtained target image reconstruction model can keep the information of the reconstructed high-definition image consistent with that of the original blurred image to the maximum extent, so that the image reconstruction quality and the image reconstruction effect are effectively improved.
As described above, steps 604 and 605 correspond to adding one constraint, and other constraints may be added according to similar logic. For example, the training method of fig. 6 may further include the operations of step 606 and step 607.
606. And performing distortion transformation on the multi-frame second high-definition images by using the optical flow estimation result to obtain multi-frame optical flow estimation images, wherein the i-th frame optical flow estimation image of the multi-frame optical flow estimation images is obtained by performing distortion transformation on the i + 1-th frame second high-definition images in the multi-frame second high-definition images, and i is an integer which is greater than or equal to 0 and less than the number of frames of the multi-frame first high-definition images.
Warping transformation (warp) may also be referred to as affine transformation, or may be understood as shifting the positions of things in an image. For example, if 5 frames of second high-definition images are provided, warping the 1st frame of second high-definition images can be regarded as obtaining an image of the 1st frame after optical flow warping, that is, a frame of optical flow estimation image, by adding to the 1st frame the motion information contained in the optical flow estimation result between the 1st frame and the 2nd frame of first high-definition images.
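A minimal PyTorch sketch of such a warping step is given below, assuming backward warping with a per-pixel flow given in pixel units; the sign convention of the flow and the interpolation settings are assumptions that may need to be adapted to a particular optical flow network.

```python
import torch
import torch.nn.functional as F

def warp(frame, flow):
    """Backward-warp `frame` (N, C, H, W) with `flow` (N, 2, H, W) given in pixels."""
    n, _, h, w = frame.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=frame.dtype, device=frame.device),
        torch.arange(w, dtype=frame.dtype, device=frame.device),
        indexing="ij")
    grid_x = xs.unsqueeze(0) + flow[:, 0]               # sample positions along x, in pixels
    grid_y = ys.unsqueeze(0) + flow[:, 1]               # sample positions along y, in pixels
    grid = torch.stack((2.0 * grid_x / (w - 1) - 1.0,   # normalize to [-1, 1]
                        2.0 * grid_y / (h - 1) - 1.0), dim=-1)
    return F.grid_sample(frame, grid, mode="bilinear",
                         padding_mode="border", align_corners=True)
```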
607. And minimizing the difference value of the optical flow estimation image of the ith frame and the second high-definition image of the ith frame.
Steps 606 and 607 correspond to adding a constraint that the luminance between adjacent frames should be consistent. Steps 606 and 607 can therefore be regarded as making the reconstructed images retain the information of the original blurred image as much as possible in the time dimension, so the quality and effect of image reconstruction can be further improved. Specifically, motion is continuous and sequential; the prior art does not consider keeping the multiple frames of images consistent in time sequence, so the definition of the reconstructed high-definition images is still low and their quality is poor. In the scheme of the embodiments of the present application, the information of the time dimension can be mined by introducing steps 606 and 607, so the image processing effect is better and the image reconstruction quality is higher. This is the luminance consistency constraint described above.
Steps 606 and 607 may be performed during, before or after the performance of steps 604 and 605, and steps 606 and 607 may not be performed, except that in case steps 606 and 607 are performed, constraints in the time dimension may be increased and reconstruction quality may be better.
For example, assuming that steps 606 and 607 are performed during the execution of steps 604 and 605, steps 606 and 607 may be performed first, and then a plurality of frames of third high-definition images may be obtained, so that the above-mentioned re-blurred image may be obtained according to the optical flow estimation result and the plurality of frames of third high-definition images when step 604 is performed.
For another example, if steps 606 and 607 are executed after steps 604 and 605 are executed, replacing the second high-definition image of steps 606 and 607 with the target high-definition image may be equivalent to performing further processing on the obtained target high-definition image.
The target high-definition image can also be obtained by constructing a norm of the luminance values of the two frames of images as a loss function and minimizing the loss function.
The above two constraints are not considered in the prior art, and adding them can effectively improve the reconstruction quality. It should be understood, however, that other constraints already existing in the prior art, such as the reconstruction loss constraint and the smoothness constraint, can also be added in the implementation process, and are not repeated here. In addition, since these two loss functions are readily available, those skilled in the art may consult the relevant literature, and they are therefore not described further.
For example, the smoothing constraint may be added such that, when step 605 is executed, the distance between adjacent pixels in each frame of the multiple frames of optical flow estimation images is minimized to obtain multiple frames of processed optical flow estimation images. This is the step corresponding to the smoothing constraint; that is, after the processed optical flow estimation images are obtained through the smoothing constraint, the difference between the i-th frame of processed optical flow estimation image and the i-th frame of second high-definition image can be minimized.
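A common formulation of such a smoothness term is a total-variation-style penalty on adjacent pixels; the sketch below assumes this formulation, which is an assumption rather than the exact form used in the embodiments.

```python
import torch

def smoothness_loss(x):
    """L_smooth: penalize differences between adjacent pixels of a tensor of
    shape (N, C, H, W), e.g. an optical flow estimation image."""
    dh = torch.abs(x[:, :, 1:, :] - x[:, :, :-1, :]).mean()
    dw = torch.abs(x[:, :, :, 1:] - x[:, :, :, :-1]).mean()
    return dh + dw
```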
Optionally, the above constraints may be weighted, that is, a weight is set for each constraint while the above constraints are utilized, so as to further improve the quality of image reconstruction.
Fig. 7 is a schematic structural diagram of a training architecture of an image reconstruction model according to an embodiment of the present application. As shown in FIG. 7, the training architecture 700 includes an optical flow estimation network 710, a reconstruction network 720 and a constraint module 730. The blurred image to be trained and the event data to be trained may be input into the training architecture 700, and after training, a trained reconstruction network 720 'may be obtained, where the trained reconstruction network 720' is the target image reconstruction model. Therefore, the training architecture shown in fig. 7 can be used to perform the training method shown in fig. 5 or fig. 6, and the reconstruction network 720 obtained by the training architecture shown in fig. 7 can be used to perform the image reconstruction method shown in fig. 8.
When the trained reconstruction network 720 'is used for image reconstruction, for example, the first blurred image and the first event data shown in fig. 8 may be input to the trained reconstruction network 720', so that a multi-frame target high-definition image may be obtained.
The optical flow estimation network 710 is a neural network that can process the blurred image to obtain the optical flow estimation result, and may be, for example, the Ev-FlowNet network described above.
The reconstruction network 720 is a neural network that can process the blurred image, thereby obtaining a reconstructed high-definition image thereof. It can be seen that the reconstruction network 720 is the image reconstruction model described above, that is, the function of the training architecture shown in fig. 7 is to update the reconstruction network 720 through training, so as to obtain a trained reconstruction network 720' (i.e., a target image reconstruction model), and the target image reconstruction model may also be referred to as a target reconstruction network.
The constraint module 730 can be understood as a module that updates the parameters of the image reconstruction model according to the constraint conditions, so that the quality of the reconstructed image is guaranteed, and the constraint module 730 can perform parameter constraint by using the output of the optical flow estimation network 710 and/or the output of the reconstruction network 720, thereby obtaining a better trained reconstruction network 720'.
The above constraints may include a self-consistency constraint, may also include a brightness consistency constraint, and may also include a reconstruction loss constraint and a smoothing constraint.
As can be seen from the above, the image reconstruction model is the reconstruction network 720, and the other components of the training architecture in fig. 7 are used only in the training phase to obtain a better target image reconstruction model. When the target reconstruction model is applied in the image reconstruction process (which can be understood as the inference phase), the optical flow estimation network 710 and the constraint module 730 are no longer needed. In short, the optical flow estimation network 710 and the constraint module 730 participate only in the execution of some steps in the training phase, and do not participate in the steps of the inference phase.
It should be noted that fig. 7 only shows an example that the training architecture 700 includes the above three parts at the same time, but it should be understood that the training architecture 700 may also have other structural forms, for example, the training architecture 700 may include only the reconstruction network 720, that is, the fuzzy image to be trained and the event data to be trained are directly used to train the reconstruction network 720 to obtain a trained reconstruction network 720'; as another example, the training architecture 700 may include a reconstruction network 720 and a constraint module 730, the constraint module 730 being configured to update parameters of the reconstruction network 720; for another example, the optical-flow estimation result is directly read from the storage device, in which case the optical-flow estimation network 710 is no longer needed, but a module for reading the optical-flow estimation result from the storage device is needed, and so on, and will not be described one by one.
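Putting the components of fig. 7 together, a highly simplified training-step sketch might look as follows; the network interfaces, the reblur and warp helpers and the loss weights are all placeholders standing in for the modules described above, not the actual implementation.

```python
import torch

def train_step(flow_net, recon_net, blurred, events, first_hd_frames,
               reblur, warp, optimizer, alpha=1.0, beta=1.0, gamma=1.0):
    flows = flow_net(blurred, events)        # optical flow estimation results
    second_hd = recon_net(blurred, events)   # list of reconstructed sharp frames

    reblurred = reblur(second_hd, flows)     # re-blurring of the reconstructed frames
    l_self = torch.mean(torch.abs(blurred - reblurred))            # self-consistency

    l_warp = sum(torch.mean(torch.abs(second_hd[i] - warp(second_hd[i + 1], flows[i])))
                 for i in range(len(second_hd) - 1))               # luminance consistency

    l_recon = sum(torch.mean(torch.abs(s - g))                     # reconstruction loss
                  for s, g in zip(second_hd, first_hd_frames))

    loss = alpha * l_recon + beta * l_self + gamma * l_warp
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```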
Fig. 8 is a schematic flowchart of an image reconstruction method according to an embodiment of the present application. The steps of fig. 8 are described below.
801. Acquiring a first blurred image and first event data, wherein the first event data comprises event points in a time period corresponding to the first blurred image.
The first blurred image may be understood as a blurred image to be processed, and the time period corresponding to the first blurred image may be a time period between the blurred image and a next frame image or an exposure time of the first blurred image. It can be understood that several frames of blurred images appear in a piece of video, which affects the viewing experience, so the time period between two frames of blurred images can be used as the exposure time when processing, so that the whole video is smoother. If only one frame of independent blurred image exists and the adjacent frames are high-definition images, the time period between the blurred image and the next frame of high-definition image can be used as the corresponding time period.
802. And inputting the first blurred image and the first event data into a target image reconstruction model to obtain a multi-frame target high-definition image.
The target image reconstruction model is obtained by using any one of the training methods of the image reconstruction model in the embodiments of the present application, and the structure of the target image reconstruction model may refer to the above description, and therefore, is not repeated for brevity.
803. And outputting one or more frames in the multi-frame target high-definition image.
That is to say, the output reconstructed image may be one or more frames taken from the multiple frames of the target high-definition image and output, for example, the middle frame, the last frame, or the odd frame, the even frame, and so on may be output.
After the target reconstruction model is trained, the number of frames of the target high-definition images which can be reconstructed is fixed, so that the first event data can be segmented to better match each frame of output images, and the number of the segments of the first event data corresponds to the number of frames of the multi-frame target high-definition images.
For example, it is assumed that the number of frames of the multi-frame target high-definition image obtained by the target image reconstruction model is 5, that is, the target image reconstruction model can reconstruct a blurred image into a 5-frame high-definition image. The first event data may be divided into 5 segments, and the length of the corresponding time period of each segment may be the same. Of course, the time period may be divided into an integral multiple of 5, such as 10, and then the event points of each two adjacent time periods are taken to form one time period, and the first event data may still be divided into 5 time periods. The above numerical values are for example only and for ease of understanding, and no limitation is present.
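A hedged sketch of this segmentation is given below, assuming that the event points carry timestamps and that the exposure period is split into as many sub-intervals of equal duration as there are output frames; the helper name and interface are assumptions made for illustration.

```python
def segment_events(events, t0, T, num_frames):
    """Split event points (each with a timestamp e.t) within [t0, t0 + T]
    into num_frames segments of equal duration."""
    segments = [[] for _ in range(num_frames)]
    for e in events:
        if t0 <= e.t <= t0 + T:
            idx = min(int((e.t - t0) / T * num_frames), num_frames - 1)
            segments[idx].append(e)
    return segments
```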
Fig. 9 is a schematic flowchart of a training process of an image reconstruction model according to an embodiment of the present application. Fig. 9 may be regarded as a specific example of the training method shown in fig. 5 or fig. 6. Fig. 9 is described below.
As shown in fig. 9, the original input image B may be regarded as an example of the blurred image to be trained, E0-E5 may be regarded as an example of the event data to be trained, and E0-E5 are represented by cubes, each of which includes a plurality of points, which may be regarded as the event points.
The optical flow estimation network and the reconstruction network are represented by trapezoids in fig. 9; the plurality of trapezoids labelled Optical Flow are an example of the optical flow estimation network, and the single trapezoid labelled Reconstruction is an example of the reconstruction network. The optical flow estimation network processes the input image B and the event data E0–E5 to obtain flows u_{0-1}, u_{1-2}, u_{2-3}, u_{3-4}, u_{4-5} and u_{5-6}, which are an example of the optical flow estimation result. The reconstruction network processes the input image B and the event data E0–E5 to obtain I_0–I_6, which are an example of the multiple frames of second high-definition images.
The corner-cut rectangle denoted warp in fig. 9 is an example of the portion of the constraint module that performs the warping transformation. As shown in fig. 9, warping I_1 with flow u_{0-1} yields \hat{I}_0, an example of the 0th-frame optical flow estimation image described above; warping I_2 with flow u_{1-2} yields \hat{I}_1, an example of the 1st-frame optical flow estimation image; and so on, until warping I_6 with flow u_{5-6} yields \hat{I}_5, an example of the 5th-frame optical flow estimation image.
Also shown in fig. 9, re-blurring is performed using flows u_{0-1} to u_{5-6} and the frames I_0–I_6 to obtain \hat{B}, which is an example of the above-described re-blurred image.
As shown in fig. 9, the self-consistency constraint operates between the image B and the re-blurred image \hat{B}. The luminance consistency constraint operates between I_0 and \hat{I}_0, between I_1 and \hat{I}_1, ..., and between I_5 and \hat{I}_5. The reconstruction loss constraint then operates between the true values G_0–G_6 and I_0–I_6, respectively, while the smoothing constraint operates on a single image and is not shown in fig. 9.
First, a conventional camera needs a certain exposure time to accumulate light in the scene and render an image of the objects in it, and motion blur occurs within the exposure time when the objects move rapidly. The blurring process may satisfy the following equation (2), that is, the blurred image B(x) is the integration result of all the high-definition images within the exposure time.
B(x) = \frac{1}{T} \int_{t_0}^{t_0+T} I_t(x)\,\mathrm{d}t \qquad (2)

wherein T represents the exposure time; t_0 represents the starting time of the exposure time, which can be understood as the moment the photograph is triggered; and I_t(x) represents all the high-definition images within the exposure time, which can be understood as the multiple frames of first high-definition images.
Based on the multiple frames of second high-definition images I_0–I_6 generated by the reconstruction network and the optical flow estimation results u_{0-1} to u_{5-6} generated by the optical flow estimation network, and using the physical process by which motion blur is produced, the optical flow estimation result and the multiple frames of second high-definition images may be processed, or only the multiple frames of second high-definition images may be processed, to obtain a re-blurred image.
In one implementation, the re-blurred image is obtained by averaging the multiple frames of second high-definition images, that is, the second high-definition images are averaged directly without considering the optical flow estimation result; the accuracy of the re-blurred image obtained in this way is slightly lower than when the optical flow estimation result is applied.
In another implementation, the re-blurred image is obtained by inserting interpolation frames between adjacent frames of the multiple frames of second high-definition images using the optical flow estimation result, and then averaging the interpolation frames together with the multiple frames of second high-definition images. It should be understood that, in this implementation, interpolation frames may be inserted between every two adjacent frames, or only between some of the adjacent frames; the accuracy of the re-blurred image is higher when interpolation frames are inserted between every two adjacent frames. The effect of frame interpolation can be understood as making the interval between two adjacent frames of second high-definition images more refined. This implementation allows the optical flow estimation result to simulate the actual motion trajectory more accurately, so as to obtain a re-blurred image closer to the actual motion trajectory; the number of interpolation frames can be set as required.
In this implementation, the process of obtaining a re-blurred image using the optical flow estimation result and the plurality of frames of the second high-definition images may satisfy the following equation (3).
\hat{B}(x) = \frac{1}{n\tau} \sum_{i=0}^{\tau-1} \sum_{j=0}^{n-1} I_{i,j}(x) \qquad (3)

I_{i,j}(x) = \mathcal{W}\!\left(I_i(x),\ \tfrac{j}{n}\, u_{i \rightarrow i+1}\right) \qquad (4)

wherein I_{i,j}(x) is the j-th interpolated frame reconstructed between two adjacent frames of images (the interpolated frame can be understood as image data corresponding to the optical flow estimation result between two frames of first high-definition images, and \mathcal{W} denotes the warping transformation), n-1 is the number of insertable frames, and τ is the number of frames of the multiple frames of first high-definition images.
That is, a parameter n is introduced, where n-1 is the number of frames that can be inserted between two frames of second high-definition images, and this number can be determined according to the practical situation. For example, 10 frames can be inserted, that is, j can be an integer ranging from 1 to 10 and n equals 11. The n-1 = 10 interpolation frames are obtained using the optical flow estimation result, and the interpolation frames are then averaged together with the multiple frames of second high-definition images to obtain the re-blurred image.
After the re-blurred image is obtained, a self-consistency constraint is imposed between the blurred image B(x) and the re-blurred image \hat{B}(x); this constraint may satisfy the following equation (5).

L_{self} = \left\| B(x) - \hat{B}(x) \right\| \qquad (5)

L_{self} can be understood as the self-consistency constraint loss function.
In order to ensure that the reconstructed high-definition image sequence is ordered in the time dimension, the multiple frames of second high-definition images may be subjected to a warping transformation using the optical flow estimation result to obtain multiple frames of optical flow estimation images. For example, the optical flow u_{i→i+1} is used to warp the (i+1)-th frame of second high-definition image, and the relationship between the second high-definition images and the i-th frame of optical flow estimation image \hat{I}_i may satisfy the following equation (6).

\hat{I}_i(x) = \mathcal{W}\!\left(I_{i+1}(x),\ u_{i \rightarrow i+1}\right) \qquad (6)
Thus, based on the luminance consistency assumption, luminance consistency should hold between the i-th frame of optical flow estimation image \hat{I}_i and the corresponding i-th frame of second high-definition image I_i, and the luminance consistency constraint may satisfy the following equation (7).

L_{warp} = \sum_{i=0}^{\tau-1} \left\| I_i(x) - \hat{I}_i(x) \right\| \qquad (7)

L_{warp}, i.e., the difference between the luminance values, can be understood as the luminance consistency constraint loss function.
Alternatively, the overall loss function of the image reconstruction model may be constructed using the above-described constraint conditions, and the overall loss function may satisfy the following equation (8).
L = \alpha L_{recon} + \beta L_{self} + \gamma L_{warp} + \delta L_{smooth} \qquad (8)

wherein L_{recon} is the loss function of the reconstruction loss constraint, L_{smooth} is the loss function of the smoothing constraint, and α, β, γ, δ are weighting coefficients.
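As a small illustrative sketch of equation (8), the overall loss may be assembled as a weighted sum of the individual terms; the default weight values below are placeholders and would have to be tuned.

```python
def total_loss(l_recon, l_self, l_warp, l_smooth,
               alpha=1.0, beta=1.0, gamma=1.0, delta=0.1):
    """Overall loss of equation (8): a weighted sum of the four constraint terms."""
    return alpha * l_recon + beta * l_self + gamma * l_warp + delta * l_smooth
```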
In order to verify the scheme of the embodiment of the present application, two common test sets, namely GoPro and Adobe240, are selected, and a method for image reconstruction by using an EDI model in the prior art (referred to as "the existing method" for short) and a method for image reconstruction provided by the embodiment of the present application (referred to as "the method" for short) are respectively used for testing, and the results are shown in table 1. The peak signal to noise ratio (PSNR) and the Structural Similarity Index (SSIM) in table 1 are two common parameters for comparing the performance of image processing results, and it can be seen from table 1 that in two data sets, the performance of the method is better than that of the existing method, no matter for the reconstruction result of the intermediate frame or the reconstruction result of the entire frame sequence.
TABLE 1 comparison of image reconstruction results of the prior art method and the present method
Fig. 10 is a schematic diagram of an image reconstruction apparatus according to an embodiment of the present application. As shown in fig. 10, the apparatus 1000 includes an obtaining unit 1001, a processing unit 1002, and an output unit 1003, and can be used to perform the image reconstruction method according to the embodiment of the present application.
For example, the acquiring unit 1001 may perform the above step 801, the processing unit 1002 may perform the above step 802, and the outputting unit 1003 is configured to perform the step 803. The processing unit 1002 may include the above-described target image reconstruction model for performing step 802.
Fig. 11 is a schematic diagram of a training apparatus for an image reconstruction model according to an embodiment of the present application. As shown in fig. 11, the apparatus 2000 includes an obtaining unit 2001 and a training unit 2002, which can be used to execute the training method of the image reconstruction model according to the embodiment of the present application. For example, the obtaining unit 2001 may perform the above step 501, and the training unit 2002 may perform the above step 502. For another example, the obtaining unit 2001 may perform step 601, the training unit 2002 may perform steps 602-605, and the training unit 2002 may be further configured to perform steps 606-607.
Fig. 12 is a schematic diagram of a hardware configuration of an image reconstruction apparatus according to an embodiment of the present application. The image reconstruction apparatus 3000 shown in fig. 12 includes a memory 3001, a processor 3002, a communication interface 3003, and a bus 3004. The memory 3001, the processor 3002, and the communication interface 3003 are communicatively connected to each other via a bus 3004.
The memory 3001 may be a Read Only Memory (ROM), a static memory device, a dynamic memory device, or a Random Access Memory (RAM). The memory 3001 may store a program, and the processor 3002 and the communication interface 3003 are used to perform the steps of the image reconstruction apparatus according to the embodiment of the present application when the program stored in the memory 3001 is executed by the processor 3002.
The processor 3002 may be a general-purpose Central Processing Unit (CPU), a microprocessor, an Application Specific Integrated Circuit (ASIC), a Graphics Processing Unit (GPU), or one or more integrated circuits, and is configured to execute related programs to implement the functions required to be executed by the units in the image reconstruction apparatus according to the embodiment of the present disclosure, or to execute the image reconstruction method according to the embodiment of the present disclosure.
The processor 3002 may also be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the image reconstruction method according to the embodiment of the present application may be implemented by integrated logic circuits of hardware in the processor 3002 or by instructions in the form of software.
The processor 3002 may also be a general purpose processor, a Digital Signal Processor (DSP), an ASIC, an FPGA (field programmable gate array) or other programmable logic device, discrete gate or transistor logic device, or discrete hardware components. The general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or may be implemented by a combination of hardware and software modules in the decoding processor. The software modules may be located in ram, flash, rom, prom, eprom, eeprom, registers, etc. storage media as is well known in the art. The storage medium is located in the memory 3001, and the processor 3002 reads information in the memory 3001, and in combination with hardware thereof, completes functions required to be executed by units included in the image reconstruction apparatus according to the embodiment of the present application, or executes the image reconstruction method according to the embodiment of the method of the present application.
Communication interface 3003 enables communication between device 3000 and other devices or communication networks using transceiver devices, such as, but not limited to, transceivers. For example, the image to be processed may be acquired through the communication interface 3003.
The bus 3004 may include a pathway to transfer information between various components of the apparatus 3000 (e.g., memory 3001, processor 3002, communication interface 3003).
Fig. 13 is a hardware configuration diagram of a training apparatus for an image reconstruction model according to an embodiment of the present application. Similar to the apparatus 3000 described above, the training apparatus 4000 shown in fig. 13 includes a memory 4001, a processor 4002, a communication interface 4003, and a bus 4004. The memory 4001, the processor 4002 and the communication interface 4003 are communicatively connected to each other via a bus 4004.
The memory 4001 may store a program, and the processor 4002 is configured to execute the steps of the training method of training the image reconstruction model according to the embodiment of the present application when the program stored in the memory 4001 is executed by the processor 4002.
The processor 4002 may be a general-purpose CPU, a microprocessor, an ASIC, a GPU, or one or more integrated circuits, and is configured to execute a relevant program to implement the training method for training the image reconstruction model according to the embodiment of the present application.
Processor 4002 may also be an integrated circuit chip having signal processing capabilities. In the training process, the steps of the training method for the image reconstruction model according to the embodiment of the present application may be implemented by integrated logic circuits of hardware in the processor 4002 or instructions in the form of software.
It should be understood that, by training the image reconstruction model by the training apparatus 4000 shown in fig. 13, the trained image reconstruction model can be used to execute the image reconstruction method according to the embodiment of the present application.
Specifically, the apparatus shown in fig. 13 may obtain training data and an image reconstruction model to be trained from the outside through the communication interface 4003, and then train the image reconstruction model to be trained by the processor according to the training data.
It should be noted that although the above-described apparatus 3000 and apparatus 4000 show only memories, processors, and communication interfaces, in particular implementations, those skilled in the art will appreciate that the apparatus 3000 and apparatus 4000 may also include other devices necessary for normal operation. Also, those skilled in the art will appreciate that apparatus 3000 and apparatus 4000 may also include hardware components for performing other additional functions, according to particular needs. Further, those skilled in the art will appreciate that apparatus 3000 and apparatus 4000 may also include only those components necessary to implement embodiments of the present application, and need not include all of the components shown in fig. 12 and 13.
It should be understood that the processor in the embodiments of the present application may be a Central Processing Unit (CPU), and the processor may also be other general purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, and the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
It will also be appreciated that the memory in the embodiments of the subject application can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory. The non-volatile memory may be a read-only memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an electrically Erasable EPROM (EEPROM), or a flash memory. Volatile memory can be Random Access Memory (RAM), which acts as external cache memory. By way of example, but not limitation, many forms of Random Access Memory (RAM) are available, such as Static RAM (SRAM), Dynamic Random Access Memory (DRAM), Synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct bus RAM (DR RAM).
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product comprises one or more computer instructions or computer programs. The procedures or functions described in accordance with the embodiments of the present application are generated in whole or in part when the computer instructions or the computer program are loaded or executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored on a computer readable storage medium or transmitted from one computer readable storage medium to another computer readable storage medium, for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains one or more collections of available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium. The semiconductor medium may be a solid state disk.
It should be understood that the term "and/or" herein is merely a kind of association relationship describing an associated object, and means that three relationships may exist, for example, a and/or B may mean: a exists alone, A and B exist simultaneously, and B exists alone, wherein A and B can be singular or plural. In addition, the "/" in this document generally indicates a relationship that the contextual object is an "or," but may also indicate a "and/or" relationship, which can be understood with specific reference to the context.
In the present application, "at least one" means one or more, "a plurality" means two or more. "at least one of the following" or similar expressions refer to any combination of these items, including any combination of the singular or plural items. For example, at least one (one) of a, b, or c, may represent: a, b, c, a-b, a-c, b-c, or a-b-c, wherein a, b, c may be single or plural.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a U disk, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and all the changes or substitutions should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (25)

1. A training method of an image reconstruction model is characterized by comprising the following steps:
acquiring a fuzzy image to be trained and event data to be trained, wherein the fuzzy image to be trained is an average value of a plurality of frames of first high-definition images, and the event data to be trained comprises event points in a time period corresponding to the fuzzy image to be trained;
training the image reconstruction model by using the blurred image to be trained and the event data to be trained to obtain a target image reconstruction model, wherein, in the training process, parameters of the image reconstruction model are updated according to constraint conditions to obtain the target image reconstruction model, the constraint conditions comprise that the difference value between the blurred image to be trained and a re-blurred image is minimized, the re-blurred image is obtained by processing a plurality of frames of second high-definition images, and the plurality of frames of second high-definition images are obtained by processing the blurred image to be trained and the event data to be trained by using the image reconstruction model.
2. The method of claim 1, wherein the re-blurred image is averaged over the plurality of frames of the second high definition image.
3. The method of claim 1, wherein the re-blurred image is obtained by processing an optical flow estimation result and the plurality of frames of second high-definition images, the optical flow estimation result is obtained by processing the blurred image to be trained and the event data to be trained by using an optical flow estimation network, and the optical flow estimation result comprises motion information of the blurred image to be trained.
4. The method of claim 3, wherein the re-blurred image is obtained by interpolating an interpolated frame between at least two adjacent frames of the plurality of frames of the second high definition images using the optical flow estimation result, and averaging the interpolated frame and the plurality of frames of the second high definition images.
5. The method of claim 3 or 4, wherein the constraint condition further includes minimizing a difference value between an i-th frame optical flow estimation image in a plurality of frames of optical flow estimation images and an i-th frame second high-definition image in the plurality of frames of second high-definition images, the plurality of frames of optical flow estimation images being obtained by performing a warp transformation on the plurality of frames of second high-definition images by using the optical flow estimation result, wherein the i-th frame optical flow estimation image in the plurality of frames of optical flow estimation images is obtained by performing a warp transformation on an i + 1-th frame second high-definition image in the plurality of frames of second high-definition images, and i is an integer greater than or equal to 0 and less than the number of frames of the plurality of frames of first high-definition images.
6. The method of claim 5, wherein the difference between the ith frame of optical flow estimated image in the plurality of frames of optical flow estimated images and the ith frame of second high definition image in the plurality of frames of second high definition images satisfies the following formula:
L_{warp} = \sum_{i=0}^{\tau-1} \left\| I_i(x) - \hat{I}_i(x) \right\|

wherein L_{warp} is the difference value, τ represents the number of frames of the plurality of frames of first high-definition images, I_i(x) represents the i-th frame of second high-definition image in the plurality of frames of second high-definition images, and \hat{I}_i(x) represents the i-th frame of optical flow estimation image.
7. The method of claim 5 or 6, wherein the constraint further comprises minimizing a distance between adjacent pixels in each of the plurality of frames of optical flow estimation images.
8. The method of any of claims 1 to 7, wherein the constraint further comprises minimizing a difference between the plurality of frames of the second high definition image and the plurality of frames of the first high definition image.
9. An image reconstruction method, comprising:
acquiring a first blurred image and first event data, wherein the first event data comprises event points in a time period corresponding to the first blurred image;
inputting the first blurred image and the first event data into a target image reconstruction model to obtain a multi-frame target high-definition image, wherein the target image reconstruction model is obtained by using the method of any one of claims 1 to 8;
and outputting one or more frames of the multi-frame target high-definition images.
10. The method of claim 9, wherein the method further comprises:
segmenting the first event data, wherein the number of segments of the first event data corresponds to the number of frames of the multi-frame target high-definition image.
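A minimal sketch of the segmentation in claim 10, assuming events arrive as rows of (timestamp, x, y, polarity) and that equal-duration time bins are used; only the requirement that the number of segments equals the number of output frames comes from the claim.

```python
import torch

def segment_events(events, t_start, t_end, num_frames):
    # events: (N, 4) rows of (t, x, y, polarity); timestamps assumed to lie in [t_start, t_end]
    edges = torch.linspace(t_start, t_end, num_frames + 1)
    segments = []
    for i in range(num_frames):
        # make the last bin inclusive of t_end so no event is dropped
        upper = edges[i + 1] if i < num_frames - 1 else t_end + 1e-9
        mask = (events[:, 0] >= edges[i]) & (events[:, 0] < upper)
        segments.append(events[mask])
    return segments
```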
11. An apparatus for training an image reconstruction model, comprising:
an acquisition unit, configured to acquire a blurred image to be trained and event data to be trained, wherein the blurred image to be trained is an average of a plurality of frames of first high-definition images, and the event data comprises event points in a time period corresponding to the blurred image to be trained;
a training unit, configured to train an image reconstruction model by using the blurred image to be trained and the event data to be trained to obtain a target image reconstruction model, wherein, in the training process, the training unit updates parameters of the image reconstruction model according to a constraint condition to obtain the target image reconstruction model, the constraint condition comprising minimizing a difference between the blurred image to be trained and a re-blurred image, the re-blurred image being obtained by processing a plurality of frames of second high-definition images, and the plurality of frames of second high-definition images being obtained by processing the blurred image to be trained and the event data to be trained by using the image reconstruction model.
12. The apparatus of claim 11, wherein the re-blurred image is an average of the plurality of frames of second high-definition images.
13. The apparatus of claim 11, wherein the re-blurred image is obtained by processing an optical flow estimation result and the plurality of frames of second high-definition images, the optical flow estimation result is obtained by processing the blurred image to be trained and the event data to be trained by using an optical flow estimation network, and the optical flow estimation result includes motion information of the blurred image to be trained.
14. The apparatus of claim 13, wherein the re-blurred image is obtained by inserting an interpolated frame between at least two adjacent frames of the plurality of frames of second high-definition images by using the optical flow estimation result, and averaging the interpolated frame together with the plurality of frames of second high-definition images.
15. The apparatus of claim 13 or 14, wherein the constraint condition further comprises minimizing a difference between an i-th frame optical flow estimation image in a plurality of frames of optical flow estimation images and an i-th frame second high-definition image in the plurality of frames of second high-definition images, the plurality of frames of optical flow estimation images being obtained by performing a warp transformation on the plurality of frames of second high-definition images by using the optical flow estimation result, wherein the i-th frame optical flow estimation image in the plurality of frames of optical flow estimation images is obtained by performing a warp transformation on the (i+1)-th frame second high-definition image in the plurality of frames of second high-definition images, and i is an integer greater than or equal to 0 and less than the number of frames of the plurality of frames of first high-definition images.
16. The apparatus of claim 15, wherein the difference between the i-th frame optical flow estimation image in the plurality of frames of optical flow estimation images and the i-th frame second high-definition image in the plurality of frames of second high-definition images satisfies the following formula:

$$L_{warp}=\sum_{i=0}^{\tau-1}\sum_{x}\left\|I_i(x)-\hat{I}_i(x)\right\|_{1}$$

wherein $L_{warp}$ is said difference, $\tau$ represents the number of frames of said plurality of frames of first high-definition images, $I_i(x)$ represents the i-th frame second high-definition image, and $\hat{I}_i(x)$ represents the i-th frame optical flow estimation image.
17. The apparatus of claim 15 or 16, wherein the constraint condition further comprises minimizing a distance between adjacent pixels in each of the plurality of frames of optical flow estimation images.
18. The apparatus of any one of claims 11 to 17, wherein the constraint condition further comprises minimizing a difference between the plurality of frames of second high-definition images and the plurality of frames of first high-definition images.
19. An image reconstruction apparatus, comprising:
an acquisition unit, configured to acquire a first blurred image and first event data, wherein the first event data comprises event points in a time period corresponding to the first blurred image;
a processing unit, configured to input the first blurred image and the first event data into a target image reconstruction model to obtain a multi-frame target high-definition image, wherein the target image reconstruction model is obtained by using the apparatus according to any one of claims 11 to 18;
and an output unit, configured to output one or more frames of the multi-frame target high-definition image.
20. The apparatus of claim 19, wherein the processing unit is further configured to:
segment the first event data, wherein the number of segments of the first event data corresponds to the number of frames of the multi-frame target high-definition image.
21. An apparatus for training an image reconstruction model, comprising a processor and a memory, the memory for storing program instructions, the processor for invoking the program instructions to perform the method of any one of claims 1 to 8.
22. An image reconstruction apparatus comprising a processor and a memory, the memory for storing program instructions, the processor for invoking the program instructions to perform the method of claim 9 or 10.
23. A computer-readable storage medium, characterized in that the computer-readable storage medium stores program code for execution by a device, the program code comprising instructions for performing the method of any one of claims 1 to 8, or the method of claim 9 or 10.
24. A chip system, comprising a processor and a data interface, wherein the processor reads, through the data interface, instructions stored in a memory to perform the method of any one of claims 1 to 8, or the method of claim 9 or 10.
25. A computer program product, characterized in that the computer program product comprises a computer program which, when executed on a computer, causes the computer to perform the method of any one of claims 1 to 8, or the method of claim 9 or 10.
CN202110237930.9A 2021-03-04 2021-03-04 Training method of image reconstruction model, image reconstruction method and device thereof Active CN113076685B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110237930.9A CN113076685B (en) 2021-03-04 2021-03-04 Training method of image reconstruction model, image reconstruction method and device thereof

Publications (2)

Publication Number Publication Date
CN113076685A true CN113076685A (en) 2021-07-06
CN113076685B CN113076685B (en) 2024-09-10

Family

ID=76609839

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110237930.9A Active CN113076685B (en) 2021-03-04 2021-03-04 Training method of image reconstruction model, image reconstruction method and device thereof

Country Status (1)

Country Link
CN (1) CN113076685B (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150116353A1 (en) * 2013-10-30 2015-04-30 Morpho, Inc. Image processing device, image processing method and recording medium
US20180068430A1 (en) * 2016-09-07 2018-03-08 Huazhong University Of Science And Technology Method and system for estimating blur kernel size
CN107945127A (en) * 2017-11-27 2018-04-20 南昌大学 A kind of High-motion picture deblurring method based on image column gray probability uniformity
CN110309856A (en) * 2019-05-30 2019-10-08 华为技术有限公司 Image classification method, the training method of neural network and device
CN111061895A (en) * 2019-07-12 2020-04-24 北京达佳互联信息技术有限公司 Image recommendation method and device, electronic equipment and storage medium
CN111667399A (en) * 2020-05-14 2020-09-15 华为技术有限公司 Method for training style migration model, method and device for video style migration
CN111667442A (en) * 2020-05-21 2020-09-15 武汉大学 High-quality high-frame-rate image reconstruction method based on event camera
CN111914997A (en) * 2020-06-30 2020-11-10 华为技术有限公司 Method for training neural network, image processing method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
REN Jingjing; FANG Xianyong; CHEN Shangwen; WANG Linbo; ZHOU Jian: "Image Deblurring Based on Fast Convolutional Neural Networks", Journal of Computer-Aided Design & Computer Graphics, no. 08, pages 59 - 71 *
LUO Qibin; CAI Qiang: "Blind Removal of Image Motion Blur Using a Dual-Framework Generative Adversarial Network", Journal of Graphics, no. 06, pages 77 - 84 *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220036513A1 (en) * 2020-07-28 2022-02-03 Samsung Electronics Co., Ltd. System and method for generating bokeh image for dslr quality depth-of-field rendering and refinement and training method for the same
US11823353B2 (en) * 2020-07-28 2023-11-21 Samsung Electronics Co., Ltd. System and method for generating bokeh image for DSLR quality depth-of-field rendering and refinement and training method for the same
CN113837938A (en) * 2021-07-28 2021-12-24 北京大学 Super-resolution method for reconstructing potential image based on dynamic vision sensor
CN113837938B (en) * 2021-07-28 2022-09-09 北京大学 Super-resolution method for reconstructing potential image based on dynamic vision sensor
CN113706414A (en) * 2021-08-26 2021-11-26 荣耀终端有限公司 Training method of video optimization model and electronic equipment
CN113706414B (en) * 2021-08-26 2022-09-09 荣耀终端有限公司 Training method of video optimization model and electronic equipment
CN113688752A (en) * 2021-08-30 2021-11-23 厦门美图宜肤科技有限公司 Face pigment detection model training method, device, equipment and storage medium
CN113688752B (en) * 2021-08-30 2024-02-02 厦门美图宜肤科技有限公司 Training method, device, equipment and storage medium for face color detection model
CN113744169A (en) * 2021-09-07 2021-12-03 讯飞智元信息科技有限公司 Image enhancement method and device, electronic equipment and storage medium
WO2023042432A1 (en) * 2021-09-17 2023-03-23 ソニーセミコンダクタソリューションズ株式会社 Imaging system, processing device, and machine learning device
CN114461061B (en) * 2022-01-05 2023-12-15 东风柳州汽车有限公司 Vehicle display method, device, equipment and storage medium
CN114461061A (en) * 2022-01-05 2022-05-10 东风柳州汽车有限公司 Vehicle display method, device, equipment and storage medium
CN114881866A (en) * 2022-03-23 2022-08-09 清华大学 Occlusion-removing 3D imaging method and device based on event data
CN114881866B (en) * 2022-03-23 2024-08-16 清华大学 Event data-based de-occlusion 3D imaging method and device
WO2024002211A1 (en) * 2022-06-30 2024-01-04 华为技术有限公司 Image processing method and related apparatus

Also Published As

Publication number Publication date
CN113076685B (en) 2024-09-10

Similar Documents

Publication Publication Date Title
CN113076685B (en) Training method of image reconstruction model, image reconstruction method and device thereof
Baldwin et al. Time-ordered recent event (tore) volumes for event cameras
CN110532871B (en) Image processing method and device
CN111402130B (en) Data processing method and data processing device
WO2021043168A1 (en) Person re-identification network training method and person re-identification method and apparatus
CN115442515B (en) Image processing method and apparatus
CN109993707B (en) Image denoising method and device
CN111667399B (en) Training method of style migration model, video style migration method and device
WO2021043273A1 (en) Image enhancement method and apparatus
WO2021164731A1 (en) Image enhancement method and image enhancement apparatus
WO2021063341A1 (en) Image enhancement method and apparatus
CN112446380A (en) Image processing method and device
CN111914997B (en) Method for training neural network, image processing method and device
WO2022134971A1 (en) Noise reduction model training method and related apparatus
CN111797881B (en) Image classification method and device
CN111402146A (en) Image processing method and image processing apparatus
CN113011562A (en) Model training method and device
Kim et al. Event-guided deblurring of unknown exposure time videos
CN112446835B (en) Image restoration method, image restoration network training method, device and storage medium
WO2024002211A1 (en) Image processing method and related apparatus
CN113065645A (en) Twin attention network, image processing method and device
US11915383B2 (en) Methods and systems for high definition image manipulation with neural networks
US11741579B2 (en) Methods and systems for deblurring blurry images
Vitoria et al. Event-based image deblurring with dynamic motion awareness
Zhang et al. Unsupervised depth estimation from monocular videos with hybrid geometric-refined loss and contextual attention

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant