CN114841870A - Image processing method, related device and system - Google Patents
- Publication number: CN114841870A (application number CN202210334494.1A)
- Authority
- CN
- China
- Prior art keywords: event stream, image, blurred image, feature, features
- Legal status: Pending (the listed status is an assumption, not a legal conclusion)
Classifications
- G06T5/70
- G06N3/045 — Combinations of networks (G06N3/04 Architecture; G06N3/02 Neural networks; G06N3/00 Computing arrangements based on biological models)
- G06N3/08 — Learning methods (G06N3/02 Neural networks)
- G06T2207/10016 — Video; Image sequence (G06T2207/10 Image acquisition modality; G06T2207/00 Indexing scheme for image analysis or image enhancement)
Abstract
The embodiment of the application provides an image processing method, a related device and a system. The method comprises the following steps: respectively extracting features of a blurred image and an event stream to obtain a first feature of the blurred image and a first feature of the event stream, wherein the event stream is triggered by scene brightness change within exposure time corresponding to the blurred image; performing depth feature extraction on the first features of the blurred image and the first features of the event stream to obtain second features of the blurred image and second features of the event stream; and obtaining a clear image of a target moment according to the second characteristic of the blurred image and the second characteristic of the event stream, wherein the target moment is any moment in the exposure time. By adopting the method, the suppression of event characteristic noise and the deblurring of image characteristics can be realized, and a clear image at any moment in the exposure time can be recovered.
Description
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image processing method, a related apparatus and a system.
Background
Motion deblurring is a classic and important problem in the fields of computer vision and photography. Motion blur is the aliasing of information from different spatial positions caused by camera motion or scene changes during imaging, resulting in the loss of important high-frequency information in the image. However, a motion-blurred image typically contains the complete information of the moving scene during the exposure time. It is therefore entirely possible to recover the motion state of objects and their clear textures from the blurred image. On the one hand, motion deblurring lets the restored image/video present a good visual effect, reveals the useful information hidden in the blurred image, and improves the visual experience of the end user; on the other hand, it helps solve the difficulty of analyzing low-quality images and has a positive effect on the performance of other computer-vision algorithms such as detection and tracking.
In the prior art, a CNN is designed that takes a single motion-blurred image and event data as input and, supervised by synthesized data, learns to output multiple clear video sequences. To guide the network through semi-supervised training and improve the model's generalization to real data, another CNN branch is designed that takes the event stream as input and outputs optical-flow information as the motion information of the scene. Based on the estimated clear image and motion information, a new blurred image is re-rendered through the physical blur-formation process, and a blur-consistency cost function between this re-rendered blurred image and the input blurred image guides the training of the CNN in a self-supervised manner. In addition, a new clear image is formed by propagating the estimated clear image with the motion information, and a photometric-consistency self-supervised cost function between this new clear image and the clear image estimated by the network further guides the training. In establishing these two cost functions, a piecewise-linear motion model is used to approximate the highly nonlinear motion, so that the network outputs an accurate dense motion flow.
However, that approach can only recover a clear image corresponding to a specific time point, which may lose important scene information at other moments and thereby affect the analysis and judgment of the whole scene. Moreover, the spatio-temporal noise of the events cannot be well suppressed, and event noise can mislead the training of the network, so that the recovered image shows noise in places that were not originally blurred. In addition, noise affects the estimation of the brightness change of the moving scene, so the network's estimation of the motion is affected by wrong timestamps and event-point counts.
Disclosure of Invention
The application discloses an image processing method, a related device and a system, which can recover a clear image at any moment within the exposure time and can suppress the spatio-temporal noise of the events.
In a first aspect, an embodiment of the present application provides an image processing method, including: respectively extracting features of a blurred image and an event stream to obtain a first feature of the blurred image and a first feature of the event stream, wherein the event stream is triggered by scene brightness change within exposure time corresponding to the blurred image; performing depth feature extraction on the first features of the blurred image and the first features of the event stream to obtain second features of the blurred image and second features of the event stream; and obtaining a clear image of a target moment according to the second characteristic of the blurred image and the second characteristic of the event stream, wherein the target moment is any moment in the exposure time.
In the embodiment of the application, the first feature of the blurred image and the first feature of the event stream are obtained by respectively performing primary feature extraction on the blurred image and the event stream, and then the first feature of the blurred image and the first feature of the event stream are subjected to depth feature extraction to obtain the image feature after deblurring (namely, the second feature of the blurred image) and the event stream feature after event noise suppression (namely, the second feature of the event stream); and then obtaining a clear image at the target moment according to the second characteristic of the blurred image and the second characteristic of the event stream. By adopting the method, the suppression of the event characteristic noise and the deblurring of the image characteristic can be realized. By restoring the clear image at any moment in the exposure time, the whole motion scene corresponding to the blurred image can be known and analyzed more completely without missing any important moment information.
In one possible implementation manner, the performing depth feature extraction on the first feature of the blurred image and the first feature of the event stream to obtain the second feature of the blurred image and the second feature of the event stream includes: respectively obtaining a depth characteristic parameter of the blurred image and a depth characteristic parameter of the event stream according to the first characteristic of the blurred image and the first characteristic of the event stream; and carrying out interactive processing on the depth characteristic parameters of the blurred image and the depth characteristic parameters of the event stream to obtain a second characteristic of the blurred image and a second characteristic of the event stream.
According to the method and the device, the problem of image deblurring at any moment in the exposure time is solved by utilizing the ultrahigh time resolution of the event information in a complementary mode, and meanwhile, the noise in the event is restrained by utilizing the information smoothness of the blurred image, so that the scene dynamic in the exposure time is more accurately estimated.
In a possible implementation manner, the obtaining a sharp image at a target time according to the second feature of the blurred image and the second feature of the event stream, where the target time is any time within the exposure time includes: coding the target moment to obtain a time vector; obtaining image features of fusion time information according to the time vector, the second features of the blurred image and the second features of the event stream; and decoding the image characteristics of the fusion time information to obtain a clear image of the target moment.
The embodiment realizes the fusion of any continuous time information and depth characteristics by encoding continuous time signals; by adopting the MLP network structure of continuous time decoding to decode the clear image corresponding to any time in the exposure time from the depth feature of the fusion time information, the whole motion scene corresponding to the blurred image can be more completely understood and analyzed without missing any important time information.
In a second aspect, an embodiment of the present application provides an image processing apparatus, including: the first extraction module is used for respectively extracting features of a blurred image and an event stream to obtain a first feature of the blurred image and a first feature of the event stream, wherein the event stream is triggered by scene brightness change within exposure time corresponding to the blurred image; the second extraction module is used for performing depth feature extraction on the first features of the blurred image and the first features of the event stream to obtain second features of the blurred image and second features of the event stream; and the processing module is used for obtaining a clear image of a target moment according to the second characteristic of the blurred image and the second characteristic of the event stream, wherein the target moment is any moment in the exposure time.
In a possible implementation manner, the second extraction module is configured to: respectively obtaining a depth characteristic parameter of the blurred image and a depth characteristic parameter of the event stream according to the first characteristic of the blurred image and the first characteristic of the event stream; and carrying out interactive processing on the depth characteristic parameters of the blurred image and the depth characteristic parameters of the event stream to obtain a second characteristic of the blurred image and a second characteristic of the event stream.
In one possible implementation manner, the processing module is configured to: coding the target moment to obtain a time vector; obtaining image features of fusion time information according to the time vector, the second features of the blurred image and the second features of the event stream; and decoding the image characteristics of the fusion time information to obtain a clear image at the target moment.
In a third aspect, the present application provides an image processing apparatus comprising a processor and a memory; wherein the memory is configured to store program code, and the processor is configured to call the program code to perform the method as provided in any one of the possible embodiments of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium comprising computer instructions that, when executed on an electronic device, cause the electronic device to perform the method as provided in any one of the possible embodiments of the first aspect.
In a fifth aspect, the embodiments of the present application provide a computer program product, which when run on a computer, causes the computer to execute the method as provided in any one of the possible embodiments of the first aspect.
It will be appreciated that the apparatus of the second aspect, the apparatus of the third aspect, the computer-readable storage medium of the fourth aspect, and the computer program product of the fifth aspect provided above are all configured to perform the method provided in any possible implementation of the first aspect. Therefore, for the beneficial effects they achieve, reference may be made to the beneficial effects of the corresponding method, which are not repeated here.
Drawings
The drawings used in the embodiments of the present application are described below.
Fig. 1 is a schematic architecture diagram of an image processing system according to an embodiment of the present application;
fig. 2 is a schematic flowchart of an image processing method according to an embodiment of the present application;
FIG. 3a is a schematic flowchart of another image processing method provided in an embodiment of the present application;
FIG. 3b is a schematic diagram of a model process provided by an embodiment of the present application;
fig. 3c is a schematic diagram of a dual-attention network structure with an implicit structure according to an embodiment of the present application;
FIG. 4a is a schematic diagram of another image processing method provided in the embodiment of the present application;
fig. 4b is a schematic diagram of a network structure provided in the embodiment of the present application;
fig. 4c is a schematic diagram of another network structure provided in the embodiment of the present application;
fig. 5 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of another image processing apparatus according to an embodiment of the present application.
Detailed Description
The embodiments of the present application will be described below with reference to the drawings. The terminology used in the description of the embodiments herein is for the purpose of describing particular embodiments herein only and is not intended to be limiting of the application.
Fig. 1 is a schematic diagram illustrating an architecture of an image processing system according to an embodiment of the present disclosure. The system may include an image processing device and an autonomous vehicle/robot. The image processing device is used for processing the blurred image and an event stream triggered by scene brightness change in the exposure time corresponding to the blurred image to obtain a clear image corresponding to any moment in the exposure time. Clear video sequences can be obtained based on a plurality of corresponding clear images at any time, and further the clear video sequences can be processed in an automatic driving vehicle/robot so as to achieve the purposes of face recognition, semantic segmentation, object detection or 3D reconstruction and the like.
The image processing apparatus may be a server, an arbitrary terminal device, or the like, and may be located inside the autonomous vehicle or the robot, or connected to the outside of the autonomous vehicle or the robot, for example, to perform image processing.
The embodiment is described only by taking an automatically driven vehicle or a robot as an example, and the embodiment may also be other devices and the like, which is not particularly limited in this embodiment.
Fig. 2 is a schematic flow chart of an image processing method according to an embodiment of the present application. As shown in fig. 2, the method includes steps 201 and 203, which are as follows:
201. respectively extracting features of a blurred image and an event stream to obtain a first feature of the blurred image and a first feature of the event stream, wherein the event stream is triggered by scene brightness change within exposure time corresponding to the blurred image;
the blurred image can be understood as an image that is captured as a result of motion. For example, when an object is moving and the object is photographed at this time, a blurred image is acquired.
The feature extraction may be a preliminary processing performed on the blurred image and the event stream respectively, for example, extracting a feature map.
In a possible implementation manner, the feature extraction is performed on the blurred image and the event stream respectively, and the blurred image and the event stream may be input into a preset neural network model for processing, so as to obtain a first feature of the blurred image and a first feature of the event stream. Other means can be adopted, and the scheme is not particularly limited in this respect.
202. Performing depth feature extraction on the first features of the blurred image and the first features of the event stream to obtain second features of the blurred image and second features of the event stream;
the second feature of the blurred image may be understood as a deblurred image feature obtained by processing the first feature of the blurred image.
The second feature of the event stream may be understood as an event stream feature obtained by processing the first feature of the event stream and having event noise suppressed.
In a possible implementation manner, the depth feature extraction may be performed on the first feature of the blurred image and the first feature of the event stream by inputting both the first feature of the blurred image and the first feature of the event stream into a preset neural network model for processing, so as to obtain the second feature of the blurred image and the second feature of the event stream. Other means can be adopted, and the scheme is not particularly limited in this respect.
203. And obtaining a clear image of a target moment according to the second characteristic of the blurred image and the second characteristic of the event stream, wherein the target moment is any moment in the exposure time.
In one possible implementation, based on the deblurred image feature (i.e., the second feature of the blurred image) and the event stream feature (i.e., the second feature of the event stream) obtained as described above, time information is added to the second feature of the blurred image and the second feature of the event stream, so that a sharp image at a target time is obtained from the feature having the time information.
In the embodiment of the application, the first feature of the blurred image and the first feature of the event stream are obtained by respectively performing primary feature extraction on the blurred image and the event stream, and then the first feature of the blurred image and the first feature of the event stream are subjected to depth feature extraction to obtain the image feature after deblurring (namely, the second feature of the blurred image) and the event stream feature after event noise suppression (namely, the second feature of the event stream); and then obtaining a clear image at the target moment according to the second characteristic of the blurred image and the second characteristic of the event stream. By adopting the method, the suppression of the event characteristic noise and the deblurring of the image characteristic can be realized. By restoring the clear image at any moment in the exposure time, the whole motion scene corresponding to the blurred image can be known and analyzed more completely without missing any important moment information. The scheme can recover more image details and has higher definition. The scheme can greatly promote the performance of other high-level computer vision algorithms.
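Steps 201-203 above can be summarized as the following skeleton. This is a hypothetical NumPy sketch for orientation only: every function body is a placeholder standing in for the networks the embodiments describe, not a disclosed implementation, and the representation of the event stream as a dense map is an assumption.

```python
import numpy as np

def extract_features(blurred_image, event_stream):
    """Step 201: shallow feature extraction of each input (placeholder: identity)."""
    return blurred_image.copy(), event_stream.copy()

def depth_feature_extraction(img_feat, evt_feat):
    """Step 202: joint deep extraction -- deblur image features, denoise event features.
    Placeholder: each stream is corrected by blending with the other."""
    return 0.5 * (img_feat + evt_feat), 0.5 * (evt_feat + img_feat)

def decode_clear_image(img_feat2, evt_feat2, t):
    """Step 203: fuse the target time t (any moment in the exposure) and decode."""
    return img_feat2 + t * evt_feat2

rng = np.random.default_rng(3)
blurred = rng.standard_normal((8, 8))
events = rng.standard_normal((8, 8))        # event stream reduced to a dense map
f_b, f_e = extract_features(blurred, events)
f_b2, f_e2 = depth_feature_extraction(f_b, f_e)
clear = decode_clear_image(f_b2, f_e2, t=0.5)   # t in [0, 1] of the exposure
print(clear.shape)  # (8, 8)
```

The point of the skeleton is the data flow: both streams enter step 202 together, and the target time t enters only at step 203, which is what allows decoding at any moment of the exposure.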
Fig. 3a is a schematic flow chart of another image processing method according to the embodiment of the present application. As shown in fig. 3a, the method comprises steps 301 and 304, which are as follows:
301. respectively extracting features of a blurred image and an event stream to obtain a first feature of the blurred image and a first feature of the event stream, wherein the event stream is triggered by scene brightness change within exposure time corresponding to the blurred image;
the blurred image can be understood as an image that is captured as a result of motion. For example, when an object is moving and the object is photographed at this time, a blurred image is acquired.
The feature extraction may be a preliminary processing performed on the blurred image and the event stream respectively, for example, extracting a feature map.
In a possible implementation manner, the feature extraction is performed on the blurred image and the event stream respectively, and the blurred image and the event stream may be input into a preset neural network model for processing, so as to obtain a first feature of the blurred image and a first feature of the event stream.
Specifically, fig. 3b is a processing diagram of the shallow feature extraction network model provided in the embodiment of the present application. As shown in fig. 3b, the shallow feature extraction network model SFE comprises a pixel-shuffle (reshuffle) layer and two convolution layers. The observable blurred image B and the high-temporal-resolution event stream ε are each input into the shallow feature extraction network model; after the two signals pass through the two convolution layers, the shallow features B_feat (i.e., the first feature of the blurred image) and E_feat (i.e., the first feature of the event stream) are obtained.
The shallow feature extraction network model makes the subsequent Transformer training more stable when the second feature of the blurred image and the second feature of the event stream are obtained later; meanwhile, inputs of different scales can be mapped to features of the same scale, which facilitates the feature embedding of the subsequent Transformer.
Other means can be adopted, and the scheme is not particularly limited in this respect.
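As a rough NumPy illustration of the SFE structure described above (a pixel-shuffle layer followed by two convolution layers), the sketch below uses a space-to-depth rearrangement for the pixel-shuffle layer; the channel widths, the 3×3 kernels, and the ReLU between the two convolutions are all assumptions, since the patent does not disclose layer sizes.

```python
import numpy as np

def space_to_depth(x, r):
    """Pixel (un)shuffle: fold r x r spatial blocks into channels, (C,H,W)->(C*r*r,H/r,W/r)."""
    c, h, w = x.shape
    x = x.reshape(c, h // r, r, w // r, r)
    return x.transpose(0, 2, 4, 1, 3).reshape(c * r * r, h // r, w // r)

def conv3x3(x, weight):
    """Naive 'same'-padded 3x3 convolution; weight has shape (C_out, C_in, 3, 3)."""
    c_out, c_in, _, _ = weight.shape
    _, h, w = x.shape
    pad = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((c_out, h, w))
    for i in range(3):
        for j in range(3):
            # accumulate each kernel tap over all input channels
            out += np.einsum('oc,chw->ohw', weight[:, :, i, j], pad[:, i:i + h, j:j + w])
    return out

def shallow_feature_extract(x, w1, w2, r=2):
    """SFE sketch: pixel shuffle, then two conv layers, as in FIG. 3b."""
    x = space_to_depth(x, r)
    x = np.maximum(conv3x3(x, w1), 0.0)   # ReLU between the convs is an assumption
    return conv3x3(x, w2)

rng = np.random.default_rng(0)
blurred = rng.standard_normal((1, 8, 8))       # stand-in for blurred image B
w1 = rng.standard_normal((16, 4, 3, 3)) * 0.1
w2 = rng.standard_normal((16, 16, 3, 3)) * 0.1
b_feat = shallow_feature_extract(blurred, w1, w2)
print(b_feat.shape)  # (16, 4, 4)
```

Running the event stream through the same structure (with its own input channel count) would likewise produce E_feat at the same spatial scale, which is the property L155 attributes to the SFE.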
302. Respectively obtaining a depth characteristic parameter of the blurred image and a depth characteristic parameter of the event stream according to the first characteristic of the blurred image and the first characteristic of the event stream;
in a possible implementation manner, as shown in fig. 3c, the embodiment of the present application further provides a network structure of dual attentions with an implicit structure. As shown in fig. 3c, this network structure enables two input signals to interact, thereby achieving suppression of event noise and deblurring of image features. The network structure is formed by stacking a plurality of DALS blocks, and each DALS block takes two kinds of characteristics extracted by a shallow layer characteristic extraction network or the output of the previous DALS as input.
Specifically, the network structure first performs feature embedding on the first feature B_feat of the blurred image and the first feature E_feat of the event stream through Residual Dense Blocks (RDB), and then encodes through a linear layer the query (Q), key (K) and value (V) required for computing the respective attention. The attention of the two signals is computed as follows:

W-MSA(Q, K, V) = softmax(Q·K^T / √d_k)·V

where d_k is the dimension of a single feature in Q, K and V, and W-MSA(Q, K, V) is the intermediate feature representation of the DALS block.
Based on the above processing, the depth feature parameter of the blurred image and the depth feature parameter of the event stream can be obtained. This embodiment is described by taking the attention parameter as an example of the depth feature parameter. The depth feature parameter may also be another parameter obtained through processing by other network models, which is not specifically limited in this embodiment.
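A minimal NumPy sketch of the scaled dot-product attention used above. The token count and feature dimension are arbitrary, and the single-head, single-window form is a simplification: the W-MSA in the embodiment is a windowed multi-head variant.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Core of W-MSA for one head and one window: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.shape[-1]
    attn = softmax(q @ k.T / np.sqrt(d_k))   # (N, N) token-to-token weights
    return attn @ v, attn

rng = np.random.default_rng(1)
n, d_k = 6, 8                                # 6 tokens, feature dimension 8
q, k, v = (rng.standard_normal((n, d_k)) for _ in range(3))
out, attn = attention(q, k, v)
print(out.shape, attn.shape)                 # (6, 8) (6, 6)
```

Each of the two signals (image features and event features) computes its own Q, K, V and its own attention map in this way before the interaction step.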
303. Carrying out interactive processing on the depth characteristic parameters of the blurred image and the depth characteristic parameters of the event stream to obtain second characteristics of the blurred image and second characteristics of the event stream;
after obtaining the attention of each signal, the two signals are processed interactively in the following two ways.
1) As shown in fig. 3c, the attention of the event signal is corrected by the attention of the blurred image to suppress noise, and the depth feature is then output through an MLP:

Attn_E ← Attn_E + Attn_B,  E_feat = MLP(Attn_E·V_E)

where Attn_E and Attn_B are the attention of the event signal and of the blurred image respectively, ·V_E denotes the value-weighting operation, and E_feat here refers to the depth feature (i.e., the second feature) of the event signal.
2) The deblurred depth feature of the image, B_feat (i.e., the second feature), is computed from the attention of the two signals in the symmetric way:

Attn_B ← Attn_B + Attn_E,  B_feat = MLP(Attn_B·V_B)
the scheme solves the problem of image deblurring at any moment in the exposure time by utilizing the ultrahigh time resolution of the event information in a complementary mode, and simultaneously inhibits noise in the event by utilizing the information smoothness of the blurred image, thereby realizing more accurate estimation of scene dynamics in the exposure time.
Based on the above processing, the deblurred image feature (i.e., the second feature of the blurred image) and the event stream feature (i.e., the second feature of the event stream) with the event noise suppressed can be obtained.
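Continuing the NumPy sketch, the interaction step can be illustrated as follows. The symmetric form of branch 2) and the shared two-layer MLP head are assumptions made for illustration, since the patent text only spells out the event branch in full.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mlp(x, w1, w2):
    """Simple two-layer MLP head (assumed shape)."""
    return np.maximum(x @ w1, 0.0) @ w2

def dual_interaction(q_b, k_b, v_b, q_e, k_e, v_e, w1, w2):
    d_k = q_b.shape[-1]
    attn_b = softmax(q_b @ k_b.T / np.sqrt(d_k))   # image attention
    attn_e = softmax(q_e @ k_e.T / np.sqrt(d_k))   # event attention
    # 1) image attention corrects (denoises) the event attention
    e_feat = mlp((attn_e + attn_b) @ v_e, w1, w2)
    # 2) event attention deblurs the image features (assumed symmetric form)
    b_feat = mlp((attn_b + attn_e) @ v_b, w1, w2)
    return b_feat, e_feat

rng = np.random.default_rng(2)
n, d = 6, 8
q_b, k_b, v_b, q_e, k_e, v_e = (rng.standard_normal((n, d)) for _ in range(6))
w1, w2 = rng.standard_normal((d, 16)), rng.standard_normal((16, d))
b_feat, e_feat = dual_interaction(q_b, k_b, v_b, q_e, k_e, v_e, w1, w2)
print(b_feat.shape, e_feat.shape)   # (6, 8) (6, 8)
```

The design intuition matches the text: the smooth image attention regularizes the noisy event attention, while the high-time-resolution event attention supplies the sharp structure missing from the blurred image features.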
304. And obtaining a clear image of a target moment according to the second characteristic of the blurred image and the second characteristic of the event stream, wherein the target moment is any moment in the exposure time.
In one possible implementation, based on the deblurred image feature (i.e., the second feature of the blurred image) and the event stream feature (i.e., the second feature of the event stream) obtained as described above, time information is added to the second feature of the blurred image and the second feature of the event stream, so that a sharp image at a target time is obtained from the feature having the time information.
In the embodiment of the application, the first feature of the blurred image and the first feature of the event stream are obtained by respectively performing preliminary feature extraction on the blurred image and the event stream, and then the first feature of the blurred image and the first feature of the event stream are input into a dual feature extraction model for performing depth feature extraction to obtain the image feature after deblurring (namely, the second feature of the blurred image) and the event stream feature after event noise suppression (namely, the second feature of the event stream); and then obtaining a clear image at the target moment according to the second characteristic of the blurred image and the second characteristic of the event stream. By adopting the method, the suppression of the event characteristic noise and the deblurring of the image characteristic can be realized. By recovering the clear image at any moment in the exposure time, the whole motion scene corresponding to the blurred image can be known and analyzed more completely without missing any important moment information.
Fig. 4a is a schematic diagram illustrating another image processing method according to an embodiment of the present application. The method comprises steps 401-406, which are as follows:
401. respectively extracting features of a blurred image and an event stream to obtain a first feature of the blurred image and a first feature of the event stream, wherein the event stream is triggered by scene brightness change within exposure time corresponding to the blurred image;
for the description of this part, refer to fig. 4b and the description of the foregoing embodiments, which are not repeated herein.
402. Respectively obtaining a depth characteristic parameter of the blurred image and a depth characteristic parameter of the event stream according to the first characteristic of the blurred image and the first characteristic of the event stream;
dual feature extraction network model f as in FIG. 4a γ Extracting the network model f by dual features γ And extracting depth features for recovering a clear image and a motion state of a scene in the exposure time from the input observable blurred image B and the event stream epsilon.
For the description of this part, reference may be made to fig. 4b and the description of the foregoing embodiment, which are not repeated herein.
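Structurally, step 402 amounts to two parallel branches producing one depth feature parameter per modality. The patent does not give the branch architecture; the sketch below stands in for the convolutional branches of f_γ with small linear stacks (all names, sizes, and weights are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(x, weights):
    """Stand-in for one deep feature extraction branch of f_gamma:
    a stack of linear layers with ReLU activations."""
    h = x
    for w in weights:
        h = np.maximum(h @ w, 0.0)  # linear layer followed by ReLU
    return h

# First features from the preliminary extraction step, flattened (toy sizes).
img_first = rng.normal(size=(64,))    # first feature of the blurred image
evt_first = rng.normal(size=(64,))    # first feature of the event stream

# The dual network runs two parallel branches, one per modality.
img_branch = [rng.normal(size=(64, 32)), rng.normal(size=(32, 16))]
evt_branch = [rng.normal(size=(64, 32)), rng.normal(size=(32, 16))]

img_depth = encoder(img_first, img_branch)  # depth feature parameter of the image
evt_depth = encoder(evt_first, evt_branch)  # depth feature parameter of the events
print(img_depth.shape, evt_depth.shape)  # (16,) (16,)
```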
403. Carrying out interactive processing on the depth characteristic parameters of the blurred image and the depth characteristic parameters of the event stream to obtain second characteristics of the blurred image and second characteristics of the event stream;
for the description of this part, reference may be made to the foregoing embodiments, which are not described in detail herein.
Based on the above processing, the deblurred image feature (i.e., the second feature of the blurred image) and the event stream feature (i.e., the second feature of the event stream) with the event noise suppressed can be obtained.
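The exact form of the "interactive processing" in step 403 is not specified here. One hedged reading is a mutual gating between the two depth feature parameters, so that image features help suppress event noise while event features help deblur the image features; the sketch below illustrates only that cross-modal structure, not the patented operator:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def interact(img_depth, evt_depth):
    """Hypothetical interactive processing: each modality's depth feature
    gates the other, yielding the two second features."""
    img_second = img_depth * sigmoid(evt_depth)  # deblurred image feature
    evt_second = evt_depth * sigmoid(img_depth)  # noise-suppressed event feature
    return img_second, evt_second

img_depth = np.array([1.0, -2.0, 0.5])
evt_depth = np.array([0.0, 3.0, -1.0])
img2, evt2 = interact(img_depth, evt_depth)
print(img2.shape, evt2.shape)  # (3,) (3,)
```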
404. Coding the target moment to obtain a time vector;
As shown in Fig. 4a, φ_θ is a sharp-image decoding network that fuses continuous time information. The extracted depth features and the corresponding time t (or the encoded information of t) are input into this network, which decodes the sharp image of the scene at time t, where t is any time within the exposure time. During network training, as the continuous time t varies, the sharp images produced by the network also carry the timing information corresponding to t, so that in the inference stage the network can output the image of the corresponding sharp scene for any input time t (within the exposure time).
The processing method corresponding to fig. 4a is as follows:
I(t) = B + φ_θ(t; f_γ(B, ε_τ))
where B is the blurred image, ε_τ is the event stream within the exposure time, f_γ is the dual feature extraction network, and φ_θ is the continuous-time decoding network whose output is added to B to give the sharp image I(t) at time t. (The quantities x and p associated with the event stream denote the location and polarity of an event trigger, respectively.)
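The call structure of this residual formulation can be sketched with toy stand-in functions (f_gamma and phi_theta below are hypothetical placeholders for the trained networks, used only to show how the pieces compose):

```python
import numpy as np

def f_gamma(blurred, events):
    """Stand-in dual feature extractor: returns a joint depth feature."""
    return np.concatenate([blurred.ravel(), events.ravel()])

def phi_theta(t, depth_feature):
    """Stand-in continuous-time decoder: maps (t, features) to a residual
    image; here a toy readout just to show the call structure."""
    return np.full((2, 2), t * depth_feature.mean())

blurred = np.ones((2, 2)) * 0.5   # B
events = np.zeros((2, 2))         # epsilon (toy dense representation)
t = 0.25                          # normalized target time

sharp = blurred + phi_theta(t, f_gamma(blurred, events))  # I(t) = B + residual
print(sharp.shape)  # (2, 2)
```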
In one possible implementation, as shown in fig. 4c, a network structure for decoding a sharp image that merges continuous time information is provided in the embodiment of the present application. As shown in fig. 4c, after the depth features extracted by DALS are processed by a plurality of convolutional layers, the encoded time signal is merged into the depth features output by the convolutional layers, and then a clear image at time t is decoded by one MLP.
In one possible implementation, the exposure time is first normalized. For example, if the exposure time is 0.1 s, it is normalized to 1, so that any time within the exposure window corresponds to a normalized time in [0, 1].
Then, the target time t can be encoded into a time vector by using a Fourier encoding method. The time vector may be a multidimensional time vector, for example, a 2L-dimensional vector, where L may be any positive integer, and this scheme is not particularly limited in this respect.
For example, the multi-dimensional time vector may be represented as:
n(t) = (cos(2^0·πt), sin(2^0·πt), …, cos(2^(L-1)·πt), sin(2^(L-1)·πt))
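The Fourier encoding above translates directly into code; the sketch below implements that formula for a normalized time t (function name is illustrative):

```python
import numpy as np

def fourier_encode(t, L):
    """Fourier encoding of normalized time t into the 2L-dimensional
    vector n(t) = (cos(2^0*pi*t), sin(2^0*pi*t), ..., sin(2^(L-1)*pi*t))."""
    out = []
    for k in range(L):
        out.append(np.cos(2**k * np.pi * t))
        out.append(np.sin(2**k * np.pi * t))
    return np.array(out)

vec = fourier_encode(0.5, L=3)  # t already normalized into [0, 1]
print(vec.shape)  # (6,)
```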
405. obtaining image features of fusion time information according to the time vector, the second features of the blurred image and the second features of the event stream;
as shown in fig. 4c, the encoded time vector, the obtained deblurred image feature (i.e., the second feature of the blurred image) and the event stream feature (i.e., the second feature of the event stream) after the event noise suppression are concatenated, so as to implement the pixel-by-pixel concatenation of the time information into the above features.
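Pixel-by-pixel concatenation of the time vector amounts to broadcasting it over the spatial grid and stacking it along the channel axis; a minimal sketch (toy shapes, hypothetical channel sizes):

```python
import numpy as np

def fuse_time(img_feat, evt_feat, time_vec):
    """Broadcast the encoded time vector to every pixel and concatenate
    it with the two second features along the channel axis."""
    c_t = time_vec.shape[0]
    h, w = img_feat.shape[1:]
    time_map = np.broadcast_to(time_vec[:, None, None], (c_t, h, w))
    return np.concatenate([img_feat, evt_feat, time_map], axis=0)

img_feat = np.zeros((16, 8, 8))   # second feature of the blurred image
evt_feat = np.zeros((16, 8, 8))   # second feature of the event stream
time_vec = np.ones(6)             # encoded target time
fused = fuse_time(img_feat, evt_feat, time_vec)
print(fused.shape)  # (38, 8, 8)
```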
406. And decoding the image characteristics of the fusion time information to obtain a clear image of the target moment.
The sharp image at time t is decoded using a pixel-by-pixel MLP network structure.
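A pixel-by-pixel MLP applies the same small network to every pixel's channel vector; the sketch below shows that structure with a hypothetical two-layer MLP (layer sizes and random weights are illustrative only):

```python
import numpy as np

def pixelwise_mlp(fused, w1, b1, w2, b2):
    """Apply the same two-layer MLP to every pixel's channel vector,
    decoding one intensity value per pixel."""
    c, h, w = fused.shape
    x = fused.reshape(c, -1).T          # (H*W, C): one row per pixel
    hidden = np.maximum(x @ w1 + b1, 0.0)  # hidden layer + ReLU
    out = hidden @ w2 + b2              # single output channel per pixel
    return out.reshape(h, w)

rng = np.random.default_rng(1)
fused = rng.normal(size=(38, 8, 8))     # fused feature with time information
w1, b1 = rng.normal(size=(38, 16)), np.zeros(16)
w2, b2 = rng.normal(size=(16, 1)), np.zeros(1)
sharp = pixelwise_mlp(fused, w1, b1, w2, b2)
print(sharp.shape)  # (8, 8)
```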
In one possible implementation, sharp images at a plurality of times can also be obtained. The present solution is not particularly limited in this respect.
In the embodiment of the application, the first feature of the blurred image and the first feature of the event stream are obtained by respectively performing preliminary feature extraction on the blurred image and the event stream, and then the first feature of the blurred image and the first feature of the event stream are input into a dual feature extraction model for performing depth feature extraction to obtain the image feature after deblurring (namely, the second feature of the blurred image) and the event stream feature after event noise suppression (namely, the second feature of the event stream); and then inputting the obtained second characteristic of the blurred image and the second characteristic of the event stream into a clear image decoding network fusing continuous time information, so as to obtain a clear image at the target moment. By adopting the method, the suppression of the event characteristic noise and the deblurring of the image characteristic can be realized. By restoring the clear image at any moment in the exposure time, the whole motion scene corresponding to the blurred image can be known and analyzed more completely without missing any important moment information.
On the basis of the foregoing embodiments, the embodiments of the present application provide a model training method. The model learns an implicit neural representation (INR), which may be referred to as an Implicit Video Function (IVF), to describe the relationship between the input blurred image, the event stream, and the sharp image of the scene at any time within the exposure time. First, depth features for recovering the sharp image and motion state of the scene within the exposure time are extracted from the input observable blurred image B and event stream ε through the dual feature extraction network; the extracted depth features and the corresponding time t (or the encoded information of t) are then input into the continuous-time decoding network, which decodes the sharp image of the corresponding scene at time t. During network training, as the continuous time t varies, the sharp image obtained by the network also carries the timing information corresponding to t, so that in the inference stage the network can output the corresponding sharp scene for any input time t (within the exposure time).
In this embodiment, the dual feature extraction network f_γ and the MLP continuous-time decoding network φ_θ are jointly trained in an end-to-end manner. Because this embodiment sparsely samples only a limited number of high-definition images within the exposure time as ground truth labels for supervision, the cost functions at sampled times and non-sampled times are constructed differently.
At a sampled time, the cost function directly measures the l1-loss between the generated sharp image I(t_i) and the ground truth:
Since the ground truth at non-sampled times is unknown, the optical flow is first estimated from the event information using the optical flow estimation network EV-Flow(·), and the ground truth at any time is then calculated. The specific process is shown in the following formula:
After the sharp image at the required time is obtained, an l1-norm cost function is constructed in the same manner:
In combination with the sampled times and the non-sampled times, the overall cost function can be expressed as:
the embodiment is described only by taking the above implementation manner as an example, and may also be implemented in other manners, which is not specifically limited in this respect.
Referring to fig. 5, an image processing apparatus according to an embodiment of the present application is shown. As shown in fig. 5, the apparatus comprises a first extraction module 501, a second extraction module 502 and a processing module 503, wherein:
a first extraction module 501, configured to perform feature extraction on a blurred image and an event stream, respectively, to obtain a first feature of the blurred image and a first feature of the event stream, where the event stream is triggered by a scene brightness change within an exposure time corresponding to the blurred image;
a second extraction module 502, configured to perform depth feature extraction on the first feature of the blurred image and the first feature of the event stream to obtain a second feature of the blurred image and a second feature of the event stream;
a processing module 503, configured to obtain a sharp image at a target time according to the second feature of the blurred image and the second feature of the event stream, where the target time is any time within the exposure time.
Optionally, the second extracting module 502 is configured to:
respectively obtaining a depth characteristic parameter of the blurred image and a depth characteristic parameter of the event stream according to the first characteristic of the blurred image and the first characteristic of the event stream;
and carrying out interactive processing on the depth characteristic parameters of the blurred image and the depth characteristic parameters of the event stream to obtain a second characteristic of the blurred image and a second characteristic of the event stream.
Optionally, the processing module 503 is configured to:
coding the target moment to obtain a time vector;
obtaining image features of fusion time information according to the time vector, the second features of the blurred image and the second features of the event stream;
and decoding the image characteristics of the fusion time information to obtain a clear image of the target moment.
For a specific function implementation manner of the image processing apparatus, reference may be made to the description of the image processing method, and details are not repeated here. The units or modules in the device may be respectively or completely combined into one or several other units or modules to form another unit or module, or some unit(s) or module(s) thereof may be further split into multiple functionally smaller units or modules to form another unit or module, which may achieve the same operation without affecting the achievement of the technical effect of the embodiments of the present invention. The above units or modules are divided based on logic functions, and in practical applications, the functions of one unit (or module) may also be implemented by a plurality of units (or modules), or the functions of a plurality of units (or modules) may be implemented by one unit (or module).
Based on the description of the method embodiment and the device embodiment, the embodiment of the invention also provides an image processing device. Fig. 6 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present invention. The image processing apparatus 600 shown in fig. 6 includes a memory 601, a processor 602, a communication interface 603, and a bus 604. The memory 601, the processor 602, and the communication interface 603 are communicatively connected to each other via a bus 604.
The Memory 601 may be a Read Only Memory (ROM), a static Memory device, a dynamic Memory device, or a Random Access Memory (RAM).
The memory 601 may store a program, and when the program stored in the memory 601 is executed by the processor 602, the processor 602 executes the steps of the image processing method according to the embodiment of the present application through the communication interface 603.
The processor 602 may be a general-purpose Central Processing Unit (CPU), a microprocessor, an Application Specific Integrated Circuit (ASIC), a Graphics Processing Unit (GPU), or one or more Integrated circuits, and is configured to execute related programs to implement the functions required to be executed by the units in the image Processing apparatus according to the embodiment of the present disclosure, or to execute the image Processing method according to the embodiment of the present disclosure.
The processor 602 may also be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the image processing method of the present application may be implemented by integrated logic circuits of hardware or instructions in the form of software in the processor 602. The processor 602 may also be a CPU, a Digital Signal Processor (DSP), an ASIC, a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic device, or discrete hardware components. The various methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or performed. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The storage medium is located in the memory 601, and the processor 602 reads information in the memory 601 and, in combination with its hardware, performs the functions required to be performed by the units included in the image processing apparatus according to the embodiment of the present application, or performs the image processing method according to the method embodiment of the present application.
The communication interface 603 enables communication between the image processing apparatus 600 and other devices or communication networks using a transceiver device such as, but not limited to, a transceiver. For example, data may be acquired through the communication interface 603.
Bus 604 may include a pathway to transfer information between various components of image processing device 600 (e.g., memory 601, processor 602, communication interface 603).
It should be noted that although the image processing apparatus 600 shown in fig. 6 only shows a memory, a processor, and a communication interface, in a specific implementation process, a person skilled in the art should understand that the image processing apparatus 600 also includes other devices necessary for normal operation. Meanwhile, according to specific needs, it will be understood by those skilled in the art that the image processing apparatus 600 may further include hardware devices for implementing other additional functions. Furthermore, it should be understood by those skilled in the art that the image processing apparatus 600 may also include only the devices necessary to implement the embodiments of the present application, and not necessarily all of the devices shown in fig. 6.
The embodiment of the application further provides a chip, the chip comprises a processor and a data interface, and the processor reads the instruction stored in the memory through the data interface so as to realize the image processing method.
In a possible implementation, the chip may further include a memory, the memory having instructions stored therein, and the processor is configured to execute the instructions stored in the memory, and when the instructions are executed, the processor is configured to execute the image processing method.
Embodiments of the present application also provide a computer-readable storage medium having stored therein instructions, which when executed on a computer or processor, cause the computer or processor to perform one or more steps of any one of the methods described above.
The embodiment of the application also provides a computer program product containing instructions. The computer program product, when run on a computer or processor, causes the computer or processor to perform one or more steps of any of the methods described above.
Those of skill in the art will appreciate that the functions described in connection with the various illustrative logical blocks, modules, and algorithm steps described in the disclosure herein may be implemented as hardware, software, firmware, or any combination thereof. If implemented in software, the functions described in the various illustrative logical blocks, modules, and steps may be stored on or transmitted over as one or more instructions or code on a computer-readable medium and executed by a hardware-based processing unit. The computer-readable medium may include a computer-readable storage medium, which corresponds to a tangible medium, such as a data storage medium, or a communication medium including any medium that facilitates transfer of a computer program from one place to another (e.g., based on a communication protocol). In this manner, a computer-readable medium may generally correspond to (1) a tangible computer-readable storage medium that is not transitory, or (2) a communication medium, such as a signal or carrier wave. A data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementing the techniques described herein. The computer program product may include a computer-readable medium.
By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that the computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory tangible storage media. Disk and disc, as used herein, includes Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), and blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media. The instructions may be executed by one or more processors, such as one or more DSPs, general purpose microprocessors, ASICs, FPGAs, or other equivalent integrated or discrete logic circuitry. Accordingly, the term "processor," as used herein may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. 
Additionally, in some aspects, the functions described by the various illustrative logical blocks, modules, and steps described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques may be fully implemented in one or more circuits or logic elements.
The techniques of this application may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an Integrated Circuit (IC), or a set of ICs (e.g., a chipset). Various components, modules, or units are described in this application to emphasize functional aspects of means for performing the disclosed techniques, but do not necessarily require realization by different hardware units. Indeed, as described above, the various units may be combined in coded hardware units, in combination with suitable software and/or firmware, or provided by interoperative hardware units (including one or more processors as described above).
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the specific descriptions of the corresponding steps in the foregoing method embodiments, and are not described herein again.
It should be understood that in the description of the present application, unless otherwise indicated, "/" indicates a relationship where the objects associated before and after are an "or", e.g., a/B may indicate a or B; wherein A and B can be singular or plural. Also, in the description of the present application, "a plurality" means two or more than two unless otherwise specified. "at least one of the following" or similar expressions refer to any combination of these items, including any combination of the singular or plural items. For example, at least one (one) of a, b, or c, may represent: a, b, c, a-b, a-c, b-c, or a-b-c, wherein a, b, c may be single or multiple. In addition, in order to facilitate clear description of technical solutions of the embodiments of the present application, in the embodiments of the present application, terms such as "first" and "second" are used to distinguish the same items or similar items having substantially the same functions and actions. Those skilled in the art will appreciate that the terms "first," "second," etc. do not denote any order or quantity, nor do the terms "first," "second," etc. denote any order or importance. Also, in the embodiments of the present application, words such as "exemplary" or "for example" are used to mean serving as examples, illustrations or illustrations. Any embodiment or design described herein as "exemplary" or "e.g.," is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the word "exemplary" or "such as" is intended to present relevant concepts in a concrete fashion for ease of understanding.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the division of the unit is only one logical function division, and other division may be implemented in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. The shown or discussed mutual coupling, direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In the above embodiments, all or part of the implementation may be realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The procedures or functions according to the embodiments of the present application are wholly or partially generated when the computer program instructions are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored on or transmitted over a computer-readable storage medium. The computer instructions may be transmitted from one website site, computer, server, or data center to another website site, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, DSL) or wireless (e.g., infrared, wireless, microwave, etc.) means. The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that includes one or more of the available media. The usable medium may be ROM, or RAM, or a magnetic medium, such as a floppy disk, a hard disk, a magnetic tape, a magnetic disk, or an optical medium, such as a digital versatile disk DVD, or a semiconductor medium, such as a Solid State Disk (SSD), etc.
The above description is only a specific implementation of the embodiments of the present application, but the scope of the embodiments of the present application is not limited thereto, and any changes or substitutions within the technical scope disclosed in the embodiments of the present application should be covered by the scope of the embodiments of the present application. Therefore, the protection scope of the embodiments of the present application shall be subject to the protection scope of the claims.
Claims (9)
1. An image processing method, comprising:
respectively extracting features of a blurred image and an event stream to obtain a first feature of the blurred image and a first feature of the event stream, wherein the event stream is triggered by scene brightness change within exposure time corresponding to the blurred image;
performing depth feature extraction on the first features of the blurred image and the first features of the event stream to obtain second features of the blurred image and second features of the event stream;
and obtaining a clear image of a target moment according to the second characteristic of the blurred image and the second characteristic of the event stream, wherein the target moment is any moment in the exposure time.
2. The method of claim 1, wherein the depth feature extraction of the first feature of the blurred image and the first feature of the event stream to obtain the second feature of the blurred image and the second feature of the event stream comprises:
respectively obtaining a depth characteristic parameter of the blurred image and a depth characteristic parameter of the event stream according to the first characteristic of the blurred image and the first characteristic of the event stream;
and carrying out interactive processing on the depth characteristic parameters of the blurred image and the depth characteristic parameters of the event stream to obtain a second characteristic of the blurred image and a second characteristic of the event stream.
3. The method according to claim 1 or 2, wherein the obtaining a sharp image of a target time from the second feature of the blurred image and the second feature of the event stream, the target time being any time within the exposure time, comprises:
coding the target moment to obtain a time vector;
obtaining image features of fusion time information according to the time vector, the second features of the blurred image and the second features of the event stream;
and decoding the image characteristics of the fusion time information to obtain a clear image of the target moment.
4. An image processing apparatus characterized by comprising:
the first extraction module is used for respectively extracting features of a blurred image and an event stream to obtain a first feature of the blurred image and a first feature of the event stream, wherein the event stream is triggered by scene brightness change within exposure time corresponding to the blurred image;
the second extraction module is used for performing depth feature extraction on the first features of the blurred image and the first features of the event stream to obtain second features of the blurred image and second features of the event stream;
and the processing module is used for obtaining a clear image of a target moment according to the second characteristic of the blurred image and the second characteristic of the event stream, wherein the target moment is any moment in the exposure time.
5. The apparatus of claim 4, wherein the second extraction module is configured to:
respectively obtaining a depth characteristic parameter of the blurred image and a depth characteristic parameter of the event stream according to the first characteristic of the blurred image and the first characteristic of the event stream;
and carrying out interactive processing on the depth characteristic parameters of the blurred image and the depth characteristic parameters of the event stream to obtain a second characteristic of the blurred image and a second characteristic of the event stream.
6. The apparatus of claim 4 or 5, wherein the processing module is configured to:
coding the target moment to obtain a time vector;
obtaining image features of fusion time information according to the time vector, the second features of the blurred image and the second features of the event stream;
and decoding the image characteristics of the fusion time information to obtain a clear image of the target moment.
7. An image processing apparatus comprising a processor and a memory; wherein the memory is configured to store program code and the processor is configured to invoke the program code to perform the method of any of claims 1 to 3.
8. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which is executed by a processor to implement the method according to any one of claims 1 to 3.
9. A computer program product, characterized in that, when the computer program product is run on a computer, it causes the computer to perform the method according to any one of claims 1 to 3.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210334494.1A CN114841870A (en) | 2022-03-31 | 2022-03-31 | Image processing method, related device and system |
PCT/CN2023/083859 WO2023185693A1 (en) | 2022-03-31 | 2023-03-24 | Image processing method, and related apparatus and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114841870A true CN114841870A (en) | 2022-08-02 |
Family
ID=82563863
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210334494.1A Pending CN114841870A (en) | 2022-03-31 | 2022-03-31 | Image processing method, related device and system |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN114841870A (en) |
WO (1) | WO2023185693A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2023185693A1 (en) * | 2022-03-31 | 2023-10-05 | 华为技术有限公司 | Image processing method, and related apparatus and system |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8390704B2 (en) * | 2009-10-16 | 2013-03-05 | Eastman Kodak Company | Image deblurring using a spatial image prior |
CN110060215B (en) * | 2019-04-16 | 2021-09-10 | 深圳市商汤科技有限公司 | Image processing method and device, electronic equipment and storage medium |
CN111445414B (en) * | 2020-03-27 | 2023-04-14 | 北京市商汤科技开发有限公司 | Image processing method and device, electronic equipment and storage medium |
CN112767277B (en) * | 2021-01-27 | 2022-06-07 | 同济大学 | Depth feature sequencing deblurring method based on reference image |
CN114841870A (en) * | 2022-03-31 | 2022-08-02 | 华为技术有限公司 | Image processing method, related device and system |
2022
- 2022-03-31: CN application CN202210334494.1A, publication CN114841870A/en (status: active, Pending)

2023
- 2023-03-24: WO application PCT/CN2023/083859, publication WO2023185693A1/en (status: unknown)
Also Published As
Publication number | Publication date |
---|---|
WO2023185693A1 (en) | 2023-10-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20200117906A1 (en) | Space-time memory network for locating target object in video content | |
CN112001914A (en) | Depth image completion method and device | |
CN112507990A (en) | Video time-space feature learning and extracting method, device, equipment and storage medium | |
CN113592913B (en) | Method for eliminating uncertainty of self-supervision three-dimensional reconstruction | |
CN109300151B (en) | Image processing method and device and electronic equipment | |
CN111507262B (en) | Method and apparatus for detecting living body | |
CN112801047B (en) | Defect detection method and device, electronic equipment and readable storage medium | |
GB2579262A (en) | Space-time memory network for locating target object in video content | |
CN113066034A (en) | Face image restoration method and device, restoration model, medium and equipment | |
CN111626956A (en) | Image deblurring method and device | |
CN113379601A (en) | Real world image super-resolution method and system based on degradation variational self-encoder | |
CN113269722A (en) | Training method for generating countermeasure network and high-resolution image reconstruction method | |
CN116205962B (en) | Monocular depth estimation method and system based on complete context information | |
WO2023185693A1 (en) | Image processing method, and related apparatus and system | |
CN113379606B (en) | Face super-resolution method based on pre-training generation model | |
CN113689372A (en) | Image processing method, apparatus, storage medium, and program product | |
CN111382647A (en) | Picture processing method, device, equipment and storage medium | |
CN109816791B (en) | Method and apparatus for generating information | |
CN117036442A (en) | Robust monocular depth completion method, system and storage medium | |
TWI803243B (en) | Method for expanding images, computer device and storage medium | |
Yang et al. | Deep Convolutional Grid Warping Network for Joint Depth Map Upsampling | |
CN112995433B (en) | Time sequence video generation method and device, computing equipment and storage medium | |
CN114842066A (en) | Image depth recognition model training method, image depth recognition method and device | |
CN115049901A (en) | Small target detection method and device based on feature map weighted attention fusion | |
CN114972016A (en) | Image processing method, image processing apparatus, computer device, storage medium, and program product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||