CN114841870A - Image processing method, related device and system - Google Patents

Image processing method, related device and system

Info

Publication number
CN114841870A
Authority
CN
China
Prior art keywords
event stream
image
blurred image
feature
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210334494.1A
Other languages
Chinese (zh)
Inventor
余磊
程章意
刘健庄
王应龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202210334494.1A priority Critical patent/CN114841870A/en
Publication of CN114841870A publication Critical patent/CN114841870A/en
Priority to PCT/CN2023/083859 priority patent/WO2023185693A1/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G06T5/70
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence

Abstract

The embodiment of the application provides an image processing method, a related device and a system. The method comprises the following steps: respectively extracting features of a blurred image and an event stream to obtain a first feature of the blurred image and a first feature of the event stream, wherein the event stream is triggered by scene brightness change within exposure time corresponding to the blurred image; performing depth feature extraction on the first features of the blurred image and the first features of the event stream to obtain second features of the blurred image and second features of the event stream; and obtaining a clear image of a target moment according to the second characteristic of the blurred image and the second characteristic of the event stream, wherein the target moment is any moment in the exposure time. By adopting the method, the suppression of event characteristic noise and the deblurring of image characteristics can be realized, and a clear image at any moment in the exposure time can be recovered.

Description

Image processing method, related device and system
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image processing method, a related apparatus and a system.
Background
Motion deblurring is a classic and important problem in the fields of computer vision and photography. Motion blur is the aliasing of information from different spatial positions caused by camera motion or scene changes during imaging, which results in the loss of important high-frequency information in the image. However, a motion-blurred image typically contains the complete information of the moving scene during the exposure time. Therefore, it is entirely possible to recover the motion state of objects and sharp object textures from the blurred image. On the one hand, motion deblurring allows the restored image/video to be presented with a good visual effect, lets people learn the useful information hidden in the blurred image, and improves the visual experience of the end user; on the other hand, it helps solve the problem that low-quality images are difficult to analyze, which benefits both human analysis and the performance of other computer vision algorithms such as detection and tracking.
In the prior art, a CNN network is designed that takes a single motion-blurred image and event data as input and learns to output multiple clear video sequences under the supervision of synthesized data. In order to guide the network to perform semi-supervised training and improve the generalization of the model on real data, another CNN branch is designed that takes the event stream as input and outputs optical-flow information as the motion information of the scene. Based on the estimated clear image and motion information, a new blurred image is re-rendered through the physical blur formation process, and a blur-consistency cost function between this newly obtained blurred image and the input blurred image guides the training of the CNN network in a self-supervised manner. In addition, the estimated clear image is propagated by the motion information to form a new clear image, and a photometric-consistency self-supervised cost function between this new clear image and the clear image estimated by the network further guides the training. When constructing these two cost functions, a piecewise-linear motion model is used to approximate the highly non-linear motion, so that the network outputs an accurate dense motion flow.
However, this approach can only recover a clear image corresponding to a specific time point, which may lose important scene information at certain moments and thus affect the analysis and judgment of the whole scene. Moreover, the spatio-temporal noise of the events is not well suppressed; the event noise misleads the training of the network, so that the recovered image exhibits noise in regions that were not originally blurred. In addition, the noise affects the estimation of the brightness change of the moving scene, so that the network's estimation of motion is affected by erroneous timestamps and event-point counts.
Disclosure of Invention
The application discloses an image processing method, a related device and a system, which can recover a clear image at any moment within the exposure time and can suppress the spatio-temporal noise of events.
In a first aspect, an embodiment of the present application provides an image processing method, including: respectively extracting features of a blurred image and an event stream to obtain a first feature of the blurred image and a first feature of the event stream, wherein the event stream is triggered by scene brightness change within exposure time corresponding to the blurred image; performing depth feature extraction on the first features of the blurred image and the first features of the event stream to obtain second features of the blurred image and second features of the event stream; and obtaining a clear image of a target moment according to the second characteristic of the blurred image and the second characteristic of the event stream, wherein the target moment is any moment in the exposure time.
In the embodiment of the application, the first feature of the blurred image and the first feature of the event stream are obtained by respectively performing primary feature extraction on the blurred image and the event stream, and then the first feature of the blurred image and the first feature of the event stream are subjected to depth feature extraction to obtain the image feature after deblurring (namely, the second feature of the blurred image) and the event stream feature after event noise suppression (namely, the second feature of the event stream); and then obtaining a clear image at the target moment according to the second characteristic of the blurred image and the second characteristic of the event stream. By adopting the method, the suppression of the event characteristic noise and the deblurring of the image characteristic can be realized. By restoring the clear image at any moment in the exposure time, the whole motion scene corresponding to the blurred image can be known and analyzed more completely without missing any important moment information.
In one possible implementation manner, the performing depth feature extraction on the first feature of the blurred image and the first feature of the event stream to obtain the second feature of the blurred image and the second feature of the event stream includes: respectively obtaining a depth characteristic parameter of the blurred image and a depth characteristic parameter of the event stream according to the first characteristic of the blurred image and the first characteristic of the event stream; and carrying out interactive processing on the depth characteristic parameters of the blurred image and the depth characteristic parameters of the event stream to obtain a second characteristic of the blurred image and a second characteristic of the event stream.
According to the method and the device, the problem of image deblurring at any moment in the exposure time is solved by utilizing the ultrahigh time resolution of the event information in a complementary mode, and meanwhile, the noise in the event is restrained by utilizing the information smoothness of the blurred image, so that the scene dynamic in the exposure time is more accurately estimated.
In a possible implementation manner, the obtaining a sharp image at a target time according to the second feature of the blurred image and the second feature of the event stream, where the target time is any time within the exposure time includes: coding the target moment to obtain a time vector; obtaining image features of fusion time information according to the time vector, the second features of the blurred image and the second features of the event stream; and decoding the image characteristics of the fusion time information to obtain a clear image of the target moment.
The embodiment realizes the fusion of any continuous time information and depth characteristics by encoding continuous time signals; by adopting the MLP network structure of continuous time decoding to decode the clear image corresponding to any time in the exposure time from the depth feature of the fusion time information, the whole motion scene corresponding to the blurred image can be more completely understood and analyzed without missing any important time information.
In a second aspect, an embodiment of the present application provides an image processing apparatus, including: the first extraction module is used for respectively extracting features of a blurred image and an event stream to obtain a first feature of the blurred image and a first feature of the event stream, wherein the event stream is triggered by scene brightness change within exposure time corresponding to the blurred image; the second extraction module is used for performing depth feature extraction on the first features of the blurred image and the first features of the event stream to obtain second features of the blurred image and second features of the event stream; and the processing module is used for obtaining a clear image of a target moment according to the second characteristic of the blurred image and the second characteristic of the event stream, wherein the target moment is any moment in the exposure time.
In a possible implementation manner, the second extraction module is configured to: respectively obtaining a depth characteristic parameter of the blurred image and a depth characteristic parameter of the event stream according to the first characteristic of the blurred image and the first characteristic of the event stream; and carrying out interactive processing on the depth characteristic parameters of the blurred image and the depth characteristic parameters of the event stream to obtain a second characteristic of the blurred image and a second characteristic of the event stream.
In one possible implementation manner, the processing module is configured to: coding the target moment to obtain a time vector; obtaining image features of fusion time information according to the time vector, the second features of the blurred image and the second features of the event stream; and decoding the image characteristics of the fusion time information to obtain a clear image at the target moment.
In a third aspect, the present application provides an image processing apparatus comprising a processor and a memory; wherein the memory is configured to store program code, and the processor is configured to call the program code to perform the method as provided in any one of the possible embodiments of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium comprising computer instructions that, when executed on an electronic device, cause the electronic device to perform the method as provided in any one of the possible embodiments of the first aspect.
In a fifth aspect, the embodiments of the present application provide a computer program product, which when run on a computer, causes the computer to execute the method as provided in any one of the possible embodiments of the first aspect.
It will be appreciated that the apparatus of the second aspect, the apparatus of the third aspect, the computer readable storage medium of the fourth aspect, or the computer program product of the fifth aspect provided above are all adapted to perform the method provided in any of the first aspects. Therefore, the beneficial effects achieved by the method can refer to the beneficial effects in the corresponding method, and are not described herein again.
Drawings
The drawings used in the embodiments of the present application are described below.
Fig. 1 is a schematic architecture diagram of an image processing system according to an embodiment of the present application;
fig. 2 is a schematic flowchart of an image processing method according to an embodiment of the present application;
FIG. 3a is a schematic flowchart of another image processing method provided in an embodiment of the present application;
FIG. 3b is a schematic diagram of a model process provided by an embodiment of the present application;
fig. 3c is a schematic diagram of a network structure of a dual event with an implicit structure according to an embodiment of the present application;
FIG. 4a is a schematic diagram of another image processing method provided in the embodiment of the present application;
fig. 4b is a schematic diagram of a network structure provided in the embodiment of the present application;
fig. 4c is a schematic diagram of another network structure provided in the embodiment of the present application;
fig. 5 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of another image processing apparatus according to an embodiment of the present application.
Detailed Description
The embodiments of the present application will be described below with reference to the drawings. The terminology used in the description of the embodiments herein is for the purpose of describing particular embodiments herein only and is not intended to be limiting of the application.
Fig. 1 is a schematic diagram illustrating an architecture of an image processing system according to an embodiment of the present disclosure. The system may include an image processing device and an autonomous vehicle/robot. The image processing device is used for processing the blurred image and an event stream triggered by scene brightness change in the exposure time corresponding to the blurred image to obtain a clear image corresponding to any moment in the exposure time. Clear video sequences can be obtained based on a plurality of corresponding clear images at any time, and further the clear video sequences can be processed in an automatic driving vehicle/robot so as to achieve the purposes of face recognition, semantic segmentation, object detection or 3D reconstruction and the like.
The image processing apparatus may be a server, an arbitrary terminal device, or the like, and may be located inside the autonomous vehicle or the robot, or connected to the outside of the autonomous vehicle or the robot, for example, to perform image processing.
The embodiment is described only by taking an automatically driven vehicle or a robot as an example, and the embodiment may also be other devices and the like, which is not particularly limited in this embodiment.
Fig. 2 is a schematic flow chart of an image processing method according to an embodiment of the present application. As shown in fig. 2, the method includes steps 201 and 203, which are as follows:
201. respectively extracting features of a blurred image and an event stream to obtain a first feature of the blurred image and a first feature of the event stream, wherein the event stream is triggered by scene brightness change within exposure time corresponding to the blurred image;
the blurred image can be understood as an image that is captured as a result of motion. For example, when an object is moving and the object is photographed at this time, a blurred image is acquired.
The feature extraction may be a preliminary processing performed on the blurred image and the event stream respectively, for example extracting feature maps.
In a possible implementation manner, the feature extraction is performed on the blurred image and the event stream respectively, and the blurred image and the event stream may be input into a preset neural network model for processing, so as to obtain a first feature of the blurred image and a first feature of the event stream. Other means can be adopted, and the scheme is not particularly limited in this respect.
202. Performing depth feature extraction on the first features of the blurred image and the first features of the event stream to obtain second features of the blurred image and second features of the event stream;
the second feature of the blurred image may be understood as a deblurred image feature obtained by processing the first feature of the blurred image.
The second feature of the event stream may be understood as an event stream feature obtained by processing the first feature of the event stream and having event noise suppressed.
In a possible implementation manner, the depth feature extraction may be performed on the first feature of the blurred image and the first feature of the event stream by inputting both the first feature of the blurred image and the first feature of the event stream into a preset neural network model for processing, so as to obtain the second feature of the blurred image and the second feature of the event stream. Other means can be adopted, and the scheme is not particularly limited in this respect.
203. And obtaining a clear image of a target moment according to the second characteristic of the blurred image and the second characteristic of the event stream, wherein the target moment is any moment in the exposure time.
In one possible implementation, based on the deblurred image feature (i.e., the second feature of the blurred image) and the event stream feature (i.e., the second feature of the event stream) obtained as described above, time information is added to the second feature of the blurred image and the second feature of the event stream, so that a sharp image at a target time is obtained from the feature having the time information.
In the embodiment of the application, the first feature of the blurred image and the first feature of the event stream are obtained by respectively performing primary feature extraction on the blurred image and the event stream, and then the first feature of the blurred image and the first feature of the event stream are subjected to depth feature extraction to obtain the image feature after deblurring (namely, the second feature of the blurred image) and the event stream feature after event noise suppression (namely, the second feature of the event stream); and then obtaining a clear image at the target moment according to the second characteristic of the blurred image and the second characteristic of the event stream. By adopting the method, the suppression of the event characteristic noise and the deblurring of the image characteristic can be realized. By restoring the clear image at any moment in the exposure time, the whole motion scene corresponding to the blurred image can be known and analyzed more completely without missing any important moment information. The scheme can recover more image details and has higher definition. The scheme can greatly promote the performance of other high-level computer vision algorithms.
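For illustration only, the data flow of steps 201-203 can be sketched as a single forward pass in PyTorch. The module names and interfaces below (ShallowExtractor-style submodules passed in as shallow, dual and decoder) are hypothetical placeholders, not names disclosed in this application.

import torch.nn as nn

class DeblurAtAnyTime(nn.Module):
    """Hypothetical wrapper reflecting steps 201-203: shallow features,
    interactive depth features, then decoding a sharp image at time t."""
    def __init__(self, shallow: nn.Module, dual: nn.Module, decoder: nn.Module):
        super().__init__()
        self.shallow = shallow   # step 201: first features of image and event stream
        self.dual = dual         # step 202: second (depth) features via interaction
        self.decoder = decoder   # step 203: clear image at an arbitrary time t

    def forward(self, blurred, events, t):
        b_feat, e_feat = self.shallow(blurred, events)
        b_deep, e_deep = self.dual(b_feat, e_feat)
        return self.decoder(b_deep, e_deep, t)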
Fig. 3a is a schematic flow chart of another image processing method according to the embodiment of the present application. As shown in fig. 3a, the method comprises steps 301 and 304, which are as follows:
301. respectively extracting features of a blurred image and an event stream to obtain a first feature of the blurred image and a first feature of the event stream, wherein the event stream is triggered by scene brightness change within exposure time corresponding to the blurred image;
the blurred image can be understood as an image that is captured as a result of motion. For example, when an object is moving and the object is photographed at this time, a blurred image is acquired.
The feature extraction may be a preliminary processing performed on the blurred image and the event stream respectively, for example extracting feature maps.
In a possible implementation manner, the feature extraction is performed on the blurred image and the event stream respectively, and the blurred image and the event stream may be input into a preset neural network model for processing, so as to obtain a first feature of the blurred image and a first feature of the event stream.
Specifically, fig. 3b is a processing diagram of a shallow feature extraction network model provided in the embodiment of the present application. As shown in fig. 3b, the shallow feature extraction network model SFE comprises a pixel rearrangement (shuffle) layer and two convolutional layers. The observable blurred image B and the high-temporal-resolution event stream ε are respectively input into the shallow feature extraction network model, and after the two signals each pass through the two convolutional layers, the shallow features B_feat (i.e., the first feature of the blurred image) and E_feat (i.e., the first feature of the event stream) are obtained.
The shallow feature extraction network model can make the transformer network training more stable when the second feature of the blurred image and the second feature of the event stream are obtained subsequently; meanwhile, inputs of different scales can generate features of the same scale, which facilitates the feature embedding of the subsequent transformer.
Other means can be adopted, and the scheme is not particularly limited in this respect.
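As a rough illustration of the shallow feature extraction described above, the following PyTorch sketch applies a pixel-rearrangement layer followed by two convolutional layers to one input branch. The channel counts, the rearrangement factor, the choice of PixelUnshuffle, and the event-voxel bin count are assumptions, not details disclosed in this application.

import torch
import torch.nn as nn

class ShallowFeatureExtractor(nn.Module):
    """Pixel rearrangement followed by two convolution layers (one branch of the SFE)."""
    def __init__(self, in_ch: int, feat_ch: int = 64, scale: int = 2):
        super().__init__()
        self.unshuffle = nn.PixelUnshuffle(scale)  # space-to-channel rearrangement
        self.conv1 = nn.Conv2d(in_ch * scale * scale, feat_ch, 3, padding=1)
        self.conv2 = nn.Conv2d(feat_ch, feat_ch, 3, padding=1)

    def forward(self, x):
        x = self.unshuffle(x)
        return self.conv2(torch.relu(self.conv1(x)))

# one extractor per input: blurred image B and voxelized event stream (bin count assumed)
sfe_img = ShallowFeatureExtractor(in_ch=3)    # -> B_feat
sfe_evt = ShallowFeatureExtractor(in_ch=10)   # -> E_feat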
302. Respectively obtaining a depth characteristic parameter of the blurred image and a depth characteristic parameter of the event stream according to the first characteristic of the blurred image and the first characteristic of the event stream;
In a possible implementation manner, as shown in fig. 3c, the embodiment of the present application further provides a dual-attention network structure with an implicit structure. As shown in fig. 3c, this network structure enables the two input signals to interact, thereby achieving suppression of event noise and deblurring of image features. The network structure is formed by stacking a plurality of DALS blocks, and each DALS block takes as input either the two features extracted by the shallow feature extraction network or the output of the previous DALS block.
Specifically, the network structure first performs feature embedding on the first feature B_feat of the blurred image and the first feature E_feat of the event stream through Residual Dense Blocks (RDB), and then encodes, through a linear layer, the query (Q), key (K) and value (V) required for computing the respective attention. The attention of the two signals is calculated as follows:
W-MSA(Q, K, V) = softmax(Q·K^T / sqrt(d_k))·V
where d_k is the dimension of a single feature in Q, K and V, and W-MSA(Q, K, V) is the intermediate feature representation of the DALS block.
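The W-MSA term above is the standard scaled dot-product attention; a minimal sketch is given below. Window partitioning and multi-head splitting are omitted for brevity, so this is a simplified stand-in rather than the exact layer used in the DALS block.

import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    """softmax(Q K^T / sqrt(d_k)) V, with q, k of shape (B, N, d_k) and v of shape (B, N, d_v)."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5  # (B, N, N) attention logits
    return F.softmax(scores, dim=-1) @ v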
Based on the processing, the depth characteristic parameter of the blurred image and the depth characteristic parameter of the event stream can be obtained. This embodiment will be described by taking a depth feature parameter as an attention parameter.
It may also be other parameters obtained through processing by other network models, and this is not specifically limited in this embodiment.
303. Carrying out interactive processing on the depth characteristic parameters of the blurred image and the depth characteristic parameters of the event stream to obtain second characteristics of the blurred image and second characteristics of the event stream;
after obtaining the attention of each signal, the two signals are processed interactively in the following two ways.
1) As shown in fig. 3c, the event attention suppresses noise through correction by the attention of the blurred image, and a depth feature is then output by an MLP:
Attn_E ← Attn_E + Attn_B
E'_feat = Attn_E ⊙_V V_E
E_feat = MLP(E'_feat)
where Attn_E and Attn_B are the attention of the event signal and of the blurred image, respectively, ⊙_V is the value calculation operator (applying an attention map to the value V), and E_feat refers to the depth feature (i.e., the second feature) of the event signal.
2) The depth feature B_feat of the deblurred image (i.e., the second feature) is computed from the features of both signals:
B'_feat = Attn_B ⊙_V V_B + Attn_E ⊙_V V_E
B_feat = MLP(B'_feat)
the scheme solves the problem of image deblurring at any moment in the exposure time by utilizing the ultrahigh time resolution of the event information in a complementary mode, and simultaneously inhibits noise in the event by utilizing the information smoothness of the blurred image, thereby realizing more accurate estimation of scene dynamics in the exposure time.
Based on the above processing, the deblurred image feature (i.e., the second feature of the blurred image) and the event stream feature (i.e., the second feature of the event stream) with the event noise suppressed can be obtained.
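The interaction between the two branches can be sketched as below. Because the exact equations appear only as images in the original publication, the concrete forms used here (adding the image attention to the event attention, applying each attention to its values, and passing the result through an MLP) are assumptions that merely follow the textual description above.

import torch
import torch.nn as nn

class DualInteraction(nn.Module):
    """Hypothetical DALS-style interaction: event attention corrected by image attention."""
    def __init__(self, dim: int):
        super().__init__()
        self.mlp_e = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.mlp_b = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, attn_b, attn_e, v_b, v_e):
        # 1) event branch: noisy event attention is corrected by the smoother image attention
        attn_e = attn_e + attn_b
        e_feat = self.mlp_e(attn_e.softmax(dim=-1) @ v_e)
        # 2) image branch: deblurred feature computed from the features of both signals
        b_feat = self.mlp_b(attn_b.softmax(dim=-1) @ v_b + attn_e.softmax(dim=-1) @ v_e)
        return b_feat, e_feat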
304. And obtaining a clear image of a target moment according to the second characteristic of the blurred image and the second characteristic of the event stream, wherein the target moment is any moment in the exposure time.
In one possible implementation, based on the deblurred image feature (i.e., the second feature of the blurred image) and the event stream feature (i.e., the second feature of the event stream) obtained as described above, time information is added to the second feature of the blurred image and the second feature of the event stream, so that a sharp image at a target time is obtained from the feature having the time information.
In the embodiment of the application, the first feature of the blurred image and the first feature of the event stream are obtained by respectively performing preliminary feature extraction on the blurred image and the event stream, and then the first feature of the blurred image and the first feature of the event stream are input into a dual feature extraction model for performing depth feature extraction to obtain the image feature after deblurring (namely, the second feature of the blurred image) and the event stream feature after event noise suppression (namely, the second feature of the event stream); and then obtaining a clear image at the target moment according to the second characteristic of the blurred image and the second characteristic of the event stream. By adopting the method, the suppression of the event characteristic noise and the deblurring of the image characteristic can be realized. By recovering the clear image at any moment in the exposure time, the whole motion scene corresponding to the blurred image can be known and analyzed more completely without missing any important moment information.
Fig. 4a is a schematic diagram illustrating another image processing method according to an embodiment of the present application. The method comprises steps 401-406, which are as follows:
401. respectively extracting features of a blurred image and an event stream to obtain a first feature of the blurred image and a first feature of the event stream, wherein the event stream is triggered by scene brightness change within exposure time corresponding to the blurred image;
for the description of this part, refer to fig. 4b and the description of the foregoing embodiments, which are not repeated herein.
402. Respectively obtaining a depth characteristic parameter of the blurred image and a depth characteristic parameter of the event stream according to the first characteristic of the blurred image and the first characteristic of the event stream;
dual feature extraction network model f as in FIG. 4a γ Extracting the network model f by dual features γ And extracting depth features for recovering a clear image and a motion state of a scene in the exposure time from the input observable blurred image B and the event stream epsilon.
For the description of this part, reference may be made to fig. 4b and the description of the foregoing embodiment, which are not repeated herein.
403. Carrying out interactive processing on the depth characteristic parameters of the blurred image and the depth characteristic parameters of the event stream to obtain second characteristics of the blurred image and second characteristics of the event stream;
for the description of this part, reference may be made to the foregoing embodiments, which are not described in detail herein.
Based on the above processing, the deblurred image feature (i.e., the second feature of the blurred image) and the event stream feature (i.e., the second feature of the event stream) with the event noise suppressed can be obtained.
404. Coding the target moment to obtain a time vector;
As shown in fig. 4a, φ_θ is a clear-image decoding network that fuses continuous time information. By inputting the extracted depth features and the corresponding time t (or the encoding of t) into this decoding network, the clear image of the scene corresponding to time t is decoded, where t is the time information of any moment within the exposure time. In the network training process, as the continuous time t varies continuously, the clear images obtained by the network also carry the timing information corresponding to t, so that in the network inference stage an image of the corresponding clear scene can be output for any input time t (within the exposure time).
The processing method corresponding to fig. 4a is as follows:
I(t) = B + φ_θ(t; f_γ(B, ε_τ))
ε_τ = {e_k = (x_k, τ_k, p_k)}
where x, p represent the location and polarity of the event trigger, respectively.
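A minimal sketch of the residual formulation I(t) = B + φ_θ(t; f_γ(B, ε_τ)) is given below. The callables f_gamma and phi_theta stand for the dual feature extraction network and the continuous-time decoding network and are not implemented here; passing them in as arguments is an illustrative assumption.

def implicit_video_function(blurred, events, t, f_gamma, phi_theta):
    """I(t) = B + phi_theta(t; f_gamma(B, events)): decode a clear frame at time t."""
    depth_feats = f_gamma(blurred, events)   # dual feature extraction network
    residual = phi_theta(t, depth_feats)     # continuous-time decoding network
    return blurred + residual                # clear image at time t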
In one possible implementation, as shown in fig. 4c, a network structure for decoding a sharp image that merges continuous time information is provided in the embodiment of the present application. As shown in fig. 4c, after the depth features extracted by DALS are processed by a plurality of convolutional layers, the encoded time signal is merged into the depth features output by the convolutional layers, and then a clear image at time t is decoded by one MLP.
In one possible implementation, the exposure time is first normalized. For example, if the exposure time is 0.1s, the normalization process is performed so that the normalized time becomes 1. Based on this operation, the time after the arbitrary time normalization processing within the exposure time can be obtained.
Then, the target time t can be encoded into a time vector by using a Fourier encoding method. The time vector may be a multidimensional time vector, for example, a 2L-dimensional vector, where L may be any positive integer, and this scheme is not particularly limited in this respect.
For example, the multi-dimensional time vector may be represented as:
n(t) = (cos(2^0·πt), sin(2^0·πt), …, cos(2^(L-1)·πt), sin(2^(L-1)·πt))
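A small sketch of this Fourier time encoding, assuming the time t has already been normalized to [0, 1] as described above; the choice L = 10 is only an example value.

import math
import torch

def encode_time(t: float, L: int = 10) -> torch.Tensor:
    """n(t) = (cos(2^0*pi*t), sin(2^0*pi*t), ..., cos(2^(L-1)*pi*t), sin(2^(L-1)*pi*t))."""
    freqs = (2.0 ** torch.arange(L, dtype=torch.float32)) * math.pi   # 2^l * pi, l = 0..L-1
    angles = freqs * t                                                # shape (L,)
    # interleave cos/sin per frequency to obtain a 2L-dimensional vector
    return torch.stack([torch.cos(angles), torch.sin(angles)], dim=-1).flatten()

time_vec = encode_time(0.37, L=10)   # 20-dimensional time vector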
405. obtaining image features of fusion time information according to the time vector, the second features of the blurred image and the second features of the event stream;
as shown in fig. 4c, the encoded time vector, the obtained deblurred image feature (i.e., the second feature of the blurred image) and the event stream feature (i.e., the second feature of the event stream) after the event noise suppression are concatenated, so as to implement the pixel-by-pixel concatenation of the time information into the above features.
406. And decoding the image characteristics of the fusion time information to obtain a clear image of the target moment.
And decoding the clear image at the moment t by using a pixel-by-pixel MLP network structure to obtain the clear image at the moment.
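A sketch of such a pixel-by-pixel decoder is shown below: the time vector is broadcast to every spatial position, concatenated with the fused depth features, and decoded with 1x1 convolutions (equivalent to a per-pixel MLP). The channel widths and the depth of the MLP are assumptions, not values disclosed in this application.

import torch
import torch.nn as nn

class ContinuousTimeDecoder(nn.Module):
    """Per-pixel MLP (1x1 convolutions) decoding a clear image from time-fused features."""
    def __init__(self, feat_ch: int, time_dim: int, out_ch: int = 3, hidden: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(feat_ch + time_dim, hidden, 1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 1), nn.ReLU(),
            nn.Conv2d(hidden, out_ch, 1),
        )

    def forward(self, feat, time_vec):
        b, _, h, w = feat.shape
        t_map = time_vec.view(1, -1, 1, 1).expand(b, -1, h, w)  # broadcast time to each pixel
        return self.mlp(torch.cat([feat, t_map], dim=1))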
In one possible implementation, a plurality of sharp images at any time can also be obtained. The present solution is not particularly limited to this.
In the embodiment of the application, the first feature of the blurred image and the first feature of the event stream are obtained by respectively performing preliminary feature extraction on the blurred image and the event stream, and then the first feature of the blurred image and the first feature of the event stream are input into a dual feature extraction model for performing depth feature extraction to obtain the image feature after deblurring (namely, the second feature of the blurred image) and the event stream feature after event noise suppression (namely, the second feature of the event stream); and then inputting the obtained second characteristic of the blurred image and the second characteristic of the event stream into a clear image decoding network fusing continuous time information, so as to obtain a clear image at the target moment. By adopting the method, the suppression of the event characteristic noise and the deblurring of the image characteristic can be realized. By restoring the clear image at any moment in the exposure time, the whole motion scene corresponding to the blurred image can be known and analyzed more completely without missing any important moment information.
On the basis of the foregoing embodiments, the embodiments of the present application provide a model training method. The model describes the relationship between the input blurred image, the event stream and the sharp image of the scene at any time within the exposure time by learning an implicit neural representation INR, which may be referred to as an Implicit Video Function (IVF). Firstly, a clear image of a scene and depth features of a motion state in the recovered exposure time are extracted from an input observable blurred image B and an event stream epsilon through a dual feature extraction network, and then the extracted depth features and the corresponding time t (or coding information of the t) are input into a continuous time decoding network, so that the clear image of the corresponding scene at the time t is decoded. In the network training process, the clear image obtained by the network also has the time sequence information corresponding to t along with the continuous change of the continuous time t, so that the corresponding clear scene can be output according to the input any time t (within the exposure time) in the network reasoning stage.
In this embodiment, the dual feature extraction network f_γ and the MLP continuous-time decoding network φ_θ are jointly trained in an end-to-end manner.
Because this embodiment sparsely samples only a limited number of high-definition images within the exposure time as ground-truth labels for supervision, the cost functions at sampled instants and at non-sampled instants are constructed differently.
At a sampled instant, the cost function directly measures the l1 difference between the clear image I(t_i) generated by the network and the ground truth I_gt(t_i):
L_sampled = Σ_i ||I(t_i) − I_gt(t_i)||_1
Because the ground truth at non-sampled instants is unknown, the optical flow is first estimated from the event information using the optical-flow estimation network EV-Flow(·):
F = EV-Flow(ε)
and the ground truth at an arbitrary instant is then computed from it. The specific process is shown by the following formula:
I'_gt(t) = warp(I_gt(t_i), F_{t_i→t})
After the clear image at the required instant is obtained, an l1-norm cost function is constructed in the same way:
L_unsampled = Σ_t ||I(t) − I'_gt(t)||_1
Combining the sampled and non-sampled instants, the overall cost function can be expressed as:
L = L_sampled + λ·L_unsampled
where λ is a balancing weight.
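A sketch of this combined cost is given below, assuming a simple weighted sum of the two l1 terms; the weighting lam and the way the pseudo ground truth is supplied are assumptions for illustration.

import torch.nn.functional as F

def total_loss(pred_sampled, gt_sampled, pred_unsampled, pseudo_gt_unsampled, lam=1.0):
    """l1 cost at sampled instants (real ground truth) plus l1 cost at non-sampled
    instants (pseudo ground truth obtained via event-based optical flow)."""
    loss_s = F.l1_loss(pred_sampled, gt_sampled)             # L_sampled
    loss_u = F.l1_loss(pred_unsampled, pseudo_gt_unsampled)  # L_unsampled
    return loss_s + lam * loss_u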
the embodiment is described only by taking the above implementation manner as an example, and may also be implemented in other manners, which is not specifically limited in this respect.
Referring to fig. 5, an image processing apparatus according to an embodiment of the present application is shown. As shown in fig. 5, the apparatus comprises a first extraction module 501, a second extraction module 502 and a processing module 503, wherein:
a first extraction module 501, configured to perform feature extraction on a blurred image and an event stream, respectively, to obtain a first feature of the blurred image and a first feature of the event stream, where the event stream is triggered by a scene brightness change within an exposure time corresponding to the blurred image;
a second extraction module 502, configured to perform depth feature extraction on the first feature of the blurred image and the first feature of the event stream to obtain a second feature of the blurred image and a second feature of the event stream;
a processing module 503, configured to obtain a sharp image at a target time according to the second feature of the blurred image and the second feature of the event stream, where the target time is any time within the exposure time.
Optionally, the second extracting module 502 is configured to:
respectively obtaining a depth characteristic parameter of the blurred image and a depth characteristic parameter of the event stream according to the first characteristic of the blurred image and the first characteristic of the event stream;
and carrying out interactive processing on the depth characteristic parameters of the blurred image and the depth characteristic parameters of the event stream to obtain a second characteristic of the blurred image and a second characteristic of the event stream.
Optionally, the processing module 503 is configured to:
coding the target moment to obtain a time vector;
obtaining image features of fusion time information according to the time vector, the second features of the blurred image and the second features of the event stream;
and decoding the image characteristics of the fusion time information to obtain a clear image of the target moment.
For a specific function implementation manner of the image processing apparatus, reference may be made to the description of the image processing method, and details are not repeated here. The units or modules in the device may be respectively or completely combined into one or several other units or modules to form another unit or module, or some unit(s) or module(s) thereof may be further split into multiple functionally smaller units or modules to form another unit or module, which may achieve the same operation without affecting the achievement of the technical effect of the embodiments of the present invention. The above units or modules are divided based on logic functions, and in practical applications, the functions of one unit (or module) may also be implemented by a plurality of units (or modules), or the functions of a plurality of units (or modules) may be implemented by one unit (or module).
Based on the description of the method embodiment and the device embodiment, the embodiment of the invention also provides an image processing device. Fig. 6 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present invention. The image processing apparatus 600 shown in fig. 6 includes a memory 601, a processor 602, a communication interface 603, and a bus 604. The memory 601, the processor 602, and the communication interface 603 are communicatively connected to each other via a bus 604.
The Memory 601 may be a Read Only Memory (ROM), a static Memory device, a dynamic Memory device, or a Random Access Memory (RAM).
The memory 601 may store a program, and when the program stored in the memory 601 is executed by the processor 602, the processor 602 executes the steps of the image processing method according to the embodiment of the present application through the communication interface 603.
The processor 602 may be a general-purpose Central Processing Unit (CPU), a microprocessor, an Application Specific Integrated Circuit (ASIC), a Graphics Processing Unit (GPU), or one or more Integrated circuits, and is configured to execute related programs to implement the functions required to be executed by the units in the image Processing apparatus according to the embodiment of the present disclosure, or to execute the image Processing method according to the embodiment of the present disclosure.
The processor 602 may also be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the image processing method of the present application may be implemented by integrated logic circuits of hardware or instructions in the form of software in the processor 602. The processor 602 may also be a CPU, a Digital Signal Processor (DSP), an ASIC, an off-the-shelf Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, or discrete hardware components. The various methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in ram, flash memory, rom, prom, or eprom, registers, etc. storage media as is well known in the art. The storage medium is located in the memory 601, and the processor 602 reads information in the memory 601, and in combination with hardware thereof, performs functions required to be performed by units included in the image processing apparatus according to the embodiment of the present application, or performs the image processing method according to the embodiment of the method of the present application.
The communication interface 603 enables communication between the image processing apparatus 600 and other devices or communication networks using a transceiver device such as, but not limited to, a transceiver. For example, data may be acquired through the communication interface 603.
Bus 604 may include a pathway to transfer information between various components of image processing device 600 (e.g., memory 601, processor 602, communication interface 603).
It should be noted that although the image processing apparatus 600 shown in fig. 6 only shows a memory, a processor, and a communication interface, in a specific implementation process, a person skilled in the art should understand that the image processing apparatus 600 also includes other devices necessary for normal operation. Meanwhile, according to specific needs, it will be understood by those skilled in the art that the image processing apparatus 600 may further include hardware devices for implementing other additional functions. Furthermore, it should be understood by those skilled in the art that the image processing apparatus 600 may also include only the devices necessary to implement the embodiments of the present application, and not necessarily all of the devices shown in fig. 6.
The embodiment of the application further provides a chip, the chip comprises a processor and a data interface, and the processor reads the instruction stored in the memory through the data interface so as to realize the image processing method.
In a possible implementation, the chip may further include a memory, the memory having instructions stored therein, and the processor is configured to execute the instructions stored in the memory, and when the instructions are executed, the processor is configured to execute the image processing method.
Embodiments of the present application also provide a computer-readable storage medium having stored therein instructions, which when executed on a computer or processor, cause the computer or processor to perform one or more steps of any one of the methods described above.
The embodiment of the application also provides a computer program product containing instructions. The computer program product, when run on a computer or processor, causes the computer or processor to perform one or more steps of any of the methods described above.
Those of skill in the art will appreciate that the functions described in connection with the various illustrative logical blocks, modules, and algorithm steps described in the disclosure herein may be implemented as hardware, software, firmware, or any combination thereof. If implemented in software, the functions described in the various illustrative logical blocks, modules, and steps may be stored on or transmitted over as one or more instructions or code on a computer-readable medium and executed by a hardware-based processing unit. The computer-readable medium may include a computer-readable storage medium, which corresponds to a tangible medium, such as a data storage medium, or a communication medium including any medium that facilitates transfer of a computer program from one place to another (e.g., based on a communication protocol). In this manner, a computer-readable medium may generally correspond to (1) a tangible computer-readable storage medium that is not transitory, or (2) a communication medium, such as a signal or carrier wave. A data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementing the techniques described herein. The computer program product may include a computer-readable medium.
By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that the computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory tangible storage media. Disk and disc, as used herein, includes Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), and blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media. The instructions may be executed by one or more processors, such as one or more DSPs, general purpose microprocessors, ASICs, FPGAs, or other equivalent integrated or discrete logic circuitry. Accordingly, the term "processor," as used herein may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. Additionally, in some aspects, the functions described by the various illustrative logical blocks, modules, and steps described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques may be fully implemented in one or more circuits or logic elements.
The techniques of this application may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an Integrated Circuit (IC), or a set of ICs (e.g., a chipset). Various components, modules, or units are described in this application to emphasize functional aspects of means for performing the disclosed techniques, but do not necessarily require realization by different hardware units. Indeed, as described above, the various units may be combined in coded hardware units, in combination with suitable software and/or firmware, or provided by interoperative hardware units (including one or more processors as described above).
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the specific descriptions of the corresponding steps in the foregoing method embodiments, and are not described herein again.
It should be understood that in the description of the present application, unless otherwise indicated, "/" indicates a relationship where the objects associated before and after are an "or", e.g., a/B may indicate a or B; wherein A and B can be singular or plural. Also, in the description of the present application, "a plurality" means two or more than two unless otherwise specified. "at least one of the following" or similar expressions refer to any combination of these items, including any combination of the singular or plural items. For example, at least one (one) of a, b, or c, may represent: a, b, c, a-b, a-c, b-c, or a-b-c, wherein a, b, c may be single or multiple. In addition, in order to facilitate clear description of technical solutions of the embodiments of the present application, in the embodiments of the present application, terms such as "first" and "second" are used to distinguish the same items or similar items having substantially the same functions and actions. Those skilled in the art will appreciate that the terms "first," "second," etc. do not denote any order or quantity, nor do the terms "first," "second," etc. denote any order or importance. Also, in the embodiments of the present application, words such as "exemplary" or "for example" are used to mean serving as examples, illustrations or illustrations. Any embodiment or design described herein as "exemplary" or "e.g.," is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the word "exemplary" or "such as" is intended to present relevant concepts in a concrete fashion for ease of understanding.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the division of the unit is only one logical function division, and other division may be implemented in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. The shown or discussed mutual coupling, direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In the above embodiments, all or part of the implementation may be realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The procedures or functions according to the embodiments of the present application are wholly or partially generated when the computer program instructions are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored on or transmitted over a computer-readable storage medium. The computer instructions may be transmitted from one website site, computer, server, or data center to another website site, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, DSL) or wireless (e.g., infrared, wireless, microwave, etc.) means. The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that includes one or more of the available media. The usable medium may be ROM, or RAM, or a magnetic medium, such as a floppy disk, a hard disk, a magnetic tape, a magnetic disk, or an optical medium, such as a digital versatile disk DVD, or a semiconductor medium, such as a Solid State Disk (SSD), etc.
The above description is only a specific implementation of the embodiments of the present application, but the scope of the embodiments of the present application is not limited thereto, and any changes or substitutions within the technical scope disclosed in the embodiments of the present application should be covered by the scope of the embodiments of the present application. Therefore, the protection scope of the embodiments of the present application shall be subject to the protection scope of the claims.

Claims (9)

1. An image processing method, comprising:
respectively extracting features of a blurred image and an event stream to obtain a first feature of the blurred image and a first feature of the event stream, wherein the event stream is triggered by scene brightness change within exposure time corresponding to the blurred image;
performing depth feature extraction on the first features of the blurred image and the first features of the event stream to obtain second features of the blurred image and second features of the event stream;
and obtaining a clear image of a target moment according to the second characteristic of the blurred image and the second characteristic of the event stream, wherein the target moment is any moment in the exposure time.
2. The method according to claim 1, wherein the performing depth feature extraction on the first feature of the blurred image and the first feature of the event stream to obtain the second feature of the blurred image and the second feature of the event stream comprises:
obtaining a depth feature parameter of the blurred image and a depth feature parameter of the event stream according to the first feature of the blurred image and the first feature of the event stream, respectively; and
performing interactive processing on the depth feature parameter of the blurred image and the depth feature parameter of the event stream to obtain the second feature of the blurred image and the second feature of the event stream.
3. The method according to claim 1 or 2, wherein the obtaining a sharp image at a target time according to the second feature of the blurred image and the second feature of the event stream, the target time being any time within the exposure time, comprises:
encoding the target time to obtain a time vector;
obtaining time-fused image features according to the time vector, the second feature of the blurred image, and the second feature of the event stream; and
decoding the time-fused image features to obtain a sharp image at the target time.
4. An image processing apparatus characterized by comprising:
a first extraction module, configured to separately perform feature extraction on a blurred image and an event stream to obtain a first feature of the blurred image and a first feature of the event stream, wherein the event stream is triggered by scene brightness changes within an exposure time corresponding to the blurred image;
a second extraction module, configured to perform depth feature extraction on the first feature of the blurred image and the first feature of the event stream to obtain a second feature of the blurred image and a second feature of the event stream; and
a processing module, configured to obtain a sharp image at a target time according to the second feature of the blurred image and the second feature of the event stream, wherein the target time is any time within the exposure time.
5. The apparatus of claim 4, wherein the second extraction module is configured to:
obtain a depth feature parameter of the blurred image and a depth feature parameter of the event stream according to the first feature of the blurred image and the first feature of the event stream, respectively; and
perform interactive processing on the depth feature parameter of the blurred image and the depth feature parameter of the event stream to obtain the second feature of the blurred image and the second feature of the event stream.
6. The apparatus of claim 4 or 5, wherein the processing module is configured to:
encode the target time to obtain a time vector;
obtain time-fused image features according to the time vector, the second feature of the blurred image, and the second feature of the event stream; and
decode the time-fused image features to obtain a sharp image at the target time.
7. An image processing apparatus, comprising a processor and a memory, wherein the memory is configured to store program code, and the processor is configured to invoke the program code to perform the method according to any one of claims 1 to 3.
8. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program that, when executed by a processor, implements the method according to any one of claims 1 to 3.
9. A computer program product, characterized in that, when the computer program product is run on a computer, it causes the computer to perform the method according to any one of claims 1 to 3.
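
For illustration only, the following is a minimal sketch of how the pipeline recited in claims 1 to 3 could be realized. It assumes a PyTorch implementation; the module names, layer sizes, the voxelization of the event stream into temporal bins, and the sigmoid gating used for the interactive processing are assumptions made for this example and are not details disclosed in this application.

```python
# Hypothetical sketch of the claimed pipeline: first features, interactive
# depth feature extraction, time encoding, fusion, and decoding to a sharp image.
import torch
import torch.nn as nn


class DeblurNet(nn.Module):
    def __init__(self, event_bins=16, channels=32):
        super().__init__()
        # First (shallow) features of the blurred image and of the event stream,
        # the latter assumed to be voxelized into `event_bins` temporal channels.
        self.image_head = nn.Sequential(nn.Conv2d(3, channels, 3, padding=1), nn.ReLU())
        self.event_head = nn.Sequential(nn.Conv2d(event_bins, channels, 3, padding=1), nn.ReLU())
        # Depth feature parameters for each branch.
        self.image_depth = nn.Conv2d(channels, channels, 3, padding=1)
        self.event_depth = nn.Conv2d(channels, channels, 3, padding=1)
        # Interactive processing: each branch is modulated by the other branch's
        # depth feature parameters (a simple gate, chosen only for illustration).
        self.image_gate = nn.Conv2d(channels, channels, 1)
        self.event_gate = nn.Conv2d(channels, channels, 1)
        # Time encoding of the target time t in [0, 1] within the exposure.
        self.time_mlp = nn.Sequential(nn.Linear(1, channels), nn.ReLU(), nn.Linear(channels, channels))
        # Decoder producing the sharp image at the target time.
        self.decoder = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, 3, 3, padding=1),
        )

    def forward(self, blurred, events, t):
        # Step 1: first features of the blurred image and the event stream.
        f_img = self.image_head(blurred)
        f_evt = self.event_head(events)
        # Step 2: depth feature parameters, then interactive processing.
        p_img = self.image_depth(f_img)
        p_evt = self.event_depth(f_evt)
        s_img = f_img * torch.sigmoid(self.image_gate(p_evt))  # second feature of the image
        s_evt = f_evt * torch.sigmoid(self.event_gate(p_img))  # second feature of the event stream
        # Step 3: encode the target time and fuse it with both second features.
        t_vec = self.time_mlp(t.view(-1, 1)).unsqueeze(-1).unsqueeze(-1)
        fused = torch.cat([s_img + t_vec, s_evt + t_vec], dim=1)
        # Decode the time-fused features into the sharp image at time t.
        return self.decoder(fused)


# Usage: one 3-channel blurred frame, a 16-bin event voxel grid, and a target
# time of 0.5 (the middle of the exposure).
net = DeblurNet()
sharp = net(torch.rand(1, 3, 64, 64), torch.rand(1, 16, 64, 64), torch.tensor([0.5]))
print(sharp.shape)  # torch.Size([1, 3, 64, 64])
```

The sigmoid gate is used here only as one simple way for each branch to modulate the other; the claims do not prescribe a particular form of interactive processing or time encoding.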
CN202210334494.1A 2022-03-31 2022-03-31 Image processing method, related device and system Pending CN114841870A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210334494.1A CN114841870A (en) 2022-03-31 2022-03-31 Image processing method, related device and system
PCT/CN2023/083859 WO2023185693A1 (en) 2022-03-31 2023-03-24 Image processing method, and related apparatus and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210334494.1A CN114841870A (en) 2022-03-31 2022-03-31 Image processing method, related device and system

Publications (1)

Publication Number Publication Date
CN114841870A true CN114841870A (en) 2022-08-02

Family

ID=82563863

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210334494.1A Pending CN114841870A (en) 2022-03-31 2022-03-31 Image processing method, related device and system

Country Status (2)

Country Link
CN (1) CN114841870A (en)
WO (1) WO2023185693A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023185693A1 (en) * 2022-03-31 2023-10-05 Huawei Technologies Co Ltd Image processing method, and related apparatus and system

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8390704B2 (en) * 2009-10-16 2013-03-05 Eastman Kodak Company Image deblurring using a spatial image prior
CN110060215B (en) * 2019-04-16 2021-09-10 深圳市商汤科技有限公司 Image processing method and device, electronic equipment and storage medium
CN111445414B (en) * 2020-03-27 2023-04-14 北京市商汤科技开发有限公司 Image processing method and device, electronic equipment and storage medium
CN112767277B (en) * 2021-01-27 2022-06-07 同济大学 Depth feature sequencing deblurring method based on reference image
CN114841870A (en) * 2022-03-31 2022-08-02 华为技术有限公司 Image processing method, related device and system

Also Published As

Publication number Publication date
WO2023185693A1 (en) 2023-10-05

Similar Documents

Publication Publication Date Title
US20200117906A1 (en) Space-time memory network for locating target object in video content
CN112001914A (en) Depth image completion method and device
CN112507990A (en) Video time-space feature learning and extracting method, device, equipment and storage medium
CN113592913B (en) Method for eliminating uncertainty of self-supervision three-dimensional reconstruction
CN109300151B (en) Image processing method and device and electronic equipment
CN111507262B (en) Method and apparatus for detecting living body
CN112801047B (en) Defect detection method and device, electronic equipment and readable storage medium
GB2579262A (en) Space-time memory network for locating target object in video content
CN113066034A (en) Face image restoration method and device, restoration model, medium and equipment
CN111626956A (en) Image deblurring method and device
CN113379601A (en) Real world image super-resolution method and system based on degradation variational self-encoder
CN113269722A (en) Training method for generating countermeasure network and high-resolution image reconstruction method
CN116205962B (en) Monocular depth estimation method and system based on complete context information
WO2023185693A1 (en) Image processing method, and related apparatus and system
CN113379606B (en) Face super-resolution method based on pre-training generation model
CN113689372A (en) Image processing method, apparatus, storage medium, and program product
CN111382647A (en) Picture processing method, device, equipment and storage medium
CN109816791B (en) Method and apparatus for generating information
CN117036442A (en) Robust monocular depth completion method, system and storage medium
TWI803243B (en) Method for expanding images, computer device and storage medium
Yang et al. Deep Convolutional Grid Warping Network for Joint Depth Map Upsampling
CN112995433B (en) Time sequence video generation method and device, computing equipment and storage medium
CN114842066A (en) Image depth recognition model training method, image depth recognition method and device
CN115049901A (en) Small target detection method and device based on feature map weighted attention fusion
CN114972016A (en) Image processing method, image processing apparatus, computer device, storage medium, and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination