CN109922372B - Video data processing method and device, electronic equipment and storage medium

Info

Publication number
CN109922372B
CN109922372B
Authority
CN
China
Prior art keywords
video
frames
data
obtaining
data stream
Prior art date
Legal status
Active
Application number
CN201910142793.3A
Other languages
Chinese (zh)
Other versions
CN109922372A (en
Inventor
刘建博
张佳维
任思捷
李鸿升
王晓刚
Current Assignee
Shenzhen Sensetime Technology Co Ltd
Original Assignee
Shenzhen Sensetime Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Sensetime Technology Co Ltd filed Critical Shenzhen Sensetime Technology Co Ltd
Priority to CN201910142793.3A priority Critical patent/CN109922372B/en
Priority to CN202111204872.6A priority patent/CN113766313B/en
Publication of CN109922372A publication Critical patent/CN109922372A/en
Application granted granted Critical
Publication of CN109922372B publication Critical patent/CN109922372B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80: Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85: Assembly of content; Generation of multimedia applications
    • H04N21/854: Content authoring
    • H04N21/8547: Content authoring involving timestamps for synchronizing content

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Studio Devices (AREA)
  • Television Signal Processing For Recording (AREA)
  • Television Systems (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to a video data processing method and apparatus, an electronic device, and a storage medium. The method includes: obtaining a first video data stream at a first frame rate; obtaining the intermediate time between two consecutive frames of video data in the first video data stream, and obtaining the motion data of all events between the two consecutive frames of video data according to that time; and obtaining a second video data stream at a second frame rate according to the first video data stream and the motion data, the second frame rate being greater than the first frame rate. With the present method and device, a high-frame-rate video data stream can be obtained conveniently and quickly.

Description

Video data processing method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer vision technologies, and in particular, to a video data processing method and apparatus, an electronic device, and a storage medium.
Background
Producing high-frame-rate video is a fundamental problem in the field of computer vision. High-frame-rate video has important application value in many scenarios, such as recognition and analysis of fast-motion actions, high-frame-rate video playback software, ultra-slow-motion playback in short-video applications, and analysis of the technical details of professional athletes. In these scenarios, a high-frame-rate video data stream can be obtained directly with a professional high-frame-rate camera, but such cameras are expensive and inconvenient to operate. The problem to be solved is therefore how to obtain a high-frame-rate video data stream conveniently and quickly without using a professional high-frame-rate camera.
Disclosure of Invention
The present disclosure provides a video data processing technical solution.
According to an aspect of the present disclosure, there is provided a video data processing method including:
obtaining a first video data stream with a first frame rate;
obtaining the intermediate time between two consecutive frames of video data in the first video data stream, and obtaining motion data of all events between the two consecutive frames of video data according to the time;
obtaining a second video data stream with a second frame rate according to the first video data stream and the motion data;
the second frame rate is greater than the first frame rate.
In a possible implementation manner of the present disclosure, the obtaining the intermediate time between two consecutive frames of video data in the first video data stream includes:
parsing, from the first video data stream, two consecutive frames of video images and the timestamps respectively corresponding to the two frames of video images;
and obtaining the intermediate time according to the timestamps respectively corresponding to the two frames of video images.
In a possible implementation manner of the present disclosure, the obtaining motion data of all events between two consecutive frames of video data according to the time includes:
obtaining the light intensity changes of the motion optical flow in the motion scene corresponding to the intermediate time;
and obtaining the motion data of the corresponding events according to the light intensity changes, and recording the motion data of all events.
In a possible implementation manner of the present disclosure, the motion data includes: a time at which the event occurs, a location at which the event occurs, and/or an attribute of the event;
and the attribute of the event is used to represent whether the pixel brightness increases or decreases in the image at the intermediate time of the two frames of video images.
In a possible implementation manner of the present disclosure, the obtaining a second video data stream at a second frame rate according to the first video data stream and the motion data includes:
obtaining every two continuous video images from the multi-frame video images in the first video data stream;
obtaining corresponding event images of the middle time of every two continuous frames of video images from a plurality of event images in the motion data;
obtaining the relative motion state of the middle moment of each two frames of video images according to each two frames of continuous video images and the corresponding event images;
and performing video frame interpolation processing of an intermediate frame on each two frames of video images according to the relative motion state of the intermediate time of each two frames of video images to obtain the second video data stream.
In a possible implementation manner of the present disclosure, the performing, according to the relative motion state of the middle time of each two frames of video images, video frame interpolation processing on the middle frame of each two frames of video images to obtain the second video data stream includes:
and according to the relative motion state of the middle time of each two frames of video images, performing convolution and summation on each two frames of video images to obtain the middle frame image inserted into the middle time of each two frames of video images.
In a possible implementation manner of the present disclosure, the method further includes:
and after the intermediate frame image is inserted into the intermediate time of every two frames of video images, forming a new video stream by using every two frames of video images and the intermediate frame image, and determining the new video stream as the second video data stream.
In a possible implementation manner of the present disclosure, the obtaining a second video data stream at a second frame rate according to the first video data stream and the motion data includes:
inputting the first video data stream and the motion data into a video frame interpolation processing network;
and processing the first video data stream and the motion data through the video frame interpolation processing network, and outputting a second video data stream at a second frame rate.
In a possible implementation manner of the present disclosure, in a case that the first video data stream is composed of a first video image sequence and the motion data is composed of an event image sequence, the processing the first video data stream and the motion data by the video frame interpolation processing network and outputting a second video data stream at a second frame rate includes:
obtaining video image data of two continuous frames from the first video image sequence;
obtaining a plurality of event image data corresponding to the video image data of the two continuous frames from the event image sequence;
performing feature extraction on the video image data of the two continuous frames and the event image data respectively, to obtain feature extraction results;
obtaining a corresponding convolution kernel according to the feature extraction result;
and performing convolution processing on the video image data of the two continuous frames respectively according to the convolution kernels, and summing the results, to obtain intermediate frame data inserted between the video image data of the two continuous frames.
In a possible implementation manner of the present disclosure, the obtaining a corresponding convolution kernel according to the feature extraction result includes:
and obtaining the relative motion state of the video image data of the two continuous frames at the middle moment according to the feature extraction result, and storing the relative motion state in a form that each pixel corresponds to one convolution kernel to obtain a plurality of convolution kernels.
In a possible implementation manner of the present disclosure, the performing convolution processing and summing on the video image data of the two consecutive frames according to the convolution kernel to obtain intermediate frame data inserted into the video image data of the two consecutive frames includes:
in a case that the video image data of the two continuous frames are first video image data and second video image data, performing convolution processing on the first video image data with a first convolution kernel to obtain a first processing result, and performing convolution processing on the second video image data with a second convolution kernel to obtain a second processing result;
and summing the first processing result and the second processing result to obtain the intermediate frame data.
In a possible implementation manner of the present disclosure, the event image data is obtained according to a division rule that splits the time between the two frames of video image data, around their intermediate time, into a set number of event frames.
According to an aspect of the present disclosure, there is provided a video data processing apparatus, the apparatus including:
the first data stream obtaining module is used for obtaining a first video data stream with a first frame rate;
the motion data obtaining module is used for obtaining the intermediate time between two consecutive frames of video data in the first video data stream, and obtaining motion data of all events between the two consecutive frames of video data according to the time;
the second data stream obtaining module is used for obtaining a second video data stream at a second frame rate according to the first video data stream and the motion data;
the second frame rate is greater than the first frame rate.
In a possible implementation manner of the present disclosure, the motion data obtaining module is further configured to:
parsing, from the first video data stream, two consecutive frames of video images and the timestamps respectively corresponding to the two frames of video images;
and obtaining the intermediate time according to the timestamps respectively corresponding to the two frames of video images.
In a possible implementation manner of the present disclosure, the motion data obtaining module is further configured to:
obtaining the light intensity changes of the motion optical flow in the motion scene corresponding to the intermediate time;
and obtaining the motion data of the corresponding events according to the light intensity changes, and recording the motion data of all events.
In a possible implementation manner of the present disclosure, the motion data includes: a time at which the event occurs, a location at which the event occurs, and/or an attribute of the event;
and the attribute of the event is used to represent whether the pixel brightness increases or decreases in the image at the intermediate time of the two frames of video images.
In a possible implementation manner of the present disclosure, the second data stream obtaining module includes:
the first obtaining sub-module is used for obtaining every two continuous frames of video images from the multi-frame video images in the first video data stream;
the second obtaining submodule is used for obtaining corresponding event images of the middle moment of each two frames of continuous video images from a plurality of event images in the motion data;
the third obtaining submodule is used for obtaining the relative motion state of the middle moment of each two frames of video images according to each two frames of continuous video images and the corresponding event images;
and the frame interpolation processing submodule is used for carrying out video frame interpolation processing on each two frames of video images according to the relative motion state of the middle moment of each two frames of video images to obtain the second video data stream.
In a possible implementation manner of the present disclosure, the frame interpolation processing sub-module is further configured to:
and according to the relative motion state of the middle time of each two frames of video images, performing convolution and summation on each two frames of video images to obtain the middle frame image inserted into the middle time of each two frames of video images.
In a possible implementation manner of the present disclosure, the apparatus further includes:
and the new data stream processing module is used for forming a new video stream by the two frames of video images and the intermediate frame image after the intermediate frame image is inserted into the intermediate time of the two frames of video images, and determining the new video stream as the second video data stream.
In a possible implementation manner of the present disclosure, the second data stream obtaining module includes:
a first processing sub-module for inputting the first video data stream and the motion data into a video frame interpolation processing network;
and a second processing sub-module for processing the first video data stream and the motion data through the video frame interpolation processing network, and outputting a second video data stream at a second frame rate.
In a possible implementation manner of the present disclosure, in a case that the first video data stream is composed of a first video image sequence, and the motion data is composed of an event image sequence, the second processing sub-module includes:
a first image data obtaining unit configured to obtain video image data of two consecutive frames from the first video image sequence;
a second image data obtaining unit configured to obtain, from the event image sequence, a plurality of event image data corresponding to the video image data of the two consecutive frames;
the feature extraction unit is used for performing feature extraction on the video image data of the two continuous frames and the event image data respectively, to obtain a feature extraction result;
a convolution obtaining unit, configured to obtain a corresponding convolution kernel according to the feature extraction result;
and the convolution processing unit is used for performing convolution processing on the video image data of the two continuous frames respectively according to the convolution kernels, and summing the results, to obtain intermediate frame data inserted between the video image data of the two continuous frames.
In a possible implementation manner of the present disclosure, the convolution obtaining unit is further configured to:
and obtaining the relative motion state of the video image data of the two continuous frames at the middle moment according to the feature extraction result, and storing the relative motion state in a form that each pixel corresponds to one convolution kernel to obtain a plurality of convolution kernels.
In a possible implementation manner of the present disclosure, the convolution processing unit is further configured to:
in a case that the video image data of the two continuous frames are first video image data and second video image data, performing convolution processing on the first video image data with a first convolution kernel to obtain a first processing result, and performing convolution processing on the second video image data with a second convolution kernel to obtain a second processing result;
and summing the first processing result and the second processing result to obtain the intermediate frame data.
In a possible implementation manner of the present disclosure, the event image data is obtained according to a division rule that splits the time between the two frames of video image data, around their intermediate time, into a set number of event frames.
According to an aspect of the present disclosure, there is provided an electronic device including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the above video data processing method.
According to an aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described video data processing method.
In the embodiments of the present disclosure, a first video data stream at a first frame rate is obtained; the intermediate time between two consecutive frames of video data in the first video data stream is obtained, and the motion data of all events between the two consecutive frames of video data is obtained according to that time; and a second video data stream at a second frame rate, greater than the first frame rate, is obtained according to the first video data stream and the motion data. With the present method and device, the motion data of all events between two consecutive frames of video data can be obtained from the intermediate time between the two frames in the first video data stream (a low-frame-rate video data stream), and a second video data stream at the second frame rate (a high-frame-rate video data stream) can be obtained according to the first video data stream and the motion data. A high-frame-rate video data stream can therefore be obtained conveniently and quickly without using a professional high-frame-rate camera, which reduces cost and simplifies operation.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 shows a flow chart of a video data processing method according to an embodiment of the present disclosure.
Fig. 2 shows a flow chart of a video data processing method according to an embodiment of the present disclosure.
Fig. 3 shows a flow chart of a video data processing method according to an embodiment of the present disclosure.
Fig. 4 is a schematic diagram of a network structure of a video processing method according to an embodiment of the present disclosure.
Fig. 5 shows a block diagram of a video data processing apparatus according to an embodiment of the present disclosure.
Fig. 6 shows a block diagram of a video data processing apparatus according to an embodiment of the present disclosure.
Fig. 7 shows a block diagram of an electronic device according to an embodiment of the disclosure.
Fig. 8 illustrates a block diagram of an electronic device in accordance with an embodiment of the disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" herein is merely an association describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
In application scenarios such as recognition and analysis of fast-motion actions, high-frame-rate video playback software, ultra-slow-motion playback in short-video applications, and analysis of the technical details of professional athletes, the accuracy of video image acquisition depends on whether high-frame-rate video data can be obtained. High-frame-rate video data can be acquired directly with a professional high-frame-rate camera, or obtained indirectly from low-frame-rate video data, for example by generating high-frame-rate video data from low-frame-rate video data through frame interpolation. However, a professional high-frame-rate camera has high hardware cost and high power consumption; it is therefore important to be able to perform video frame interpolation on low-frame-rate video data conveniently and quickly to generate high-frame-rate video data.
Driven by the development of convolutional neural network technology, the related technology of video frame interpolation has advanced rapidly. From the point of view of technical implementation, a video frame interpolation method based on a convolutional neural network mainly includes two steps: first, the motion optical flow is estimated from the low-frame-rate video data; then, the motion optical flow between two frames of video data is used to compute an intermediate frame by some interpolation method. The quality of the video interpolation therefore depends substantially on the quality of the estimated motion optical flow. However, in a fast-motion scene, motion optical flow estimation algorithms in the related art perform poorly, and it is difficult to accurately estimate the motion state between two consecutive frames of low-frame-rate input video data. In one approach, the motion optical flow is estimated from the low-frame-rate input image sequence, but only a linear motion optical flow between two time points can be estimated, so it is difficult to achieve a high-quality frame interpolation effect for a non-linear motion process. In another approach, for a fast-moving object, the long exposure time used when acquiring low-frame-rate video data causes motion blur, and it is difficult to estimate the real motion state of the moving object with high quality from motion-blurred video data, which affects the final frame interpolation effect. In other words, such frame interpolation processing cannot generate high-frame-rate video data conveniently and quickly.
In the present disclosure, video frame interpolation is completed by combining the algorithm level (frame interpolation processing) with the application layer (an event camera). In a fast-motion scene, an event camera can capture all the motion data between two frames of a low-frame-rate ordinary camera more completely, which reduces the uncertainty of the predicted motion optical flow to a certain degree, improves the accuracy of the frame interpolation, and ultimately improves the precision of the image synthesis. An event camera (an event-based vision sensor) can therefore assist in achieving a higher-quality video frame interpolation effect in a fast-motion scene. Unlike a conventional photosensitive camera, which accumulates light intensity over a fixed exposure time to generate video data (such as video image data), the event camera records the light intensity changes used for the frame interpolation processing, so that a more accurate frame interpolation effect is achieved. For the frame interpolation processing, a high-frame-rate video data stream can be obtained from the low-frame-rate video data stream and the motion data, collected by the event camera, at the intermediate time between two frames of video data. Since the high-frame-rate video data stream is obtained by frame interpolation, the more accurate the frame interpolation, the higher the quality of the resulting video data stream.
Fig. 1 shows a flowchart of a video data processing method according to an embodiment of the present disclosure. The method is applied to a video data processing apparatus; for example, the method may be executed by a terminal device, a server, or another processing device, where the terminal device may be a User Equipment (UE), a mobile device, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. In some possible implementations, the video data processing may be implemented by a processor calling computer-readable instructions stored in a memory. As shown in fig. 1, the process includes:
step S101, a first video data stream with a first frame rate is obtained.
The first video data stream at the first frame rate may be a low-frame-rate video data stream. The frame rate is the frequency (or rate) at which bitmap images, called frames, appear continuously on a display; that is, a number of frames are displayed per second. A frame is a single still image, the smallest unit of a motion picture, and consecutive frames form the motion picture, such as a television image. The higher the frame rate, the smoother the picture.
In one example, video data in a video data stream may exist in the form of video frames, thereby forming a continuous sequence of video frames.
Step S102, obtaining the intermediate time between two consecutive frames of video data in the first video data stream, and obtaining motion data of all events between the two consecutive frames of video data according to the time.
In one example, data may be collected by an event camera; that is, data is collected over the time between the two consecutive frames of video data, resulting in the motion data of all events.
Step S103, according to the first video data stream and the motion data, a second video data stream with a second frame rate is obtained, wherein the second frame rate is larger than the first frame rate.
The second video data stream at the second frame rate may be a high-frame-rate video data stream. The higher the frame rate, the smoother the picture.
In one example, a generated image at the intermediate time between two frames of video images (e.g., an image corresponding to the intermediate time) is obtained according to two consecutive frames of video images (e.g., from the low-frame-rate video image sequence) and the event camera data between the two frames (e.g., the motion data of the corresponding scene). This image essentially contains the motion data of all events between the two frames of video images and is used as the intermediate frame for frame interpolation processing, so as to obtain the high-frame-rate video data stream.
With the present disclosure, video frame interpolation is accomplished jointly at the algorithm level (frame interpolation processing) and the application layer (an event camera). The motion data of all events between two consecutive frames of video data can be obtained from the intermediate time between the two frames in the first video data stream (a low-frame-rate video data stream), and a second video data stream at the second frame rate (a high-frame-rate video data stream) can be obtained according to the first video data stream and the motion data. A high-frame-rate video data stream can therefore be obtained conveniently and quickly, without a professional high-frame-rate camera, which reduces cost and simplifies operation.
In a possible implementation manner of the present disclosure, obtaining the intermediate time between two consecutive frames of video data in the first video data stream includes: parsing, from the first video data stream, two consecutive frames of video images and the timestamps respectively corresponding to the two frames of video images, and obtaining the intermediate time of the two consecutive frames of video data according to those timestamps.
In a possible implementation manner of the present disclosure, obtaining the motion data of all events between two consecutive frames of video data according to the time includes: obtaining the light intensity changes of the motion optical flow in the motion scene corresponding to the time, obtaining the motion data of the corresponding events according to the light intensity changes, and recording the motion data of all the events. The motion data includes: a time at which the event occurred, a location at which the event occurred, and/or an attribute of the event.
When an object moves, the brightness pattern of its corresponding pixels in the image moves with it; a motion optical flow algorithm based on this principle can analyze images of a moving scene. The optical flow expresses the change of the image and contains information about the motion of the target object in the image, so it can be used to determine the motion of the target object. In a possible implementation of the present disclosure, the attribute of the event is used to characterize whether the pixel brightness increases or decreases in the image at the intermediate time of the two frames of video images.
In one example, the two consecutive frames of video images may be a first video image and a second video image, with corresponding timestamps being a first timestamp and a second timestamp respectively, so that the intermediate time of the two consecutive frames of video data may be obtained from the first timestamp and the second timestamp. The light intensity changes in the motion scene corresponding to that time are obtained, and the motion data of the corresponding events is obtained according to the light intensity changes. Motion data of different events corresponding to different light intensity changes is collected, and the collected motion data of the events is recorded to obtain the motion data of all events. The motion data includes: a time at which the event occurred, a location at which the event occurred, and/or an attribute of the event; the attribute is used to characterize whether the pixel brightness increases or decreases in the image at the intermediate time of the two frames of video images.
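As a concrete illustration of this data model, the following Python sketch selects the events recorded between two consecutive frames and computes the intermediate time from the two timestamps. It is a minimal sketch under assumed conventions: the disclosure does not fix a storage format, so the field names (t, x, y, p) and the ±1 polarity encoding are illustrative.

```python
import numpy as np

# Assumed event record layout: timestamp, pixel location, polarity
# (+1 = pixel brightness increase, -1 = decrease), matching the three
# event attributes named in the text.
EVENT_DTYPE = [('t', 'f8'), ('x', 'i4'), ('y', 'i4'), ('p', 'i1')]

def events_between(events, t_first, t_second):
    """Select all events recorded between two consecutive video frames."""
    mask = (events['t'] >= t_first) & (events['t'] < t_second)
    return events[mask]

# First and second timestamps of two consecutive frames (e.g. 25 fps video),
# and the intermediate time derived from them.
t_first, t_second = 0.000, 0.040
t_mid = (t_first + t_second) / 2.0   # 0.020 s

events = np.array([(0.013, 120, 64, 1), (0.027, 121, 64, -1)],
                  dtype=EVENT_DTYPE)
window = events_between(events, t_first, t_second)
```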
The above data can be acquired by an event camera. Unlike a conventional photosensitive camera, which accumulates light intensity over a fixed time to generate an image, the event camera only captures events in the scene where the light intensity has changed, and can asynchronously record the time at which an event occurred, the location where it occurred, and its attribute information (such as an increase or decrease in pixel brightness). The event camera has ultra-high temporal resolution and low power consumption, and still works normally in bright or dark scenes. The ultra-high temporal resolution of the event camera can make up for the shortcomings of a traditional low-frame-rate camera; the resolution of the video image does not need to be improved by a professional high-frame-rate camera, and high-definition resolution can still be achieved. Combined with frame interpolation processing based on a motion optical flow algorithm, high-frame-rate or ultra-high-frame-rate video images can be obtained from ordinary low-frame-rate video images, which is suitable for application scenarios such as recognition and analysis of fast-motion actions, high-frame-rate video playback software, ultra-slow-motion playback in short-video applications, and analysis of the technical details of professional athletes.
Fig. 2 shows a flowchart of a video data processing method according to an embodiment of the present disclosure. The method is applied to a video data processing apparatus; for example, the method may be executed by a terminal device, a server, or another processing device, where the terminal device may be a User Equipment (UE), a mobile device, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. In some possible implementations, the video data processing may be implemented by a processor calling computer-readable instructions stored in a memory. As shown in fig. 2, the process includes:
step S201, a first video data stream with a first frame rate is obtained.
The first video data stream at the first frame rate may be a low-frame-rate video data stream. The frame rate is the frequency (or rate) at which frames appear continuously on a display; the higher the frame rate, the smoother the picture.
Step S202, obtaining the intermediate time between two consecutive frames of video data in the first video data stream, and obtaining motion data of all events between the two consecutive frames of video data according to the time.
In one example, data may be collected by an event camera; that is, data is collected over the time between the two consecutive frames of video data, so as to obtain the motion data of all events (also referred to as event motion data).
Step S203, obtaining every two continuous frames of video images from the multi-frame video images in the first video data stream.
Step S204, obtaining corresponding event images of the middle time of every two continuous video images from a plurality of event images in the motion data.
Step S205, obtaining the relative motion state at the intermediate time of every two frames of video images according to every two consecutive frames of video images and the corresponding event images.
In one example, from two consecutive frames of images and event data between the two frames, a motion process within the time period is estimated, and a relative motion state equivalent to an intermediate time of the two frames of images is generated.
Step S206, performing video frame interpolation processing of an intermediate frame on every two frames of video images according to the relative motion state at the intermediate time of the two frames, to obtain the second video data stream.
In one example, the final interpolated frame image can be obtained by convolving and summing the two input frame images with the relative motion state at the intermediate time.
Through steps S203 to S206, a second video data stream at a second frame rate can be obtained according to the first video data stream and the motion data. The second frame rate is greater than the first frame rate, i.e., the result can be a high-frame-rate video data stream; the higher the frame rate, the smoother the picture.
In a possible implementation manner of the present disclosure, performing video frame interpolation processing on every two frames of video images according to the relative motion state at their intermediate time to obtain the second video data stream includes: convolving and summing the two frames of video images according to the relative motion state at their intermediate time, to obtain the intermediate frame image (the final interpolated image) inserted at the intermediate time of the two frames.
In a possible implementation manner of the present disclosure, the video data processing method further includes: after the intermediate frame image is inserted at the intermediate time of every two frames of video images, a new video stream is formed from every two frames of video images and the corresponding intermediate frame images, and the new video stream is determined as the second video data stream, as sketched below.
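A possible Python sketch of this assembly step follows: the synthesized intermediate frames are interleaved with the original frames, which, with one inserted frame per pair, roughly doubles the frame rate. The function name and list representation are illustrative, not from the disclosure.

```python
def interleave(frames, intermediate_frames):
    """Form the second video data stream by inserting each synthesized
    intermediate frame between its two source frames.

    frames: list of N images; intermediate_frames: list of N - 1 images,
    one per consecutive pair of source frames.
    """
    assert len(intermediate_frames) == len(frames) - 1
    new_stream = []
    for frame, mid in zip(frames[:-1], intermediate_frames):
        new_stream.extend([frame, mid])
    new_stream.append(frames[-1])   # keep the last source frame
    return new_stream               # roughly twice the original frame rate
```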
Fig. 3 shows a flowchart of a video data processing method according to an embodiment of the present disclosure. The method is applied to a video data processing apparatus; for example, the method may be executed by a terminal device, a server, or another processing device, such as a User Equipment (UE), a mobile device, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, or a wearable device. In some possible implementations, the video data processing may be implemented by a processor calling computer-readable instructions stored in a memory. As shown in fig. 3, obtaining a second video data stream at a second frame rate according to the first video data stream and the motion data includes:
step S301, inputting the first video data stream and the motion data into a video frame insertion processing network.
Step S302, processing the first video data stream and the motion data through the video frame interpolation processing network, and outputting a second video data stream at a second frame rate.
In one example, the video frame interpolation processing network is a fully-connected network structure; the low-frame-rate video data and the event data of the corresponding scene are input into this network structure, which performs the frame interpolation processing of the video data and yields a high-precision video data stream.
Through steps S301 to S302, the first video data stream (the low-frame-rate video data) and the event data acquired by the event camera from the corresponding scene are input into the network, which performs the frame interpolation processing of the video data; the resulting second video data stream is a high-precision, high-frame-rate video data stream. In this way, a high-frame-rate video data stream is finally obtained from a low-frame-rate video data stream through accurate frame interpolation processing.
The present disclosure does not limit the specific structure of the network; any network for implementing high-precision video frame interpolation is within the protection scope of the present disclosure, as long as it takes two consecutive frames of video images and the event data in that time period as input. In a fast-motion scene, the event camera can capture all the motion data between two frames of video images of a low-frame-rate ordinary camera more completely, and an intermediate frame image is synthesized from the two real images based on this motion data.
In a possible implementation manner of the present disclosure, in a case that the first video data stream is composed of a first video image sequence and the motion data is composed of an event image sequence, processing the first video data stream and the motion data through the video frame interpolation processing network and outputting a second video data stream at a second frame rate includes: obtaining video image data of two consecutive frames from the first video image sequence, and obtaining a plurality of event image data corresponding to the video image data of the two consecutive frames from the event image sequence; performing feature extraction on the video image data of the two consecutive frames and the event image data respectively, to obtain feature extraction results; obtaining corresponding convolution kernels according to the feature extraction results; and performing convolution processing on the video image data of the two consecutive frames respectively according to the convolution kernels and summing the results, to obtain intermediate frame data inserted between the video image data of the two consecutive frames. The event image data is obtained according to a division rule that splits the time between the two frames, around their intermediate time, into a set number of event frames. In one example, the event camera motion data in the network input is divided equally into 6 time periods according to the time between the two frames of video images, and 6 frames of event image data are then generated by accumulation. These 6 frames of event image data contain essentially all of the motion data between the two frames of video images.
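The accumulation described above can be sketched in Python as follows. The 6-way equal division of the inter-frame interval follows the text; the rasterization convention (each pixel sums the signed polarities of the events falling into its time period) is an assumption, since the disclosure does not specify how events are turned into images.

```python
import numpy as np

def accumulate_event_images(events, t_first, t_second, shape, num_bins=6):
    """Divide the time between two video frames into `num_bins` equal
    periods (6 in the text) and accumulate the events of each period
    into one event image."""
    h, w = shape
    event_images = np.zeros((num_bins, h, w), dtype=np.float32)
    bin_len = (t_second - t_first) / num_bins
    for t, x, y, p in zip(events['t'], events['x'], events['y'], events['p']):
        b = min(int((t - t_first) / bin_len), num_bins - 1)
        event_images[b, y, x] += p   # signed polarity sum per pixel (assumed)
    return event_images  # together, these cover all motion between the frames
```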
In a possible implementation manner of the present disclosure, obtaining corresponding convolution kernels according to the feature extraction results includes: obtaining, according to the feature extraction results, the relative motion state of the video image data of the two consecutive frames at the intermediate time, and storing the relative motion state in the form of one convolution kernel per pixel, to obtain a plurality of convolution kernels.
In a possible implementation manner of the present disclosure, performing convolution processing and summing on the video image data of the two consecutive frames respectively according to the convolution kernels, to obtain intermediate frame data inserted between the video image data of the two consecutive frames, includes: in a case that the video image data of the two consecutive frames are first video image data and second video image data, performing convolution processing on the first video image data with a first convolution kernel to obtain a first processing result, and performing convolution processing on the second video image data with a second convolution kernel to obtain a second processing result; and summing the first processing result and the second processing result to obtain the intermediate frame data (the intermediate frame image).
In one example, in the image generation stage, the two input frames are respectively convolved with the estimated convolution kernels representing the relative motion state and summed according to the following formula (1), so as to obtain the final generated image:

$$\hat{P}_{k+1}(x, y) = K_1(x, y) * P_k(x, y) + K_2(x, y) * P_{k+2}(x, y) \qquad (1)$$

where $\hat{P}_{k+1}(x, y)$ represents the intermediate frame data inserted between the two consecutive video frames; $P_k(x, y)$ and $P_{k+2}(x, y)$ represent the video image data of the two consecutive frames, $P_k(x, y)$ being the first frame image data and $P_{k+2}(x, y)$ the second frame image data; $K_1(x, y)$ denotes the convolution kernel corresponding to the first frame image data and $K_2(x, y)$ the convolution kernel corresponding to the second frame image data; and $*$ denotes convolution of the per-pixel kernel with the local image patch centered at $(x, y)$.
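A PyTorch sketch of the per-pixel convolution in formula (1) follows. It assumes an odd kernel size R and kernels stored as R*R channels per pixel; these layout details are assumptions, as the disclosure only specifies the convolve-and-sum operation itself.

```python
import torch
import torch.nn.functional as F

def synthesize_intermediate(p_k, p_k2, k1, k2):
    """Formula (1): convolve each input frame with its per-pixel kernel
    and sum the two results.

    p_k, p_k2: input frames, shape (B, C, H, W).
    k1, k2:    per-pixel kernels, shape (B, R*R, H, W), i.e. one R x R
               kernel per output pixel (R is an assumed odd kernel size).
    """
    B, C, H, W = p_k.shape
    R = int(k1.shape[1] ** 0.5)

    def local_conv(img, kernels):
        # Gather the R x R neighbourhood of every pixel ...
        patches = F.unfold(img, kernel_size=R, padding=R // 2)  # (B, C*R*R, H*W)
        patches = patches.view(B, C, R * R, H, W)
        # ... and take the kernel-weighted sum at each location.
        return (patches * kernels.view(B, 1, R * R, H, W)).sum(dim=2)

    return local_conv(p_k, k1) + local_conv(p_k2, k2)
```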
In one example, an event camera is employed as an auxiliary device, and low-frame-rate video data and the event data of the corresponding scene are acquired simultaneously. A video frame interpolation processing network is constructed that takes as input data two consecutive frames of video images and the motion data of all events in the time period around the intermediate time of the two frames; the network may be a fully-connected network structure. High-precision video frame interpolation is realized by processing the input data through this network structure. The network consists of two parts:
a first part: and estimating the motion process in the time period between the two frames of video images in the middle moment from the motion data of any two continuous frames of images and all events between the two frames obtained from the first video data stream. The relative motion state equivalent to the middle time of the two frames of video images can be generated according to the motion process in the time period of the middle time of the two frames of video images.
A second part: convolving and summing the two input frames of video images according to the relative motion state at their intermediate time, to obtain the final interpolated image. Following this convolution-and-summation frame interpolation scheme, a corresponding intermediate frame is inserted at the intermediate time of every two consecutive frames of video images; after the intermediate frames are inserted, every two consecutive frames of video images and the corresponding intermediate frames inserted at their intermediate times form a new video stream, namely the second video data stream.
In this example, the first video data stream may be a low-frame-rate video image sequence; the event camera collects the motion data of all events in the corresponding scene; the motion data of the intermediate time is estimated using two consecutive video images and all the motion data between them; and the intermediate frame image is then generated from that motion data, completing the frame interpolation task for high-frame-rate video.
The frame interpolation of high-frame-rate video realized by the method and the device can be applied to intelligent video action analysis: high-precision slow-motion video of a moving target in a motion scene, such as an athlete's posture, can be reconstructed, and whether the action is standard can then be analyzed from the estimated ultra-slow-motion video. It can also be applied to the generation of high-frame-rate video sources. Ultra-high-frame-rate cameras in the related art have high equipment cost and high power consumption, and it is particularly difficult to deploy ultra-high-frame-rate camera chips on mobile terminals. Since the event camera has the advantage of ultra-low power consumption, high-precision, high-frame-rate video generation and acquisition can be realized with the present disclosure based on an existing low-frame-rate camera and a low-power event camera.
Application example:
fig. 4 is a schematic diagram of a network structure of a video processing method according to an embodiment of the disclosure, and as shown in fig. 4, based on the network structure and motion data collected by an event camera, a video processing (video frame insertion processing) flow of the disclosure is described as follows:
inputting: images of two consecutive frames, and motion data of the event camera between the two frames of images.
And (3) outputting: and generating an image at the middle moment of the two frames, namely an intermediate frame image.
1. The event-camera-based video frame interpolation network is a fully-connected network structure consisting mainly of an encoding module (Encoder), a decoding module (Decoder), and an image synthesis module, which together generate a high-quality intermediate frame image corresponding to the intermediate time. Here, encoding is the process of converting the input into an internal feature representation, and decoding is the reverse operation.
2. The event camera motion data in the network input is divided equally into 6 time periods according to the time between the two frames of images, and 6 frames of event images are then generated by accumulation, so that the 6 frames of event images contain essentially all the motion data between the two frames of images.
3. The Encoder and Decoder modules perform feature extraction and analysis on the two input frames and the motion data of the events between them; the motion state at the intermediate time of the two frames can finally be estimated accurately, and the relative motion state is stored in the form of one convolution kernel per pixel.
4. In the image generation stage, the two input frames are respectively convolved with the estimated convolution kernels representing the relative motion state and summed, to obtain the final generated image.
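Putting the modules together, the following PyTorch sketch mirrors this pipeline under stated assumptions: the disclosure names only the Encoder, the Decoder, the image synthesis step, and their inputs (two video frames plus six accumulated event images), so the layer sizes, channel counts, kernel size, and single-channel frames below are illustrative; synthesize_intermediate refers to the formula (1) sketch given earlier.

```python
import torch
import torch.nn as nn

class InterpolationNet(nn.Module):
    """Sketch of the described structure: an encoder-decoder that estimates
    one convolution kernel per pixel for each input frame, followed by the
    image synthesis step of formula (1)."""

    def __init__(self, kernel_size=5):
        super().__init__()
        # Input: 2 video frames (assumed single-channel) + 6 event images.
        self.encoder = nn.Sequential(
            nn.Conv2d(2 + 6, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
        )
        # Decoder output: two per-pixel kernels, one per input frame.
        self.decoder = nn.Conv2d(128, 2 * kernel_size ** 2, 3, padding=1)

    def forward(self, frame_a, frame_b, event_images):
        feats = self.encoder(torch.cat([frame_a, frame_b, event_images], dim=1))
        k1, k2 = self.decoder(feats).chunk(2, dim=1)  # (B, R*R, H, W) each
        # Image synthesis: per-pixel convolve-and-sum (formula (1) sketch).
        return synthesize_intermediate(frame_a, frame_b, k1, k2)
```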
It will be understood by those skilled in the art that, in the methods of the present disclosure, the order in which the steps are written does not imply a strict order of execution or any limitation on the implementation; the specific order of execution of the steps should be determined by their functions and possible inherent logic.
The above-mentioned method embodiments can be combined with each other to form combined embodiments without departing from the principles and logic; details are not repeated in this disclosure due to space limitations.
In addition, the present disclosure also provides a video data processing apparatus, an electronic device, a computer-readable storage medium, and a program, each of which can be used to implement any video data processing method provided by the present disclosure; for the corresponding technical solutions, refer to the descriptions in the method section, which are not repeated here.
Fig. 5 shows a block diagram of a video data processing apparatus according to an embodiment of the present disclosure, and as shown in fig. 5, the video data processing apparatus of an embodiment of the present disclosure includes: a first data stream obtaining module 21, configured to obtain a first video data stream at a first frame rate. And the motion data obtaining module 22 is configured to obtain time of a middle moment between two consecutive frames of video data in the first video data stream, and obtain motion data of all events between the two consecutive frames of video data according to the time. The second data stream obtaining module 23 is configured to obtain a second video data stream at a second frame rate according to the first video data stream and the motion data, where the second frame rate is greater than the first frame rate.
In a possible implementation manner of the present disclosure, the motion data obtaining module is further configured to: parse, from the first video data stream, two consecutive frames of video images and the timestamps respectively corresponding to the two frames of video images, and obtain the intermediate time according to the timestamps respectively corresponding to the two frames of video images.
In a possible implementation manner of the present disclosure, the motion data obtaining module is further configured to: obtain the light intensity changes of the motion optical flow in the motion scene corresponding to the intermediate time, obtain the motion data of the corresponding events according to the light intensity changes, and record the motion data of all events.
In a possible implementation manner of the present disclosure, the motion data includes: a time at which the event occurred, a location at which the event occurred, and/or an attribute of the event; the attribute of the event is used to represent whether the pixel brightness increases or decreases in the image at the intermediate time of the two frames of video images.
Fig. 6 shows a block diagram of a video data processing apparatus according to an embodiment of the present disclosure, and as shown in fig. 6, the video data processing apparatus of an embodiment of the present disclosure includes: a first data stream obtaining module 21, configured to obtain a first video data stream at a first frame rate. And the motion data obtaining module 22 is configured to obtain time of a middle moment between two consecutive frames of video data in the first video data stream, and obtain motion data of all events between the two consecutive frames of video data according to the time. The second data stream obtaining module 23 is configured to obtain a second video data stream at a second frame rate according to the first video data stream and the motion data, where the second frame rate is greater than the first frame rate. Wherein, the second data stream obtaining module 23 includes: the first obtaining sub-module 231 is configured to obtain every two consecutive frames of video images from the multiple frames of video images in the first video data stream. The second obtaining sub-module 232 is configured to obtain, from the event images in the motion data, a corresponding event image at a middle time of each two consecutive frames of video images. And a third obtaining submodule 233, configured to obtain a relative motion state at the middle time of each two frames of video images according to each two frames of continuous video images and the corresponding event image. And the frame interpolation processing submodule 234 is configured to perform video frame interpolation processing on an intermediate frame for each two frames of video images according to the relative motion state of the intermediate time of each two frames of video images, so as to obtain the second video data stream.
In a possible implementation manner of the present disclosure, the frame interpolation processing sub-module is further configured to: according to the relative motion state at the intermediate moment of each pair of two frames of video images, convolve the two frames of video images and sum the results, to obtain the intermediate frame image inserted at the intermediate moment of the two frames of video images.
In a possible implementation manner of the present disclosure, the apparatus further includes: a new data stream processing module, configured to, after the intermediate frame image is inserted at the intermediate moment of each pair of two frames of video images, form a new video stream from the two frames of video images and the intermediate frame image, and determine the new video stream as the second video data stream.
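A minimal sketch of this step, assuming the intermediate frame images have already been computed, is the interleaving below; the helper name is illustrative:

```python
# A minimal sketch of forming the new video stream: inserting one
# intermediate frame between every pair of consecutive frames roughly
# doubles the frame rate.

def interleave(frames: list, intermediates: list) -> list:
    """Interleave N original frames with N - 1 intermediate frames."""
    assert len(intermediates) == len(frames) - 1
    out = []
    for i in range(len(frames) - 1):
        out.append(frames[i])          # original frame
        out.append(intermediates[i])   # interpolated intermediate frame
    out.append(frames[-1])             # trailing original frame
    return out

# Example: 3 original frames + 2 intermediate frames -> 5-frame stream.
print(interleave(["f0", "f1", "f2"], ["i01", "i12"]))
```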
In a possible implementation manner of the present disclosure, the second data stream obtaining module includes: a first processing sub-module, configured to input the first video data stream and the motion data into a video frame interpolation processing network; and a second processing sub-module, configured to process the first video data stream and the motion data through the video frame interpolation processing network and output the second video data stream at the second frame rate.
In a possible implementation manner of the present disclosure, in a case where the first video data stream is composed of a first video image sequence and the motion data is composed of an event image sequence, the second processing sub-module includes: a first image data obtaining unit, configured to obtain video image data of two consecutive frames from the first video image sequence; a second image data obtaining unit, configured to obtain, from the event image sequence, a plurality of event image data corresponding to the video image data of the two consecutive frames; a feature extraction unit, configured to perform feature extraction on the video image data of the two consecutive frames and the event image data respectively, to obtain a feature extraction result; a convolution obtaining unit, configured to obtain the corresponding convolution kernels according to the feature extraction result; and a convolution processing unit, configured to perform convolution processing on the video image data of the two consecutive frames respectively according to the convolution kernels and sum the results, to obtain the intermediate frame data inserted between the video image data of the two consecutive frames. The plurality of event image data are obtained by a video-frame-number dividing rule based on the intermediate moment of the video image data.
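As a rough illustration of the convolution-and-sum step performed by these units, the following NumPy sketch uses random placeholder kernels where the video frame interpolation processing network would output per-pixel kernels; all names and shapes are assumptions:

```python
# A simplified NumPy sketch of the per-pixel convolution-and-sum step.
# The per-pixel kernels here are random placeholders standing in for the
# output of the video frame interpolation processing network; shapes and
# names are illustrative only.

import numpy as np

def apply_per_pixel_kernels(frame: np.ndarray, kernels: np.ndarray, k: int) -> np.ndarray:
    """Convolve each pixel of a grayscale frame with its own k x k kernel."""
    h, w = frame.shape
    pad = k // 2
    padded = np.pad(frame, pad, mode="edge")
    out = np.empty((h, w))
    for y in range(h):
        for x in range(w):
            patch = padded[y:y + k, x:x + k]
            out[y, x] = np.sum(patch * kernels[y, x])
    return out

h, w, k = 4, 4, 3
frame1 = np.random.rand(h, w)  # first of the two consecutive video images
frame2 = np.random.rand(h, w)  # second of the two

# One kernel per pixel and per input frame; in the scheme described above
# these kernels encode the relative motion state at the intermediate moment.
kernels1 = np.random.rand(h, w, k, k)
kernels2 = np.random.rand(h, w, k, k)

# Convolve each frame with its kernels and sum to obtain the intermediate frame.
intermediate = (apply_per_pixel_kernels(frame1, kernels1, k)
                + apply_per_pixel_kernels(frame2, kernels2, k))
print(intermediate.shape)  # (4, 4)
```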
In a possible implementation manner of the present disclosure, the convolution obtaining unit is further configured to: obtain, according to the feature extraction result, the relative motion state of the video image data of the two consecutive frames at the intermediate moment, and store the relative motion state in a form in which each pixel corresponds to one convolution kernel, thereby obtaining a plurality of convolution kernels.
In a possible implementation manner of the present disclosure, the convolution processing unit is further configured to: in a case where the video image data of the two consecutive frames are first video image data and second video image data, convolve the first video image data with the first convolution kernel to obtain a first processing result, and convolve the second video image data with the second convolution kernel to obtain a second processing result; and sum the first processing result and the second processing result to obtain the intermediate frame data.
In some embodiments, the functions of, or the modules included in, the apparatus provided in the embodiments of the present disclosure may be used to execute the method described in the method embodiments above; for their specific implementation, reference may be made to the description of those method embodiments, which is not repeated here for brevity.
Embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-mentioned method. The computer-readable storage medium may be a non-volatile computer-readable storage medium.
An embodiment of the present disclosure further provides an electronic device, including: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to perform the above method.
The electronic device may be provided as a terminal, server, or other form of device.
Fig. 7 is a block diagram illustrating an electronic device 800 in accordance with an example embodiment. For example, the electronic device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or a similar terminal.
Referring to fig. 7, electronic device 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front-facing camera and/or a rear-facing camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 800 is in an operation mode, such as a shooting mode or a video mode. Each of the front camera and the rear camera may be a fixed optical lens system or have focus and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the electronic device 800. For example, the sensor assembly 814 may detect an open/closed state of the electronic device 800 and the relative positioning of components, such as the display and keypad of the electronic device 800. The sensor assembly 814 may also detect a change in the position of the electronic device 800 or of a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and a change in the temperature of the electronic device 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium, such as the memory 804, is also provided that includes computer program instructions executable by the processor 820 of the electronic device 800 to perform the above-described methods.
Fig. 8 is a block diagram illustrating an electronic device 900 in accordance with an example embodiment. For example, the electronic device 900 may be provided as a server. Referring to fig. 8, electronic device 900 includes a processing component 922, which further includes one or more processors, and memory resources, represented by memory 932, for storing instructions, such as applications, that are executable by processing component 922. The application programs stored in memory 932 may include one or more modules that each correspond to a set of instructions. Further, the processing component 922 is configured to execute instructions to perform the above-described methods.
The electronic device 900 may also include a power component 926 configured to perform power management of the electronic device 900, a wired or wireless network interface 950 configured to connect the electronic device 900 to a network, and an input/output (I/O) interface 958. The electronic device 900 may operate based on an operating system stored in the memory 932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium, such as the memory 932, is also provided that includes computer program instructions executable by the processing component 922 of the electronic device 900 to perform the above-described method.
The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), may execute the computer-readable program instructions by utilizing state information of the computer-readable program instructions to personalize the electronic circuitry, thereby implementing aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Having described embodiments of the present disclosure, the foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application, or the technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (22)

1. A method of processing video data, comprising:
obtaining a first video data stream with a first frame rate;
obtaining the time of the intermediate moment between two consecutive frames of video data in the first video data stream, and obtaining, according to the time, the motion data of all events between the two consecutive frames of video data;
obtaining a second video data stream at a second frame rate according to the first video data stream and the motion data;
wherein the second frame rate is greater than the first frame rate;
wherein the obtaining a second video data stream at a second frame rate according to the first video data stream and the motion data includes:
obtaining each pair of two consecutive frames of video images from the multiple frames of video images in the first video data stream;
obtaining, from a plurality of event images in the motion data, the event image corresponding to the intermediate moment of each pair of two consecutive frames of video images;
obtaining the relative motion state at the intermediate moment of each pair of two frames of video images according to the two consecutive frames of video images and the corresponding event image; and
performing video frame interpolation of an intermediate frame on each pair of two frames of video images according to the relative motion state at their intermediate moment, to obtain the second video data stream.
2. The method of claim 1, wherein obtaining the time of the intermediate moment between two consecutive frames of video data in the first video data stream comprises:
parsing, from the first video data stream, two consecutive frames of video images and the timestamps respectively corresponding to the two frames of video images; and
obtaining the time according to the timestamps respectively corresponding to the two frames of video images.
3. The method of claim 1, wherein obtaining the motion data of all events between the two consecutive frames of video data according to the time comprises:
obtaining the light-intensity variation of the moving optical flow in the motion scene corresponding to the time; and
obtaining the motion data of the corresponding event according to the light-intensity variation, and recording the motion data of all events.
4. The method of claim 3, wherein the motion data comprises: the time at which the event occurs, the location at which the event occurs, and/or an attribute of the event;
wherein the attribute of the event characterizes whether the pixel brightness increases or decreases in the image at the intermediate moment between the two frames of video images.
5. The method according to claim 1, wherein the performing video frame interpolation of an intermediate frame on each pair of two frames of video images according to the relative motion state at their intermediate moment, to obtain the second video data stream, comprises:
convolving, according to the relative motion state at the intermediate moment of each pair of two frames of video images, the two frames of video images and summing the results, to obtain the intermediate frame image inserted at the intermediate moment of the two frames of video images.
6. The method of claim 5, further comprising:
after the intermediate frame image is inserted at the intermediate moment of each pair of two frames of video images, forming a new video stream from the two frames of video images and the intermediate frame image, and determining the new video stream as the second video data stream.
7. The method according to any one of claims 1 to 6, wherein obtaining a second video data stream at a second frame rate according to the first video data stream and the motion data comprises:
inputting the first video data stream and the motion data into a video frame interpolation processing network; and
processing the first video data stream and the motion data through the video frame interpolation processing network, and outputting the second video data stream at the second frame rate.
8. The method of claim 7, wherein, in a case where the first video data stream is composed of a first video image sequence and the motion data is composed of an event image sequence, the processing the first video data stream and the motion data through the video frame interpolation processing network to output the second video data stream at the second frame rate comprises:
obtaining video image data of two consecutive frames from the first video image sequence;
obtaining, from the event image sequence, a plurality of event image data corresponding to the video image data of the two consecutive frames;
performing feature extraction on the video image data of the two consecutive frames and the event image data respectively, to obtain a feature extraction result;
obtaining corresponding convolution kernels according to the feature extraction result; and
performing convolution processing on the video image data of the two consecutive frames respectively according to the convolution kernels, and summing the results, to obtain intermediate frame data inserted between the video image data of the two consecutive frames;
wherein the obtaining corresponding convolution kernels according to the feature extraction result includes:
obtaining, according to the feature extraction result, the relative motion state of the video image data of the two consecutive frames at the intermediate moment, and storing the relative motion state in a form in which each pixel corresponds to one convolution kernel, to obtain a plurality of convolution kernels.
9. The method according to claim 8, wherein the performing convolution processing on the video image data of the two consecutive frames according to the convolution kernels and summing the results, to obtain the intermediate frame data inserted between the video image data of the two consecutive frames, comprises:
in a case where the video image data of the two consecutive frames are first video image data and second video image data, performing convolution processing on the first video image data with a first convolution kernel to obtain a first processing result, and performing convolution processing on the second video image data with a second convolution kernel to obtain a second processing result; and
summing the first processing result and the second processing result to obtain the intermediate frame data.
10. The method of claim 8, wherein the plurality of event image data are obtained by a video-frame-number dividing rule based on the intermediate moment of the video image data.
11. A video data processing apparatus, characterized in that the apparatus comprises:
a first data stream obtaining module, configured to obtain a first video data stream at a first frame rate;
a motion data obtaining module, configured to obtain the time of the intermediate moment between two consecutive frames of video data in the first video data stream, and to obtain, according to the time, the motion data of all events between the two consecutive frames of video data; and
a second data stream obtaining module, configured to obtain a second video data stream at a second frame rate according to the first video data stream and the motion data;
wherein the second frame rate is greater than the first frame rate;
wherein the second data stream obtaining module includes:
a first obtaining sub-module, configured to obtain each pair of two consecutive frames of video images from the multiple frames of video images in the first video data stream;
a second obtaining sub-module, configured to obtain, from a plurality of event images in the motion data, the event image corresponding to the intermediate moment of each pair of two consecutive frames of video images;
a third obtaining sub-module, configured to obtain the relative motion state at the intermediate moment of each pair of two frames of video images according to the two consecutive frames of video images and the corresponding event image; and
a frame interpolation processing sub-module, configured to perform video frame interpolation of an intermediate frame on each pair of two frames of video images according to the relative motion state at their intermediate moment, to obtain the second video data stream.
12. The apparatus of claim 11, wherein the motion data obtaining module is further configured to:
parse, from the first video data stream, two consecutive frames of video images and the timestamps respectively corresponding to the two frames of video images; and
obtain the time according to the timestamps respectively corresponding to the two frames of video images.
13. The apparatus of claim 11, wherein the motion data obtaining module is further configured to:
obtain the light-intensity variation of the moving optical flow in the motion scene corresponding to the time; and
obtain the motion data of the corresponding event according to the light-intensity variation, and record the motion data of all events.
14. The apparatus of claim 13, wherein the motion data comprises: the time at which the event occurs, the location at which the event occurs, and/or an attribute of the event;
wherein the attribute of the event characterizes whether the pixel brightness increases or decreases in the image at the intermediate moment between the two frames of video images.
15. The apparatus of claim 11, wherein the frame insertion processing sub-module is further configured to:
convolve, according to the relative motion state at the intermediate moment of each pair of two frames of video images, the two frames of video images and sum the results, to obtain the intermediate frame image inserted at the intermediate moment of the two frames of video images.
16. The apparatus of claim 15, further comprising:
a new data stream processing module, configured to, after the intermediate frame image is inserted at the intermediate moment of each pair of two frames of video images, form a new video stream from the two frames of video images and the intermediate frame image, and determine the new video stream as the second video data stream.
17. The apparatus according to any one of claims 11 to 16, wherein the second data stream obtaining module comprises:
a first processing sub-module, configured to input the first video data stream and the motion data into a video frame interpolation processing network; and
a second processing sub-module, configured to process the first video data stream and the motion data through the video frame interpolation processing network and output the second video data stream at the second frame rate.
18. The apparatus according to claim 17, wherein, in a case where the first video data stream is composed of a first video image sequence and the motion data is composed of an event image sequence, the second processing sub-module comprises:
a first image data obtaining unit, configured to obtain video image data of two consecutive frames from the first video image sequence;
a second image data obtaining unit, configured to obtain, from the event image sequence, a plurality of event image data corresponding to the video image data of the two consecutive frames;
a feature extraction unit, configured to perform feature extraction on the video image data of the two consecutive frames and the event image data respectively, to obtain a feature extraction result;
a convolution obtaining unit, configured to obtain corresponding convolution kernels according to the feature extraction result; and
a convolution processing unit, configured to perform convolution processing on the video image data of the two consecutive frames respectively according to the convolution kernels and sum the results, to obtain intermediate frame data inserted between the video image data of the two consecutive frames;
wherein the convolution obtaining unit is further configured to:
obtain, according to the feature extraction result, the relative motion state of the video image data of the two consecutive frames at the intermediate moment, and store the relative motion state in a form in which each pixel corresponds to one convolution kernel, to obtain a plurality of convolution kernels.
19. The apparatus of claim 18, wherein the convolution processing unit is further configured to:
in a case where the video image data of the two consecutive frames are first video image data and second video image data, perform convolution processing on the first video image data with a first convolution kernel to obtain a first processing result, and perform convolution processing on the second video image data with a second convolution kernel to obtain a second processing result; and
sum the first processing result and the second processing result to obtain the intermediate frame data.
20. The apparatus of claim 18, wherein the plurality of event image data are obtained by a video-frame-number dividing rule based on the intermediate moment of the video image data.
21. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the method of any one of claims 1 to 10.
22. A computer readable storage medium having computer program instructions stored thereon, which when executed by a processor implement the method of any one of claims 1 to 10.
CN201910142793.3A 2019-02-26 2019-02-26 Video data processing method and device, electronic equipment and storage medium Active CN109922372B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910142793.3A CN109922372B (en) 2019-02-26 2019-02-26 Video data processing method and device, electronic equipment and storage medium
CN202111204872.6A CN113766313B (en) 2019-02-26 2019-02-26 Video data processing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910142793.3A CN109922372B (en) 2019-02-26 2019-02-26 Video data processing method and device, electronic equipment and storage medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202111204872.6A Division CN113766313B (en) 2019-02-26 2019-02-26 Video data processing method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN109922372A CN109922372A (en) 2019-06-21
CN109922372B (en) 2021-10-12

Family

ID=66962369

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201910142793.3A Active CN109922372B (en) 2019-02-26 2019-02-26 Video data processing method and device, electronic equipment and storage medium
CN202111204872.6A Active CN113766313B (en) 2019-02-26 2019-02-26 Video data processing method and device, electronic equipment and storage medium

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202111204872.6A Active CN113766313B (en) 2019-02-26 2019-02-26 Video data processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (2) CN109922372B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110267098B (en) * 2019-06-28 2022-05-20 连尚(新昌)网络科技有限公司 Video processing method and terminal
CN110636221A (en) * 2019-09-23 2019-12-31 天津天地人和企业管理咨询有限公司 System and method for super frame rate of sensor based on FPGA
CN110798630B (en) * 2019-10-30 2020-12-29 北京市商汤科技开发有限公司 Image processing method and device, electronic equipment and storage medium
CN111147787B (en) * 2019-12-27 2021-05-04 Oppo广东移动通信有限公司 Method for processing interpolation frame and related equipment
CN112333397B (en) * 2020-03-26 2022-05-13 华为技术有限公司 Image processing method and electronic device
CN111445414B (en) * 2020-03-27 2023-04-14 北京市商汤科技开发有限公司 Image processing method and device, electronic equipment and storage medium
CN111586321B (en) * 2020-05-08 2023-05-12 Oppo广东移动通信有限公司 Video generation method, device, electronic equipment and computer readable storage medium
CN113784014B (en) * 2020-06-04 2023-04-07 杭州海康威视数字技术股份有限公司 Image processing method, device and equipment
CN112911191B (en) * 2021-01-28 2023-03-24 联想(北京)有限公司 Video call quality adjusting method and device, electronic equipment and storage medium
CN115580737A (en) * 2021-06-21 2023-01-06 华为技术有限公司 Method, device and equipment for video frame insertion
CN113837136B (en) * 2021-09-29 2022-12-23 深圳市慧鲤科技有限公司 Video frame insertion method and device, electronic equipment and storage medium
WO2024021057A1 (en) * 2022-07-29 2024-02-01 Qualcomm Incorporated Dynamic image sensor configuration for improved image stabilization in an image capture device

Family Cites Families (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5227502B2 (en) * 2006-09-15 2013-07-03 株式会社半導体エネルギー研究所 Liquid crystal display device driving method, liquid crystal display device, and electronic apparatus
JP4513819B2 (en) * 2007-03-19 2010-07-28 株式会社日立製作所 Video conversion device, video display device, and video conversion method
KR101498207B1 (en) * 2008-11-26 2015-03-03 삼성전자주식회사 Frame rate conversion apparatus and method for ultra-high definition video
JP2012095048A (en) * 2010-10-26 2012-05-17 Mitsubishi Electric Corp Image processor, image processing method, image display device, and image display method
EP2775448A1 (en) * 2013-03-06 2014-09-10 Thomson Licensing Deblurring of an image from a sequence of images
US9300906B2 (en) * 2013-03-29 2016-03-29 Google Inc. Pull frame interpolation
WO2015026817A1 (en) * 2013-08-19 2015-02-26 Gentex Corporation Imaging system and method with ego motion detection
GB2520319A (en) * 2013-11-18 2015-05-20 Nokia Corp Method, apparatus and computer program product for capturing images
CN104978750B (en) * 2014-04-04 2018-02-06 诺基亚技术有限公司 Method and apparatus for handling video file
WO2017011817A1 (en) * 2015-07-16 2017-01-19 Blast Motion Inc. Integrated sensor and video motion analysis method
US10198660B2 (en) * 2016-01-27 2019-02-05 Samsung Electronics Co. Ltd. Method and apparatus for event sampling of dynamic vision sensor on image formation
CN106331723B (en) * 2016-08-18 2019-12-13 上海交通大学 Video frame rate up-conversion method and system based on motion region segmentation
US10110913B2 (en) * 2016-09-30 2018-10-23 Intel Corporation Motion estimation using hybrid video imaging system
US10778999B2 (en) * 2016-09-30 2020-09-15 Qualcomm Incorporated Frame rate up-conversion coding mode with affine motion model
CN106911930A (en) * 2017-03-03 2017-06-30 深圳市唯特视科技有限公司 It is a kind of that the method for perceiving video reconstruction is compressed based on recursive convolution neutral net
US10698068B2 (en) * 2017-03-24 2020-06-30 Samsung Electronics Co., Ltd. System and method for synchronizing tracking points
CN108040217B (en) * 2017-12-20 2020-01-24 深圳岚锋创视网络科技有限公司 Video decoding method and device and camera
CN108184165B (en) * 2017-12-28 2020-08-07 Oppo广东移动通信有限公司 Video playing method, electronic device and computer readable storage medium
CN109068174B (en) * 2018-09-12 2019-12-27 上海交通大学 Video frame rate up-conversion method and system based on cyclic convolution neural network

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1961338A (en) * 2004-04-09 2007-05-09 索尼株式会社 Image processing apparatus and method, and recording medium and program
CN101197999A (en) * 2006-12-08 2008-06-11 株式会社东芝 Interpolated frame generating method and interpolated frame generating apparatus
CN103871076A (en) * 2014-02-27 2014-06-18 西安电子科技大学 Moving object extraction method based on optical flow method and superpixel division
CN106686472A (en) * 2016-12-29 2017-05-17 华中科技大学 High-frame-rate video generation method and system based on depth learning
CN108182670A (en) * 2018-01-15 2018-06-19 清华大学 A kind of resolution enhancement methods and system of event image
CN108961318A (en) * 2018-05-04 2018-12-07 上海芯仑光电科技有限公司 A kind of data processing method and calculate equipment
CN108985443A (en) * 2018-07-04 2018-12-11 北京旷视科技有限公司 Action identification method and its neural network generation method, device and electronic equipment

Also Published As

Publication number Publication date
CN113766313A (en) 2021-12-07
CN109922372A (en) 2019-06-21
CN113766313B (en) 2024-03-05

Similar Documents

Publication Publication Date Title
CN109922372B (en) Video data processing method and device, electronic equipment and storage medium
US20210326587A1 (en) Human face and hand association detecting method and a device, and storage medium
CN110287874B (en) Target tracking method and device, electronic equipment and storage medium
CN109257645B (en) Video cover generation method and device
CN107944409B (en) Video analysis method and device capable of distinguishing key actions
CN110798630B (en) Image processing method and device, electronic equipment and storage medium
CN109118430B (en) Super-resolution image reconstruction method and device, electronic equipment and storage medium
CN110060215B (en) Image processing method and device, electronic equipment and storage medium
CN111445414B (en) Image processing method and device, electronic equipment and storage medium
CN109784164B (en) Foreground identification method and device, electronic equipment and storage medium
CN108171222B (en) Real-time video classification method and device based on multi-stream neural network
CN112785672A (en) Image processing method and device, electronic equipment and storage medium
CN111583142A (en) Image noise reduction method and device, electronic equipment and storage medium
CN114202562A (en) Video processing method and device, electronic equipment and storage medium
CN110675355B (en) Image reconstruction method and device, electronic equipment and storage medium
CN115512116B (en) Image segmentation model optimization method and device, electronic equipment and readable storage medium
CN109068138B (en) Video image processing method and device, electronic equipment and storage medium
CN110121115B (en) Method and device for determining wonderful video clip
CN113506324B (en) Image processing method and device, electronic equipment and storage medium
CN113506325B (en) Image processing method and device, electronic equipment and storage medium
CN110858921A (en) Program video processing method and device
CN110896492B (en) Image processing method, device and storage medium
CN112330721A (en) Three-dimensional coordinate recovery method and device, electronic equipment and storage medium
CN111583144A (en) Image noise reduction method and device, electronic equipment and storage medium
CN113506229B (en) Neural network training and image generating method and device

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant