CN112511859A - Video processing method, device and storage medium - Google Patents

Video processing method, device and storage medium

Info

Publication number
CN112511859A
Authority
CN
China
Prior art keywords
image
images
frame
frames
target feature
Prior art date
Legal status
Granted
Application number
CN202011260544.3A
Other languages
Chinese (zh)
Other versions
CN112511859B (en)
Inventor
徐锐
Current Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202011260544.3A priority Critical patent/CN112511859B/en
Publication of CN112511859A publication Critical patent/CN112511859A/en
Application granted granted Critical
Publication of CN112511859B publication Critical patent/CN112511859B/en

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20: Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23: Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234: Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/2343: Processing of video elementary streams involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N23/00: Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/95: Computational photography systems, e.g. light-field imaging systems
    • H04N23/951: Computational photography systems by using two or more images to influence resolution, frame rate or aspect ratio

Abstract

The invention discloses a video processing method, a video processing device and a storage medium, wherein the method comprises the following steps: acquiring first video data; the first video data comprises at least two frames of first images; acquiring at least one frame of second image corresponding to two adjacent frames of first images in at least two frames of first images; the image acquisition time of the second image is within the time range of acquiring the two adjacent frames of the first images; the first image and the second image of the two adjacent frames both comprise target features; determining a third image based on the target feature included in each frame of the second image and the first images of the two adjacent frames; inserting the third image between the two adjacent frames of the first image of the first video data to generate second video data; wherein the type of the first image is the same as the type of the third image; the type of the first image is different from the type of the second image.

Description

Video processing method, device and storage medium
Technical Field
The present invention relates to image processing technologies, and in particular, to a video processing method, apparatus, and storage medium.
Background
Slow-motion video recording requires images at a high frame rate to preserve the continuity of motion and motion tracks. A three-primary-color (RGB, Red Green Blue) camera is limited by its readout speed, so the resolution of the output images must be reduced to 1080P, 720P or even lower in order to achieve a high frame rate. Even so, the video frame rate that the hardware can output for transmission is only a few hundred Frames Per Second (FPS). To achieve a still higher frame rate, image interpolation can be performed between two frames output by the hardware, along the motion track of the subject in those two frames, so that the video looks more coherent and smooth. However, this hardware-based high-frame-rate scheme needs to reduce the image resolution to obtain a higher-frame-rate output, which results in loss of high-frequency information and unclear images.
Disclosure of Invention
In view of the foregoing, it is a primary object of the present invention to provide a video processing method, apparatus and storage medium.
In order to achieve the purpose, the technical scheme of the invention is realized as follows:
the embodiment of the invention provides a video processing method, which comprises the following steps:
acquiring first video data; the first video data comprises at least two frames of first images;
acquiring at least one frame of second image corresponding to two adjacent frames of first images in at least two frames of first images; the image acquisition time of the second image is within the time range of acquiring the two adjacent frames of the first images; the first image and the second image of the two adjacent frames both comprise target features;
determining a third image based on the target feature included in each frame of the second image and the first images of the two adjacent frames;
inserting the third image between the two adjacent frames of the first image of the first video data to generate second video data;
wherein the type of the first image is the same as the type of the third image; the type of the first image is different from the type of the second image.
In the foregoing solution, the frame number of the second image is one frame, and determining a third image based on the target feature included in each frame of the second image and the two adjacent frames of the first image includes:
and fusing the target characteristics with at least one frame of the adjacent two frames of the first images according to the target characteristics included in each frame of the second image to obtain a third image.
In the above solution, the number of frames of the second image is at least two, and determining a third image based on the target feature included in each frame of the second image and the two adjacent frames of the first image includes:
determining a motion track of a target feature based on the target feature included in each frame of the second image in at least two frames of the second image;
and according to the motion track, fusing the target feature of each frame of the second image with at least one frame of the adjacent two frames of the first images to obtain at least two third images with the target feature action tracks.
In the above scheme, the target feature at least includes texture information; the first image includes at least color information;
fusing the target feature with at least one frame of the two adjacent frames of the first images to obtain a third image, wherein the process comprises the following steps:
and fusing the texture information of the target feature and the color information corresponding to the target feature in at least one frame of the first images of the two adjacent frames to obtain a third image.
In the above scheme, the first image and the third image are RGB images; the second image is an image generated based on an event stream acquired by a dynamic vision sensor DVS.
An embodiment of the present invention provides a video processing apparatus, where the apparatus includes:
the first acquisition module is used for acquiring first video data; the first video data comprises at least two frames of first images;
the second acquisition module is used for acquiring at least one frame of second image corresponding to two adjacent frames of first images in at least two frames of first images; the image acquisition time of the second image is within the time range of acquiring the two adjacent frames of the first images; the first image and the second image of the two adjacent frames both comprise target features;
the processing module is used for determining a third image based on a target feature included in each frame of the second image and the first images of the two adjacent frames;
the generating module is used for inserting the third image between the two adjacent frames of the first image of the first video data to generate second video data;
wherein the type of the first image is the same as the type of the third image; the type of the first image is different from the type of the second image.
In the above scheme, the frame number of the second image is one frame, and the processing module is configured to fuse the target feature with at least one frame of the two adjacent frames of the first image according to a target feature included in each frame of the second image, so as to obtain a third image.
In the above scheme, the number of frames of the second image is at least two frames, and the processing module is configured to determine a motion trajectory of a target feature based on the target feature included in each frame of the second image in the at least two frames of the second image; and according to the motion track, fusing the target feature of each frame of the second image with at least one frame of the adjacent two frames of the first images to obtain at least two third images with the target feature action tracks.
In the above solution, the target feature at least includes: texture information; the first image includes at least: color information;
the processing module is specifically configured to fuse the texture information of the target feature and the color information corresponding to the target feature in at least one of the two adjacent frames of the first image to obtain a third image.
In the above scheme, the first image and the third image are RGB images; the second image is an image generated based on an event stream acquired by the DVS.
An embodiment of the present invention provides a video processing apparatus, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the video processing method when executing the program.
An embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the video processing method described above.
The video processing method, the video processing device and the storage medium provided by the embodiment of the invention comprise the following steps: acquiring first video data; the first video data comprises at least two frames of first images; acquiring at least one frame of second image corresponding to two adjacent frames of first images in at least two frames of first images; the image acquisition time of the second image is within the time range of acquiring the two adjacent frames of the first images; the first image and the second image of the two adjacent frames both comprise target features; determining a third image based on the target feature included in each frame of the second image and the first images of the two adjacent frames; inserting the third image between the two adjacent frames of the first image of the first video data to generate second video data; wherein the type of the first image is the same as the type of the third image; the type of the first image is different from the type of the second image; in this way, the third image is generated based on the target feature, and the first video data is interpolated based on the third image, so as to obtain the video data (i.e., the second video data) with a higher frame rate.
Drawings
FIG. 1 is a flow chart of a software frame insertion method;
FIG. 2 is a schematic diagram illustrating the effect of a software frame insertion method;
fig. 3 is a schematic flowchart of a video processing method according to an embodiment of the present invention;
fig. 4 is a flowchart illustrating another video processing method according to an embodiment of the present invention;
fig. 5 is a schematic diagram of a DVS-assisted frame interpolation method according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of another video processing apparatus according to an embodiment of the present invention.
Detailed Description
Prior to further detailed description of the present invention with reference to the examples, the related art will be described.
As described above, the hardware-based high frame rate scheme requires a reduced image resolution to obtain a higher frame rate output, which results in loss of high frequency information and unclear image. To address this problem, a software frame insertion scheme is provided in the related art.
FIG. 1 is a flow chart of a software frame insertion method; as shown in fig. 1, in the software frame interpolation scheme, image frames are inserted between Frame N (the Nth frame image) and Frame N+1 (the (N+1)th frame image) of a low-frame-rate video shot by an RGB camera, so as to output a high-frame-rate video.
For the process of inserting image frames, the interpolation is uniform. FIG. 2 is a schematic diagram illustrating the effect of a software frame insertion method; as shown in fig. 2, the first triangle and the second triangle represent a feature point in Frame N and Frame N+1, which moves from the position of the first triangle to the position of the second triangle. Although the feature point follows a changing track, the software frame insertion method uniformly inserts the intermediate process frames (namely, insert Frame1 and insert Frame2) along a motion track that is determined only from Frame N and Frame N+1; the motion track corresponding to the feature point is therefore a straight line, so the realism is low.
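As an illustration only (not taken from the patent), a minimal sketch of this uniform interpolation, assuming a feature point is tracked as an (x, y) position in each frame:

```python
import numpy as np

def insert_frames_linear(pos_n, pos_n1, num_inserted):
    """Uniformly interpolate a feature point's position between Frame N and Frame N+1.

    pos_n, pos_n1: (x, y) position of the feature point in Frame N and Frame N+1.
    num_inserted: number of intermediate frames to insert (e.g. 2 for insert Frame1/Frame2).
    Returns the interpolated positions, assuming a straight-line track -- which is
    exactly the limitation described above: the real track may be curved.
    """
    pos_n = np.asarray(pos_n, dtype=float)
    pos_n1 = np.asarray(pos_n1, dtype=float)
    ts = np.linspace(0.0, 1.0, num_inserted + 2)[1:-1]   # evenly spaced fractions
    return [tuple(pos_n + t * (pos_n1 - pos_n)) for t in ts]

# Example: the triangle moves from (10, 40) in Frame N to (70, 10) in Frame N+1.
print(insert_frames_linear((10, 40), (70, 10), num_inserted=2))
```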
In summary, the above scheme relies on the motion trajectories of the feature points of the two frames of images when performing frame interpolation, but the two frames of images often cannot accurately reflect the motion trajectories of the feature subjects, which may cause frame interpolation trajectory errors, and the final effect and the real situation are different.
Based on this, in the embodiment of the present invention, first video data is acquired; the first video data comprises at least two frames of first images; acquiring at least one frame of second image corresponding to two adjacent frames of first images in at least two frames of first images; the image acquisition time of the second image is within the time range of acquiring the two adjacent frames of the first images; the first image and the second image of the two adjacent frames both comprise target features; determining a third image based on the target feature included in each frame of the second image and the first images of the two adjacent frames; inserting the third image between the two adjacent frames of the first image of the first video data to generate second video data; wherein the type of the first image is the same as the type of the third image; the type of the first image is different from the type of the second image.
The present invention will be described in further detail with reference to examples.
Fig. 3 is a schematic flowchart of a video processing method according to an embodiment of the present invention; as shown in fig. 3, the video processing method is applied to an electronic device having a photographing function, which may be a smart phone, a notebook computer, a Personal Digital Assistant (PDA), a tablet computer (PAD), a Portable Multimedia Player (PMP), or the like; the method comprises the following steps:
Step 301, acquiring first video data; the first video data comprises at least two frames of first images;
step 302, acquiring at least one frame of second image corresponding to two adjacent frames of first images in at least two frames of first images; the image acquisition time of the second image is within the time range of acquiring the two adjacent frames of the first images; the first image and the second image of the two adjacent frames both comprise target features;
step 303, determining a third image based on the target feature included in each frame of the second image and the first images of the two adjacent frames;
step 304, inserting the third image into the first video data to generate second video data;
wherein the type of the first image is the same as the type of the third image; the type of the first image is different from the type of the second image.
In an embodiment, the acquiring at least one frame of second image corresponding to two adjacent frames of first images in at least two frames of the first images includes:
and acquiring at least one frame of second image corresponding to two adjacent frames of first images in the at least two frames of first images according to the time information corresponding to each frame of first image in the at least two frames of first images.
Specifically, when the first video data is shot, time information (marked as a first time point) corresponds to each frame of first image in the first video data;
when a video consisting of second images is shot, each frame of second image corresponds to time information (marked as a second time point);
based on the first time point and the second time point, at least one frame of second image corresponding to two adjacent frames of first image can be determined. Specifically, the second image acquired in a time range (specifically, a time range between first time points of the two adjacent frames of the first images) of the two adjacent frames of the first images is acquired (i.e., captured).
Here, the image capturing time of the at least one frame of second image is within the time range of capturing the two adjacent frames of first images. Suppose the times for acquiring the two adjacent frames of first images are T1 and T2, respectively; the image capturing time of the at least one frame of second image is then any time point between T1 and T2. Alternatively, for ultra-low-frame-rate video data, the image capturing time of the at least one frame of second image may also be T1, T2, or any time point between T1 and T2.
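A minimal sketch of this time-window selection, assuming each second image carries a capture timestamp (the data layout and function name are illustrative, not taken from the patent):

```python
def select_second_images(second_frames, t1, t2, include_endpoints=False):
    """Pick the second images whose capture time falls within the window of the two
    adjacent first images.

    second_frames: list of (timestamp, image) pairs from the high-frame-rate stream.
    t1, t2: capture times of the two adjacent first images (t1 < t2).
    include_endpoints: for ultra-low-frame-rate video data, also keep frames captured
    exactly at t1 or t2, as described above.
    """
    if include_endpoints:
        return [(t, img) for t, img in second_frames if t1 <= t <= t2]
    return [(t, img) for t, img in second_frames if t1 < t < t2]
```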
Here, the first image and the third image are RGB images; the second image is an image generated based on an event stream acquired by a Dynamic Vision Sensor (DVS).
The DVS is a Complementary Metal Oxide Semiconductor (CMOS) asynchronous image sensor based on Address-Event Representation (AER). Compared with a conventional image sensor, which outputs pixel information in the form of frames, the DVS imitates the working mechanism of biological retinal nerves: it adopts asynchronous address-event representation and outputs only the addresses and information of pixels whose light intensity changes, instead of reading out every pixel in a frame in sequence. It therefore has the characteristics of real-time dynamic response to scene changes, ultra-sparse representation of images, and asynchronous output of time-domain change events; it can overcome the high redundancy, high delay, high noise, low dynamic range and transmission bottleneck caused by the frame-sampling mode of a traditional image sensor, and it is suitable for high-speed, highly real-time vision applications.
Because the DVS has the characteristics of low delay and low redundancy, the number of image frames it can output in the same time is often many times that of an ordinary RGB camera. That is, over the same period, a video shot based on DVS includes far more images than a video shot with an ordinary camera; in other words, the frame rate of the DVS-captured images is higher than the frame rate of the RGB images. Therefore, the video shot with an ordinary camera (such as the first video data described above) can be interpolated (have images inserted) based on the video shot by the DVS, thereby improving the definition of the video shot with the ordinary camera.
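As a hedged illustration of how a "second image" might be formed from the DVS output, the sketch below accumulates an address-event stream into a single frame; the (timestamp, x, y, polarity) event layout is an assumption, not a specification from the patent:

```python
import numpy as np

def events_to_frame(events, height, width, t_start, t_end):
    """Accumulate an address-event stream into a single frame.

    events: iterable of (timestamp, x, y, polarity) tuples, as output by a DVS
    (only pixels whose light intensity changed are reported).
    Returns a 2D array whose non-zero pixels mark where changes occurred in
    [t_start, t_end); this is one possible way a 'second image' could be formed.
    """
    frame = np.zeros((height, width), dtype=np.int16)
    for t, x, y, p in events:
        if t_start <= t < t_end:
            frame[y, x] += 1 if p > 0 else -1
    return frame
```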
In one embodiment, the number of frames of the second image is one frame, and the determining a third image based on the target feature included in each frame of the second image and the two adjacent frames of the first image includes:
and fusing the target characteristics with at least one frame of the adjacent two frames of the first images according to the target characteristics included in each frame of the second image to obtain a third image.
In one embodiment, the target feature of the second image comprises at least texture information; the first image includes at least color information; the fusing the target feature with at least one frame of the two adjacent frames of the first image to obtain a third image, including:
and fusing the texture information of the target feature and the color information corresponding to the target feature in at least one frame of the first images of the two adjacent frames to obtain a third image.
Here, the two adjacent frames of the first image and the second image both include the target feature, so that the target feature of the second image and the color information corresponding to the target feature in at least one frame of the two adjacent frames of the first image can be fused to obtain a third image; the method specifically comprises the following three conditions:
fusing the target feature of the second image with the color information corresponding to the target feature in the first image in the two adjacent frames of the first images;
fusing the target feature of the second image with color information corresponding to the target feature in the second first image in the two adjacent frames of first images;
and fusing the target characteristics of the second image with the color information corresponding to the target characteristics in the first images of two adjacent frames.
That is, the fusion may be performed based on only one of the two adjacent frames of the first images, or the fusion may be performed based on the two frames of the first images to obtain the third image.
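A minimal sketch of this fusion, assuming the target feature is available as a pixel mask and using the feature's average color in the first image(s) in place of a full per-pixel warp (a simplification, not the patent's exact procedure):

```python
import numpy as np

def fuse_texture_with_color(texture_mask, feature_mask_in_first, first_image_a,
                            first_image_b=None):
    """Sketch of the fusion step: the texture of the target feature (taken from the
    second image) is filled with color taken from the same feature in one or both
    of the adjacent first images.

    texture_mask:          HxW bool, feature pixels at the second image's capture time.
    feature_mask_in_first: HxW bool, feature pixels as located in the first image(s).
    first_image_a/_b:      HxWx3 uint8 RGB; if both are given their colors are averaged,
                           which covers the third case listed above.
    """
    color_src = first_image_a.astype(np.float32)
    if first_image_b is not None:
        color_src = (color_src + first_image_b.astype(np.float32)) / 2.0

    third = color_src.copy()                                       # background from the first image(s)
    feature_color = color_src[feature_mask_in_first].mean(axis=0)  # average color of the feature
    third[texture_mask] = feature_color                            # paint the feature at its new position
    return third.astype(np.uint8)
```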
In one embodiment, the target feature of the second image comprises at least texture information; the first image includes at least color information;
the frame number of the second image is at least two frames, and the determining of the third image based on the target feature included in each frame of the second image and the first images of the two adjacent frames comprises:
determining a motion track of a target feature based on the target feature included in each frame of the second image in at least two frames of the second image;
and according to the motion track, fusing the target feature of each frame of the second image with at least one frame of the adjacent two frames of the first images to obtain at least two third images with the target feature action tracks.
Considering that the number of frames of the second image is at least two frames, for each frame of the second image, fusing the target feature of each frame of the second image with at least one frame of the adjacent two frames of the first image, including:
and fusing the texture information of the target feature of each frame of the second image with the corresponding color information of the target feature of each frame of the second image in at least one frame of the two adjacent frames of the first images to obtain a third image.
Here, the method for fusing the target feature of each frame of the second image with at least one frame of the two adjacent frames of the first images comprises the following steps:
fusing the target feature of the second image with the color information corresponding to the target feature in the first image in the two adjacent frames of the first images;
fusing the target feature of the second image with color information corresponding to the target feature in the second first image in the two adjacent frames of first images;
and fusing the target characteristics of the second image with the color information corresponding to the target characteristics in the first images of two adjacent frames.
That is, for the target feature of each frame of the second image, the fusion may be performed based on only any one of the two adjacent frames of the first images, or the fusion may be performed based on the two frames of the first images to obtain the third image.
In this way, the real action track between the two adjacent frames of the first image can be reflected by the at least two third images.
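A sketch of the multi-frame case under the same assumptions, reducing the motion track to per-frame centroids of the feature mask and reusing the fuse_texture_with_color helper from the earlier sketch:

```python
import numpy as np

def build_third_images(second_frames, feature_mask_in_first, first_image_a, first_image_b):
    """For at least two second images, recover the motion track of the target feature
    (here reduced to its per-frame centroid) and fuse each second image with the two
    adjacent first images, giving one third image per second image along the track.

    second_frames: list of (timestamp, texture_mask) pairs sorted by timestamp.
    Reuses fuse_texture_with_color() from the earlier sketch.
    """
    track = []   # (timestamp, x, y) centroids -- the recovered motion track
    thirds = []  # (timestamp, third image) pairs, ordered along the track
    for t, mask in second_frames:
        ys, xs = np.nonzero(mask)
        track.append((t, xs.mean(), ys.mean()))
        thirds.append((t, fuse_texture_with_color(mask, feature_mask_in_first,
                                                  first_image_a, first_image_b)))
    return track, thirds
```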
Specifically, two adjacent frames of first images may correspond to a plurality of second images, and the plurality of second images reflect the action track of the target feature. For example, suppose the first image 1 and the first image 2 contain a moving object (e.g., a person who is walking), and the first image 1 and the first image 2 correspond to the second image 1 to the second image 1+n. A jump of the target feature (i.e., the moving object jumping) is reflected in the second image 1 to the second image 1+n, while the first image 1 and the first image 2 do not reflect this action; therefore, at least two third images may be determined from the second image 1 to the second image 1+n so as to reflect the action of the person jumping.
Specifically, the texture information carries no color, but it can characterize the outline or shape of the feature, embodied as pixel points (perceived as black-and-white points). Therefore, fusing the texture information with the color information corresponding to the target feature determined from the first image yields an image in which the target feature has color, that is, a third image.
In one example, the fusing the texture information of the target feature and the corresponding color information of the target feature in the second image to obtain a third image includes:
performing image recognition on the first image based on the texture information of the target feature, and determining a corresponding feature region in the first image;
and obtaining color information corresponding to the target feature based on the color of the feature in the feature region.
The image recognition may be any method having a shape or contour recognition function, and is not limited herein.
In practical application, image recognition may be performed on at least one of the two adjacent frames of first images to obtain the color information corresponding to the target feature. Alternatively, image recognition may be performed on any frame of first image in the first video data that contains the target feature. For example, if no feature region can be obtained after identifying the two adjacent frames of first images, an attempt may be made to determine a corresponding feature region from other first images in the first video data. Of course, the above only describes which first image is identified, and is not limiting.
In another example, the color information corresponding to the target feature may be determined differently: the first image and the second image whose time information correspond are determined, specifically the two frames of second images whose image acquisition times are closest to or consistent with those of the two adjacent frames of first images (a time difference below a certain threshold, such as 1 us). The determined first image and second image can then be compared to obtain the feature region corresponding to the target feature, and the color information of the target feature is determined based on that feature region. Furthermore, color fusion can be performed on the target feature in the other second images according to this color information to obtain the third images.
For example, the two adjacent frames of the first image are a first image a and a first image B; determining a second image A corresponding to the acquisition time of the first image A, and determining a second image B corresponding to the acquisition time of the first image B; and comparing the first image A with the second image A, or comparing the first image B with the second image B to obtain a characteristic region corresponding to the target characteristic.
As for the method for determining the color information corresponding to the target feature, any of the above methods may be adopted, or other methods may also be adopted, which is not limited herein.
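A minimal sketch of the second option, assuming the time-aligned second image is available as a boolean feature mask (names and data layout are illustrative):

```python
import numpy as np

def feature_color_from_first(first_image, second_mask_same_time):
    """Sketch of the second option above: a second image captured at (nearly) the same
    time as a first image (time difference below the threshold, e.g. 1 us) marks where
    the target feature sits, and sampling the first image at those pixels yields the
    feature's color information.

    first_image:           HxWx3 uint8 RGB image.
    second_mask_same_time: HxW bool mask of the target feature from the time-aligned
                           second image.
    """
    region = first_image[second_mask_same_time]     # pixels of the feature region
    if region.size == 0:
        return None                                 # feature region not found in this first image
    return region.reshape(-1, 3).mean(axis=0)       # average RGB color of the feature
```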
In an embodiment, the inserting the third image between the two adjacent frames of the first image of the first video data comprises:
when the number of the third images is one, the third image can be directly inserted into the corresponding position of the first video data (between the two adjacent frames of first images);
when the number of the third images is multiple, the time point of the second image corresponding to each frame of third image is taken as the time point of that third image, and the plurality of third images are sequentially inserted into the corresponding positions of the first video data (between the two adjacent frames of first images) according to the time point of each frame of third image.
Specifically, when the number of the third images is multiple, the number of the third images inserted between two adjacent frames of the first images may be equal to or less than the determined number of the second images corresponding to two adjacent frames of the first images.
The number of the third images to be inserted may be determined according to a frame number threshold previously set by a developer, and the frame number threshold may be two frames, three frames, or the like.
When the number of generated third images is larger than the frame-number threshold, an insertion rule can be preset; for example, a threshold number of third images may be arbitrarily selected from the generated third images for insertion; alternatively, a threshold number of third images is selected from the generated third images for insertion by extracting frames at intervals.
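A sketch of this ordering and thinning step, assuming each third image keeps the timestamp of the second image it was derived from; the interval-extraction rule shown is one possible reading of the rule described above:

```python
import numpy as np

def choose_and_order_third_images(third_frames, max_frames):
    """Order the generated third images by the time point of the second image they
    were derived from and, if more were generated than the preset frame-number
    threshold, thin them out by extracting frames at intervals.

    third_frames: list of (timestamp, image) pairs.
    max_frames:   frame-number threshold set in advance (e.g. two or three frames).
    """
    ordered = sorted(third_frames, key=lambda tf: tf[0])
    if len(ordered) <= max_frames:
        return ordered
    idx = np.linspace(0, len(ordered) - 1, max_frames).round().astype(int)
    return [ordered[int(i)] for i in idx]
```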
In one embodiment, the first image and the third image are RGB images; the second image is an image generated based on an event stream acquired by a Dynamic Visual Sensor (DVS);
the frame rate of the first image (i.e., RGB image) is lower than the frame rate of the second image (i.e., image generated based on the event stream of DVS acquisition).
The acquiring of the first video data includes: acquiring the first video data by using a first camera;
the method further comprises the following steps: collecting at least one frame of second image by using a second camera; the second camera is associated with a DVS.
The first camera may be a camera that commonly captures RGB images; the second camera is associated with, i.e., captures based on, DVS.
The frame rate of the first image may also be equal to the frame rate of the second image, and the time for acquiring the first image and the time for acquiring the second image are staggered, so that the second image can be acquired within the time range for acquiring two adjacent frames of the first image.
For example, the time for acquiring the first images of two adjacent frames is respectively: t1, T2; assume that a second image was acquired at time point T1 'and that time point T1' is between time point T1 and time point T2; the acquired second image is the second image corresponding to the first image of the two adjacent frames.
According to the method provided by the embodiment of the invention, the motion of the object is recorded at a higher frame rate through the DVS and then fused with the RGB video (namely the first video data) to obtain an RGB video with a higher frame rate.
Compared with the software frame interpolation method shown in fig. 1, the recorded motion situation of the object is more real, and the finally obtained high-frame-rate video effect is better.
It should be noted that, in order to implement the above method, the electronic device applied in the embodiment of the present invention has, or is connected to, the first camera and the second camera.
Fig. 4 is a flowchart illustrating another video processing method according to an embodiment of the present invention; as shown in fig. 4, the video processing method is applied to an electronic device, and the method includes:
acquiring RGB video data (equivalent to the first video data in the method shown in fig. 3) and DVS video data; the RGB video data is shot by an RGB camera and is low-frame-rate video data whose frame rate needs to be increased; the DVS video data is shot by a DVS camera.
Determining two adjacent frames of RGB images in the RGB video data and at least one frame of DVS image corresponding to the two adjacent frames of RGB images; wherein the two adjacent frames of RGB images refer to Frame N and Frame N+1 (N is greater than or equal to 1), and the at least one frame of DVS image refers to Frame M to Frame M+m (M and m are greater than or equal to 1). Here, Frame N and Frame M may correspond to the same time point, or their time points may differ by less than a preset threshold (e.g., 1 us); likewise, Frame N+1 and Frame M+m may correspond to the same time point, or their time points may differ by less than a preset threshold (e.g., 1 us). Determining at least one frame of DVS image corresponding to the two adjacent frames of RGB images means acquiring, from the DVS video data, at least one frame of DVS image corresponding to the two adjacent frames of RGB images; specifically, the determination may be based on the time information corresponding to each frame in the video data, that is, determining RGB images and DVS images that fall within the same time range.
A third image is then determined based on the target feature included in each frame of DVS image and the two adjacent frames of RGB images. Specifically, the target feature may be determined from Frame M to Frame M+m; a motion track of the target feature is determined based on the target feature; and, according to the motion track, the target feature of each frame of DVS image (namely Frame M to Frame M+m) is fused with the RGB images to obtain at least two third images with the target feature's action track. Here, the DVS image corresponds to the second image and the RGB image corresponds to the first image; the specific process of fusion has already been described in the method shown in fig. 3 and is not repeated here.
And finally, inserting the third image into the RGB video data to obtain the RGB high frame rate video. Here, the RGB high frame rate video refers to high frame rate video data equivalent to the RGB video data. The number of the third images inserted into the RGB video data may be one or more.
According to the method, the characteristic that the DVS obtains the high Frame rate is utilized, and data (such as Frame M to Frame M + M) of the DVS between the image frames shot by the RGB camera are extracted; local features (namely the target features) which change between frames are obtained based on data of the DVS, the local features provided by the DVS are used as guidance to predict the motion trail of the RGB camera motion area, and frames are inserted between frames of the RGB camera by the motion trail to obtain a video with a higher frame rate.
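Tying the earlier sketches together, a hedged end-to-end outline of this flow (all helper names and data layouts are assumptions introduced in the previous sketches, not the patent's API):

```python
def dvs_assisted_interpolation(rgb_frames, dvs_frames, max_insert=3):
    """End-to-end sketch of the flow in fig. 4, reusing the helpers sketched earlier
    (select_second_images, fuse_texture_with_color, choose_and_order_third_images).

    rgb_frames: list of (timestamp, HxWx3 RGB image) -- the low-frame-rate first video data.
    dvs_frames: list of (timestamp, HxW bool mask)   -- frames built from the DVS event stream.
    Returns the second video data: the RGB frames plus the inserted third images, in time order.
    """
    output = []
    for (t1, img_a), (t2, img_b) in zip(rgb_frames, rgb_frames[1:]):
        output.append((t1, img_a))
        window = select_second_images(dvs_frames, t1, t2)       # DVS frames between the RGB pair
        if window:
            feature_in_first = window[0][1]                     # mask nearest t1 locates the feature
            thirds = [(t, fuse_texture_with_color(mask, feature_in_first, img_a, img_b))
                      for t, mask in window]
            output.extend(choose_and_order_third_images(thirds, max_insert))
    output.append(rgb_frames[-1])
    return output
```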
Regarding determining a third image based on the target feature included in each frame of DVS image and the two adjacent frames of RGB images, fig. 5 is a schematic diagram of the DVS-assisted frame interpolation mode provided by an embodiment of the present invention; as shown in fig. 5, the DVS can record the real motion trajectory of the feature point (i.e., the target feature) between two frames, and this real motion trajectory is then fused with the RGB image frames to obtain intermediate frames (e.g., Frame 1 to Frame m, which correspond to the one or more third images) that better conform to the real situation.
Fig. 6 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present invention; as shown in fig. 6, the video processing apparatus includes:
the first acquisition module is used for acquiring first video data; the first video data comprises at least two frames of first images;
the second acquisition module is used for acquiring at least one frame of second image corresponding to two adjacent frames of first images in at least two frames of first images; the image acquisition time of the second image is within the time range of acquiring the two adjacent frames of the first images; the first image and the second image of the two adjacent frames both comprise target features;
the processing module is used for determining a third image based on a target feature included in each frame of the second image and the first images of the two adjacent frames;
the generating module is used for inserting the third image between the two adjacent frames of the first image of the first video data to generate second video data;
wherein the type of the first image is the same as the type of the third image; the type of the first image is different from the type of the second image.
In an embodiment, the frame number of the second image is one frame, and the processing module is configured to fuse the target feature with at least one frame of the two adjacent frames of the first image according to a target feature included in each frame of the second image, so as to obtain a third image.
In an embodiment, the number of frames of the second image is at least two frames, and the processing module is configured to determine a motion trajectory of a target feature based on the target feature included in each frame of the second image in the at least two frames of the second image;
and according to the motion track, fusing the target feature of each frame of the second image with at least one frame of the adjacent two frames of the first images to obtain at least two third images with the target feature action tracks.
In one embodiment, the target feature includes at least texture information; the first image includes at least color information;
and the processing module is used for fusing the texture information of the target feature and the color information corresponding to the target feature in at least one frame of the first images of the two adjacent frames to obtain a third image.
In one embodiment, the first image and the third image are RGB images; the second image is an image generated based on an event stream acquired by the DVS.
In an embodiment, the first obtaining module is further configured to acquire the first video data by using a first camera;
the second acquisition module is further used for acquiring at least one frame of second image by using a second camera; the second camera is associated with a DVS.
It should be noted that: in the video processing apparatus provided in the foregoing embodiment, when implementing the corresponding video processing method, only the division of the program modules is taken as an example, and in practical applications, the processing distribution may be completed by different program modules according to needs, that is, the internal structure of the network device is divided into different program modules to complete all or part of the processing described above. In addition, the apparatus provided by the above embodiment and the embodiment of the corresponding method belong to the same concept, and the specific implementation process thereof is described in the method embodiment, which is not described herein again.
Fig. 7 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present invention; as shown in fig. 7, the apparatus 70 includes: a processor 701 and a memory 702 for storing a computer program operable on the processor; wherein, when the processor 701 is configured to run the computer program, it executes: acquiring first video data; the first video data comprises at least two frames of first images; acquiring at least one frame of second image corresponding to two adjacent frames of first images in at least two frames of first images; the image acquisition time of the second image is within the time range of acquiring the two adjacent frames of the first images; the first image and the second image of the two adjacent frames both comprise target features; determining a third image based on the target feature included in each frame of the second image and the first images of the two adjacent frames; inserting the third image between the two adjacent frames of the first image of the first video data to generate second video data; wherein the type of the first image is the same as the type of the third image; the type of the first image is different from the type of the second image.
In an embodiment, the frame number of the second image is one frame, and the processor 701 is further configured to execute, when the computer program runs, the following steps: and fusing the target characteristics with at least one frame of the adjacent two frames of the first images according to the target characteristics included in each frame of the second image to obtain a third image.
In an embodiment, the number of frames of the second image is at least two, and the processor 701 is further configured to execute, when the computer program runs, the following steps: determining a motion track of a target feature based on the target feature included in each frame of the second image in at least two frames of the second image; and according to the motion track, fusing the target feature of each frame of the second image with at least one frame of the adjacent two frames of the first images to obtain at least two third images with the target feature action tracks.
In an embodiment, the processor 701 is further configured to, when running the computer program, perform: and fusing the texture information of the target feature and the color information corresponding to the target feature in at least one frame of the first images of the two adjacent frames to obtain a third image.
In an embodiment, the processor 701 is further configured to, when running the computer program, perform: acquiring the first video data by using a first camera; acquiring at least one frame of second image by using a second camera; the second camera is associated with a DVS.
When the processor runs the computer program, the corresponding process implemented by the electronic device in each method according to the embodiment of the present invention is implemented, and for brevity, no further description is given here.
In practical applications, the apparatus 70 may further include: at least one network interface 703. The various components in the video processing device 70 are coupled together by a bus system 704. It is understood that the bus system 704 is used to enable communications among the components. The bus system 704 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled in fig. 7 as the bus system 704. The number of the processors 701 may be at least one. The network interface 703 is used for communication between the video processing apparatus 70 and other devices in a wired or wireless manner.
The memory 702 in embodiments of the present invention is used to store various types of data to support the operation of the video processing device 70.
The method disclosed in the above embodiments of the present invention may be applied to the processor 701, or implemented by the processor 701. The processor 701 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 701 or by instructions in the form of software. The processor 701 may be a general-purpose processor, a Digital Signal Processor (DSP), another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The processor 701 may implement or perform the methods, steps and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of the present invention may be directly performed by a hardware decoding processor, or performed by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium in the memory 702; the processor 701 reads the information in the memory 702 and completes the steps of the foregoing methods in combination with its hardware.
In an exemplary embodiment, the video processing device 70 may be implemented by one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), general-purpose processors, controllers, Micro Controller Units (MCUs), microprocessors, or other electronic components for performing the foregoing methods.
An embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored; the computer program, when executed by a processor, performs: acquiring first video data; the first video data comprises at least two frames of first images; acquiring at least one frame of second image corresponding to two adjacent frames of first images in at least two frames of first images; the image acquisition time of the second image is within the time range of acquiring the two adjacent frames of the first images; the first image and the second image of the two adjacent frames both comprise target features; determining a third image based on the target feature included in each frame of the second image and the first images of the two adjacent frames; inserting the third image between the two adjacent frames of the first image of the first video data to generate second video data; wherein the type of the first image is the same as the type of the third image; the type of the first image is different from the type of the second image.
In one embodiment, the frame number of the second image is one frame, and the computer program, when executed by the processor, performs: and fusing the target characteristics with at least one frame of the adjacent two frames of the first images according to the target characteristics included in each frame of the second image to obtain a third image.
In an embodiment, the number of frames of the second image is at least two, and the computer program, when executed by the processor, performs: determining a motion track of a target feature based on the target feature included in each frame of the second image in at least two frames of the second image; and according to the motion track, fusing the target feature of each frame of the second image with at least one frame of the adjacent two frames of the first images to obtain at least two third images with the target feature action tracks.
In one embodiment, the computer program, when executed by the processor, performs: and fusing the texture information of the target feature and the color information corresponding to the target feature in at least one frame of the first images of the two adjacent frames to obtain a third image.
In one embodiment, the computer program, when executed by the processor, performs: acquiring the first video data by using a first camera; acquiring at least one frame of second image by using a second camera; the second camera is associated with a DVS.
When the computer program is executed by a processor, the corresponding processes implemented by the electronic device in the methods according to the embodiments of the present invention are implemented, and for brevity, are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described device embodiments are merely illustrative, for example, the division of the unit is only a logical functional division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that: all or part of the steps for implementing the method embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer readable storage medium, and when executed, the program performs the steps including the method embodiments; and the aforementioned storage medium includes: a mobile storage device, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Alternatively, the integrated unit of the present invention may be stored in a computer-readable storage medium if it is implemented in the form of a software functional module and sold or used as a separate product. Based on such understanding, the technical solutions of the embodiments of the present invention may be essentially implemented or a part contributing to the prior art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present invention. And the aforementioned storage medium includes: a removable storage device, a ROM, a RAM, a magnetic or optical disk, or various other media that can store program code.
It should be noted that: in the present examples, "first", "second", etc. are used for distinguishing similar objects and are not necessarily used for describing a particular order or sequence.
In addition, the technical solutions described in the embodiments of the present invention may be arbitrarily combined without conflict.
In the present examples, a plurality means at least two, e.g., two, three, etc., unless specifically limited otherwise.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (10)

1. A method of video processing, the method comprising:
acquiring first video data; the first video data comprises at least two frames of first images;
acquiring at least one frame of second image corresponding to two adjacent frames of first images in at least two frames of first images; the image acquisition time of the second image is within the time range of acquiring the two adjacent frames of the first images; the first image and the second image of the two adjacent frames both comprise target features;
determining a third image based on the target feature included in each frame of the second image and the first images of the two adjacent frames;
inserting the third image between the two adjacent frames of the first image of the first video data to generate second video data;
wherein the type of the first image is the same as the type of the third image; the type of the first image is different from the type of the second image.
2. The method of claim 1, wherein the number of frames of the second image is one frame, and wherein determining a third image based on the target feature included in the second image and the two adjacent frames of the first image comprises:
and fusing the target characteristics with at least one frame of the adjacent two frames of the first images according to the target characteristics included in each frame of the second image to obtain a third image.
3. The method of claim 1, wherein the number of frames of the second image is at least two, and wherein determining a third image based on the target feature included in the second image and the two adjacent frames of the first image comprises:
determining a motion track of a target feature based on the target feature included in each frame of the second image in at least two frames of the second image;
and according to the motion track, fusing the target feature of each frame of the second image with at least one frame of the adjacent two frames of the first images to obtain at least two third images with the target feature action tracks.
4. A method according to claim 2 or 3, wherein the target feature comprises at least texture information; the first image includes at least color information;
fusing the target feature with at least one frame of the two adjacent frames of the first images to obtain a third image, wherein the process comprises the following steps:
and fusing the texture information of the target feature and the color information corresponding to the target feature in at least one frame of the first images of the two adjacent frames to obtain a third image.
5. The method according to any one of claims 1 to 3, wherein the first image and the third image are RGB images; the second image is an image generated based on an event stream acquired by a dynamic vision sensor DVS.
6. A video processing apparatus, characterized in that the apparatus comprises:
the first acquisition module is used for acquiring first video data; the first video data comprises at least two frames of first images;
the second acquisition module is used for acquiring at least one frame of a second image corresponding to two adjacent frames of the first images among the at least two frames of first images; wherein an image acquisition time of the second image falls within a time range over which the two adjacent frames of the first images are acquired, and both the two adjacent frames of the first images and the second image comprise a target feature;
the processing module is used for determining a third image based on the target feature included in each frame of the second image and on the two adjacent frames of the first images;
the generating module is used for inserting the third image between the two adjacent frames of the first images in the first video data to generate second video data;
wherein the type of the first image is the same as the type of the third image; the type of the first image is different from the type of the second image.
7. The apparatus according to claim 6, wherein the number of frames of the second image is one, and the processing module is configured to fuse the target feature with at least one of the two adjacent frames of the first images according to the target feature included in the second image, to obtain the third image.
8. The apparatus according to claim 6, wherein the number of frames of the second image is at least two, and the processing module is configured to determine a motion trajectory of the target feature based on the target feature included in each of the at least two frames of the second image, and to fuse, according to the motion trajectory, the target feature of each frame of the second image with at least one of the two adjacent frames of the first images, to obtain at least two third images reflecting the motion trajectory of the target feature.
9. A video processing apparatus comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the steps of the method of any one of claims 1 to 5 are implemented when the program is executed by the processor.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 5.
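The sketches that follow are editorial illustrations only; they are not part of the claims or of the original disclosure. They assume a Python/NumPy setting in which each first image is an H x W x 3 RGB array, each second image is an H x W event-derived array, and the helper names (fuse_feature, interpolate_video) are hypothetical. This first sketch mirrors the overall flow of claims 1 and 2: one event-derived second image per pair of adjacent first images, fused into a third image that is inserted between them.

import numpy as np

def fuse_feature(prev_rgb, next_rgb, event_img, threshold=0.1):
    # prev_rgb, next_rgb: H x W x 3 float arrays in [0, 1] (the two adjacent first images)
    # event_img: H x W float array, non-zero where the moving target feature fired events (the second image)
    mask = (np.abs(event_img) > threshold).astype(np.float32)[..., None]
    base = 0.5 * (prev_rgb + next_rgb)             # color information from the RGB pair
    third = (1.0 - mask) * base + mask * prev_rgb  # target-feature pixels take color from one RGB frame
    return third

def interpolate_video(first_images, second_images_between):
    # first_images: list of N RGB frames; second_images_between: list of N - 1 event frames,
    # one per adjacent pair of first images. Returns the "second video data" with one
    # interpolated third image inserted in every gap.
    out = []
    for i in range(len(first_images) - 1):
        out.append(first_images[i])
        out.append(fuse_feature(first_images[i], first_images[i + 1], second_images_between[i]))
    out.append(first_images[-1])
    return out

A real implementation would align and denoise the event data and typically use a learned fusion rather than this mask-based blend; the sketch only mirrors the claimed data flow.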
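A second hypothetical sketch corresponds to claims 3 and 8: with at least two event-derived second images between a pair of first images, the centroid of the firing pixels in each event frame is taken as a crude motion trajectory of the target feature, and one third image is produced per event frame so that the inserted frames follow that trajectory. The centroid-based trajectory is an editorial assumption, not the patent's stated method.

import numpy as np

def feature_centroid(event_img, threshold=0.1):
    ys, xs = np.nonzero(np.abs(event_img) > threshold)
    if xs.size == 0:
        return None                      # no target-feature activity in this event frame
    return float(xs.mean()), float(ys.mean())

def trajectory_frames(prev_rgb, next_rgb, event_imgs, threshold=0.1):
    # event_imgs: list of at least two H x W event frames ordered in time.
    # Returns (third_images, trajectory): one fused frame per event frame plus the
    # centroid trajectory of the target feature across the event frames.
    thirds, trajectory = [], []
    n = len(event_imgs)
    for k, ev in enumerate(event_imgs):
        t = (k + 1) / (n + 1)                       # position of this event frame within the gap
        base = (1.0 - t) * prev_rgb + t * next_rgb  # color drifts from the previous frame towards the next
        mask = (np.abs(ev) > threshold).astype(np.float32)[..., None]
        thirds.append((1.0 - mask) * base + mask * prev_rgb)
        trajectory.append(feature_centroid(ev, threshold))
    return thirds, trajectory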
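A final sketch corresponds to claim 5, in which the second image is generated from a DVS event stream: events whose timestamps fall between the capture times of the two adjacent first images are accumulated into a frame-like array. The (x, y, t, polarity) tuple layout is assumed for illustration; actual DVS SDKs expose events in their own formats.

import numpy as np

def events_to_image(events, t_start, t_end, height, width):
    # events: iterable of (x, y, t, polarity) tuples, polarity in {-1, +1}.
    # Returns an H x W float image that is positive where brightness rose,
    # negative where it fell, and zero where nothing moved between t_start and t_end.
    img = np.zeros((height, width), dtype=np.float32)
    for x, y, t, polarity in events:
        if t_start <= t < t_end:
            img[y, x] += polarity
    return img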
CN202011260544.3A 2020-11-12 2020-11-12 Video processing method, device and storage medium Active CN112511859B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011260544.3A CN112511859B (en) 2020-11-12 2020-11-12 Video processing method, device and storage medium


Publications (2)

Publication Number Publication Date
CN112511859A true CN112511859A (en) 2021-03-16
CN112511859B CN112511859B (en) 2023-03-24

Family

ID=74957256

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011260544.3A Active CN112511859B (en) 2020-11-12 2020-11-12 Video processing method, device and storage medium

Country Status (1)

Country Link
CN (1) CN112511859B (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106210449A (en) * 2016-08-11 2016-12-07 上海交通大学 The frame rate up-conversion method for estimating of a kind of Multi-information acquisition and system
CN108683852A (en) * 2018-05-23 2018-10-19 努比亚技术有限公司 A kind of video recording method, terminal and computer readable storage medium
CN110650294A (en) * 2019-11-06 2020-01-03 深圳传音控股股份有限公司 Video shooting method, mobile terminal and readable storage medium
CN111586321A (en) * 2020-05-08 2020-08-25 Oppo广东移动通信有限公司 Video generation method and device, electronic equipment and computer-readable storage medium
CN111667442A (en) * 2020-05-21 2020-09-15 武汉大学 High-quality high-frame-rate image reconstruction method based on event camera

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113225546A (en) * 2021-04-25 2021-08-06 Oppo广东移动通信有限公司 Color temperature adjusting method and device, electronic equipment and computer readable storage medium
CN114727015A (en) * 2022-03-28 2022-07-08 深圳传音控股股份有限公司 Image processing method, intelligent terminal and storage medium
CN114727015B (en) * 2022-03-28 2024-04-19 深圳传音控股股份有限公司 Image processing method, intelligent terminal and storage medium
CN114494085A (en) * 2022-04-14 2022-05-13 季华实验室 Video stream restoration method, system, electronic device and storage medium
CN114494085B (en) * 2022-04-14 2022-07-15 季华实验室 Video stream restoration method, system, electronic device and storage medium

Also Published As

Publication number Publication date
CN112511859B (en) 2023-03-24

Similar Documents

Publication Publication Date Title
CN112511859B (en) Video processing method, device and storage medium
CN107948517B (en) Preview picture blurring processing method, device and equipment
JP2013122763A (en) Video processor and video processing method
CN104205826A (en) Apparatus and method for reconstructing high density three-dimensional image
KR20170038040A (en) Computerized prominent person recognition in videos
CN108093158B (en) Image blurring processing method and device, mobile device and computer readable medium
CN102959942B (en) Image capture device for stereoscopic viewing-use and control method thereof
CN111241872B (en) Video image shielding method and device
US20110128415A1 (en) Image processing device and image-shooting device
US8798369B2 (en) Apparatus and method for estimating the number of objects included in an image
US20220100658A1 (en) Method of processing a series of events received asynchronously from an array of pixels of an event-based light sensor
JPWO2013150789A1 (en) Movie analysis apparatus, movie analysis method, program, and integrated circuit
Messikommer et al. Multi-bracket high dynamic range imaging with event cameras
JP2021176243A (en) Image processing apparatus, control method for the same, and imaging apparatus
CN112802033A (en) Image processing method and device, computer readable storage medium and electronic device
CN111667420A (en) Image processing method and device
JP5960691B2 (en) Interest section identification device, interest section identification method, interest section identification program
EP3035242A1 (en) Method and electronic device for object tracking in a light-field capture
CN104408444A (en) Human body action recognition method and device
CN111160340B (en) Moving object detection method and device, storage medium and terminal equipment
WO2023001110A1 (en) Neural network training method and apparatus, and electronic device
CN108960130B (en) Intelligent video file processing method and device
US10282633B2 (en) Cross-asset media analysis and processing
US11575826B2 (en) Image processing apparatus and method, and image capturing apparatus
CN110930340B (en) Image processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant