CN111586321A - Video generation method and device, electronic equipment and computer-readable storage medium


Info

Publication number
CN111586321A
Authority
CN
China
Prior art keywords
image
target
original
interpolation
images
Prior art date
Legal status
Granted
Application number
CN202010381604.0A
Other languages
Chinese (zh)
Other versions
CN111586321B (en)
Inventor
张弓
Current Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202010381604.0A priority Critical patent/CN111586321B/en
Publication of CN111586321A publication Critical patent/CN111586321A/en
Application granted granted Critical
Publication of CN111586321B publication Critical patent/CN111586321B/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 5/00 Details of television systems
    • H04N 5/222 Studio circuitry; studio devices; studio equipment
    • H04N 5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; cameras specially adapted for the electronic generation of special effects
    • H04N 5/272 Means for inserting a foreground image in a background image, i.e. inlay, outlay
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 5/00 Details of television systems
    • H04N 5/222 Studio circuitry; studio devices; studio equipment
    • H04N 5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; cameras specially adapted for the electronic generation of special effects
    • H04N 5/265 Mixing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 7/00 Television systems
    • H04N 7/01 Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level
    • H04N 7/0135 Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level, involving interpolation processes

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Studio Circuits (AREA)
  • Processing Or Creating Images (AREA)
  • Television Systems (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The application relates to a video generation method, which comprises the following steps: acquiring an original image set, wherein the original image set comprises at least two original images; acquiring a target foreground change state corresponding to the original image set; determining an interpolation time phase corresponding to the original image set according to the target foreground change state; and performing frame interpolation processing on the original images in the original image set according to the interpolation time phase to obtain a target video. The application also discloses a video generation apparatus, an electronic device, and a computer-readable storage medium. A video with high realism can be generated.

Description

Video generation method and device, electronic equipment and computer-readable storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a video generation method, an apparatus, an electronic device, and a computer-readable storage medium.
Background
In a conventional method for generating a video from images, an input image is edited to some extent and then copied and played repeatedly to obtain a video. When switching between images, conventional techniques usually apply a transition special effect. However, videos generated this way suffer from low realism.
Disclosure of Invention
The embodiments of the application provide a video generation method and apparatus, an electronic device, and a computer-readable storage medium, which can improve the realism of a generated video.
A video generation method, comprising:
acquiring an original image set, wherein the original image set comprises at least two original images;
acquiring a target foreground change state corresponding to the original image set;
determining an interpolation time phase corresponding to the original image set according to the target foreground change state;
and performing frame interpolation processing on the original image in the original image set according to the interpolation time phase to obtain a target video.
A video generation apparatus comprising:
the device comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring an original image set, and the original image set comprises at least two original images;
the second acquisition module is used for acquiring a target foreground change state corresponding to the original image set;
the determining module is used for determining an interpolation time phase corresponding to the original image set according to the target foreground change state;
and the frame interpolation module is used for carrying out frame interpolation processing on the original images in the original image set according to the interpolation time phase to obtain a target video.
An electronic device comprising a memory and a processor, the memory having stored therein a computer program that, when executed by the processor, causes the processor to perform the steps of:
acquiring an original image set, wherein the original image set comprises at least two original images;
acquiring a target foreground change state corresponding to the original image set;
determining an interpolation time phase corresponding to the original image set according to the target foreground change state;
and performing frame interpolation processing on the original image in the original image set according to the interpolation time phase to obtain a target video.
A computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the steps of:
acquiring an original image set, wherein the original image set comprises at least two original images;
acquiring a target foreground change state corresponding to the original image set;
determining an interpolation time phase corresponding to the original image set according to the target foreground change state;
and performing frame interpolation processing on the original image in the original image set according to the interpolation time phase to obtain a target video.
According to the video generation method and apparatus, the electronic device, and the computer-readable storage medium, an original image set is obtained, a target foreground change state corresponding to the original image set is obtained, an interpolation time phase corresponding to the original image set is determined according to the target foreground change state, and at least two original images in the original image set are subjected to frame interpolation according to the interpolation time phase to obtain a target video. The foreground in the original images can thus produce different state-change effects according to the target foreground change state, which improves the operability of generating a video from images and also improves the realism of the generated video.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is apparent that the drawings in the following description show only some embodiments of the present application; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flow diagram of a video generation method in one embodiment;
FIG. 2 is a diagram illustrating generation of a target video with uniformly varying motion in one embodiment;
FIG. 3 is a diagram illustrating generation of a target video with uniformly varying deformations, according to an embodiment;
FIG. 4 is a diagram illustrating generation of a non-uniformly varying target video, in one embodiment;
FIG. 5 is a diagram illustrating generation of a target video from a target group of images in one embodiment;
FIG. 6 is a diagram illustrating generation of a target video from a target group of images in another embodiment;
FIG. 7 is a diagram illustrating generation of a target video from a set of looped images in one embodiment;
FIG. 8 is a diagram illustrating generation of a target video, under an embodiment;
FIG. 9 is a flowchart illustrating a process of interpolating a frame to obtain a target video according to an embodiment;
FIG. 10 is a diagram of forward and backward motion vectors in one embodiment;
FIG. 10A is a diagram illustrating a modified forward motion vector and a modified backward motion vector, in accordance with an embodiment;
FIG. 11 is a schematic diagram of forward and backward mapped motion vectors;
FIG. 12 is a schematic diagram of a video generation system in one embodiment;
FIG. 13 is a block diagram showing the structure of a video generating apparatus according to an embodiment;
fig. 14 is a schematic diagram of an internal structure of an electronic device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
It will be understood that the terms "first," "second," and the like as used herein may be used herein to describe various data, but the data is not limited by these terms. These terms are only used to distinguish one datum from another. For example, the first fused image may be referred to as a second fused image, and similarly, the second fused image may be referred to as a first fused image, without departing from the scope of the present application. Both the first fused image and the second fused image are fused images, but they are not the same fused image.
FIG. 1 is a flow diagram of a method for video generation in one embodiment. The video generation method in this embodiment is described taking as an example one running on an electronic device or a server. The electronic device may be a terminal device such as a mobile phone, a tablet computer, a personal digital assistant (PDA), a vehicle-mounted computer, or a wearable device. The server may be an independent server or a server cluster. As shown in fig. 1, the video generation method includes steps 102 to 108.
Step 102, an original image set is obtained, wherein the original image set comprises at least two original images.
Specifically, the original image set includes at least two original images. The original image may be an image shot by the terminal in real time, or may be a pre-configured image, such as an image downloaded in a network, an image in an electronic album, or the like. The resolution of the original images in the set of original images may be the same or different. The backgrounds of the respective original images in the set of original images match. The background matching may be that the similarity of the background is higher than a preset similarity. The subjects in the at least two original images may or may not be the same.
And 104, acquiring a target foreground change state corresponding to the original image set.
The target foreground change state comprises a uniform change state and a non-uniform change state. The uniform change state comprises at least one of a deformation uniform change state, a motion uniform change state, and a pose uniform change state. Similarly, the non-uniform change state may include at least one of a deformation non-uniform change state, a motion non-uniform change state, and a pose non-uniform change state. The foreground is the region where a subject in the image is located; the subject is generally a region in which a change in shape, posture, color, or position occurs, but is not limited thereto. The background is the region of the image outside the foreground, which may be a region whose motion amplitude is smaller than a preset threshold or a region corresponding to a secondary object, but is not limited thereto.
Or, the target foreground change state may also include a motion change state, a deformation change state, and a pose change state. The motion change state includes a uniform motion change state and a non-uniform motion change state. The deformation change state includes a uniform deformation change state and a non-uniform deformation change state. The pose change state includes a uniform pose change state and a non-uniform pose change state.
For example, if the original image set includes a first original image and a second original image, the target foreground change state may be that the foreground moves uniformly from its position in the first original image to its position in the second original image, but is not limited thereto.
Specifically, the electronic device may obtain a selected target foreground change state corresponding to the original image set. Or, the electronic device may acquire a default target foreground change state corresponding to the original image set.
In this embodiment, the electronic device may identify the foreground in the at least two original images and determine the target foreground change state according to the foreground. Specifically, the electronic device may identify the images in the original image set by a foreground-background discrimination algorithm to obtain a foreground and a background, where the foreground and the background may be identified adaptively or by receiving an operation setting applied to the images. The foreground and background may be regularly shaped regions or outline regions that directly match the foreground and background. In one embodiment, the electronic device may identify the foreground and the background of the images in the original image set through a foreground-background discrimination model obtained by neural network training. For example, if the original image set includes a first original image and a second original image, the center coordinate of the foreground in the first original image is (10, 10), and the center coordinate of the foreground in the second original image is (10, 100), then the target foreground change state corresponding to the original image set can be determined, from the foreground coordinates, to be a motion change state. For another example, if the area of the foreground in the first original image is 100 and the area of the foreground in the second original image is 1000, it may be determined from the foreground areas that the target foreground change state corresponding to the original image set is a deformation change state.
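The foreground-statistics examples above (center displacement indicating a motion change, area ratio indicating a deformation change) can be sketched as a small classifier. The function name and thresholds below are illustrative assumptions, not part of the patent:

```python
def classify_change_state(fg_a, fg_b, move_thresh=5.0, area_ratio_thresh=1.5):
    """Classify the target foreground change state from foreground statistics.

    fg_a, fg_b: dicts with 'center' (x, y) and 'area' of the foreground in
    the first and second original image. Thresholds are illustrative.
    """
    (xa, ya), (xb, yb) = fg_a["center"], fg_b["center"]
    # Euclidean displacement of the foreground center between the two images
    displacement = ((xb - xa) ** 2 + (yb - ya) ** 2) ** 0.5
    # Ratio of the larger foreground area to the smaller one
    area_ratio = max(fg_a["area"], fg_b["area"]) / max(min(fg_a["area"], fg_b["area"]), 1)
    states = []
    if displacement > move_thresh:
        states.append("motion")
    if area_ratio > area_ratio_thresh:
        states.append("deformation")
    return states or ["static"]
```

With the patent's numbers, centers (10, 10) and (10, 100) yield a motion change state, while areas 100 and 1000 yield a deformation change state.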
And 106, determining an interpolation time phase corresponding to the original image set according to the target foreground change state.
The interpolation time phase divides the change magnitude of the foreground between two images into N parts, each part representing one phase; N can be customized. The electronic device may set N according to the change magnitude of the foreground: N is directly proportional to the change magnitude, so when the change is large, many interpolation time phases are required to go from the first foreground to the second foreground, and when the change is small, only a few interpolation time phases are needed. Determining the interpolation time phase according to the target foreground change state reflects attention to the image foreground, so that the interpolated images better match what the human eye attends to.
For example, if the first image corresponds to a first foreground, the second image corresponds to a second foreground, and N is 3, the interpolation time phases may be 1/3 and 2/3, indicating that the first foreground becomes the second foreground after two phase changes.
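The uniform case of the phase arithmetic above (N parts giving N - 1 interior phases) can be sketched minimally; the function name is an illustrative assumption:

```python
def uniform_phases(n):
    """Divide the foreground change magnitude into n equal parts and
    return the n - 1 interior interpolation time phases in (0, 1)."""
    return [i / n for i in range(1, n)]
```

For N = 3 this yields the phases 1/3 and 2/3 from the example above.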
Specifically, when the target foreground change state is a uniform change state, the interpolation time phase corresponding to the original image set is a uniform interpolation time phase. And when the target foreground change state is a non-uniform change state, the interpolation time phase corresponding to the original image set is a non-uniform interpolation time phase.
And 108, performing frame interpolation processing on at least two original images in the original image set according to the interpolation time phase to obtain a target video.
Specifically, when 2 images are included in the original image set, a plurality of different interpolation time phases may be determined for the 2 images, and frame interpolation is performed a plurality of times. When the original image set includes at least three images, a plurality of corresponding interpolation time phases need to be determined according to the target image group. Each newly generated interpolated image corresponds to an interpolated temporal phase.
In this embodiment, the frame interpolation method includes MEMC (Motion Estimation and Motion Compensation), an optical flow method, a neural network, or any other FRC (Frame Rate Conversion) technique. When performing frame interpolation on at least two original images multiple times according to the interpolation time phases, a corresponding frame interpolation method is selected for each interpolation, so different interpolated images may be generated by different frame interpolation methods. In one embodiment, the complexity of the foreground of the frame image to be interpolated is calculated, a face region in the foreground is detected to obtain a detection result, and a corresponding target frame interpolation algorithm is selected according to the foreground complexity and the detection result. In one embodiment, the method is used for image editing during the production of an electronic photo album, where a target video is formed from existing photos.
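Real MEMC or optical-flow interpolation is far more involved than can be shown here. Purely as an illustrative simplification of producing one interpolated frame at phase t (not the patent's actual method), a pixel-wise linear cross-fade can be sketched:

```python
def interpolate_frame(img_a, img_b, t):
    """Produce an interpolated frame at time phase t in [0, 1].

    Stand-in for MEMC/optical-flow interpolation: a pixel-wise linear
    cross-fade between two equal-sized frames given as nested lists.
    At t=0 the result equals img_a; at t=1 it equals img_b.
    """
    return [[(1 - t) * pa + t * pb for pa, pb in zip(row_a, row_b)]
            for row_a, row_b in zip(img_a, img_b)]
```

A cross-fade ignores motion, which is exactly why motion-compensated methods such as MEMC are preferred for moving foregrounds; the sketch only illustrates how one output frame corresponds to one interpolation time phase.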
The video generation method in the embodiment obtains the original image set, obtains the target foreground change state corresponding to the original image set, determines the interpolation time phase corresponding to the original image set according to the target foreground change state, and performs frame interpolation processing on at least two original images in the original image set according to the interpolation time phase to obtain the target video.
In one embodiment, determining an interpolation time phase corresponding to an original image set according to a target foreground change state includes: and when the target foreground change state is a uniform change state, determining a uniform interpolation time phase corresponding to each target image group, wherein the uniform change state comprises at least one of a deformation uniform change state, a motion uniform change state and a pose uniform change state.
The uniform change state refers to a state in which the foreground changes at a constant rate. The deformation uniform change state means that the effect to be achieved in the target video is a uniformly changing deformation of the foreground, for example a ball shrinking uniformly from large to small, but is not limited thereto. The motion uniform change state means that the effect to be achieved in the target video is uniform motion, such as a ball moving at a constant velocity from one side of the image to the other, but is not limited thereto. The pose uniform change state means that the effect to be achieved in the target video is a uniform pose change, such as a person's pose transitioning evenly from one action to another, but is not limited thereto. The uniform interpolation time phase means that the change magnitude of the foreground between the two images is divided equally into N parts, each part representing one phase; N can be customized, for example uniform interpolation time phases of 1, 2, 3, ..., 10, but is not limited thereto.
Specifically, when the target foreground change state is a uniform change state, the electronic device obtains a uniform interpolation time phase corresponding to the original image set. The uniform interpolation time phase may be preset, or may be determined according to the foreground change magnitude in the original image set. For example, to achieve a uniform change state, the original images are divided into N time phases according to the change degree of the foreground, one frame is interpolated for each time phase, and the at least two original images are added as the first frame and the last frame, outputting N + 2 frames. The N + 2 frames finish playing within T seconds. For example, if 300 frames are played within 10 seconds, the playing interval is about 33 ms per frame.
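The frame-count and playback arithmetic above can be checked with a short helper (names are illustrative):

```python
def playback_interval_ms(n_interp, duration_s):
    """Per-frame playing interval when n_interp interpolated frames plus
    the two original frames (first and last) are played over duration_s
    seconds, as in the N + 2 frames in T seconds example."""
    total_frames = n_interp + 2
    return duration_s * 1000 / total_frames
```

For N = 298 interpolated frames over T = 10 seconds, this gives 300 frames at roughly 33 ms per frame, matching the example.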
In this embodiment, as shown in fig. 2, a schematic diagram of generating a target video with uniformly changing motion in one embodiment is shown. Frame interpolation processing is performed, according to the uniform interpolation time phases, on two original images whose backgrounds are unchanged and whose foreground moves, to obtain a target video. The video resolution may be consistent with the original images. The foreground in the target video moves horizontally at a constant speed, moving uniformly from its position in original image A to its position in original image B.
In this embodiment, as shown in fig. 3, a schematic diagram of generating a target video with uniformly changing deformation in one embodiment is shown. Frame interpolation processing is performed, according to the uniform interpolation time phases, on two original images whose backgrounds are unchanged and whose foreground size changes, to obtain a target video. The foreground in the target video grows at a constant rate, changing from the size in original image A to the size in original image B.
In the video generation method in this embodiment, when the target foreground change state is the uniform change state, the uniform interpolation time phase corresponding to the original image set is obtained, and frame interpolation processing is performed according to the uniform interpolation time phase, so that the foreground in the target video can achieve the effect of uniform change, and the authenticity of the generated target video is improved.
In one embodiment, determining an interpolation time phase corresponding to an original image set according to a target foreground change state includes: and when the target foreground change state is the first non-uniform change state, determining a non-uniform interpolation time phase corresponding to the original image set.
Wherein the first non-uniform change state is a non-uniform change state. Likewise, the non-uniform change state includes at least one of a deformation non-uniform change state, a motion non-uniform change state, and a pose non-uniform change state. The deformation non-uniform change state means that the deformation of the foreground to be achieved in the target video changes non-uniformly, such as a ball changing from large to small at a varying rate, but is not limited thereto. The motion non-uniform change state means that the effect to be achieved in the target video is non-uniform motion, such as a ball moving from one side of the image to the other at a varying speed, but is not limited thereto. The pose non-uniform change state means that the effect to be achieved in the target video is a non-uniform pose change, such as a person's pose transitioning from one action to another at a varying rate, but is not limited thereto. The non-uniform interpolation time phase means that the change magnitude of the foreground between the two images is divided unequally into N parts, each part representing one phase; N can be customized, for example non-uniform interpolation time phases of 1, 5, 7, 8, but is not limited thereto. The first non-uniform change state is implemented based on the non-uniform interpolation time phase.
Specifically, when the target foreground change state is a first non-uniform change state, a non-uniform interpolation time phase corresponding to the original image set is determined. The non-uniform interpolation time phase may be a default in the electronic device, may be selected, or may be determined according to the foreground change in the original images, but is not limited thereto. For example, for two original images, 118 time phases are divided unevenly according to the change degree of the foreground in the original images, one frame is interpolated at each phase, and the input images are added as the first frame and the last frame, thereby outputting 120 frames. The 120 frames are played within 4 seconds, with a playing interval of about 33 ms per frame. Because the interpolation time phases are non-uniform, the effect of the foreground moving or changing at a non-constant speed in the target video can be achieved.
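Non-uniform interpolation time phases can be produced, for example, by passing uniform positions through an easing curve. The quadratic ease-in below is an illustrative assumption; the patent does not prescribe a specific curve:

```python
def nonuniform_phases(n, ease=lambda t: t * t):
    """Return n - 1 non-uniform interior interpolation time phases by
    warping uniform positions with an easing curve. The default
    quadratic ease-in makes the foreground start slowly and accelerate."""
    return [ease(i / n) for i in range(1, n)]
```

Unequal gaps between consecutive phases are what produce the non-constant apparent speed of the foreground in the target video.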
In this embodiment, as shown in fig. 4, a schematic diagram of generating a target video with non-uniform change in one embodiment is shown. Frame interpolation processing is performed, according to the non-uniform interpolation time phases, on two original images whose backgrounds are unchanged and whose foreground moves, to obtain a target video. The video resolution is consistent with the original images. The foreground in the target video moves horizontally in a non-uniform manner from its position in original image A to its position in original image B.
In the video generation method in this embodiment, when the target foreground change state is the first non-uniform change state, the non-uniform interpolation time phase corresponding to the original image set is determined, and frame interpolation processing is performed according to the non-uniform interpolation time phase, so that the foreground in the target video can achieve the effect of non-uniform change, and the operability of the video is improved.
In one embodiment, determining an interpolation time phase corresponding to an original image set according to a target foreground change state includes:
and (a1) when the target foreground change state is the second non-uniform change state, determining an interpolation time phase corresponding to each target image group, wherein the interpolation time phase is a uniform interpolation time phase or a non-uniform interpolation time phase.
Wherein the second nonuniform state and the first nonuniform state are both nonuniform states but not the same state. The second non-uniform change state is achieved by image replacement. Thus, the interpolated temporal phase may be a uniform interpolated temporal phase or a non-uniform interpolated temporal phase.
And (a2) performing frame interpolation processing on the original images in the original image set according to the interpolation time phase to obtain an interpolated image.
And a step (a3) of determining an image to be replaced from the interpolated images, and determining an adjacent image adjacent to the time phase of the image to be replaced.
Specifically, the electronic device may select any one of the interpolated images as the image to be replaced, or may select an intermediate image from the interpolated images as the image to be replaced, but is not limited thereto. The electronic device determines adjacent images that are adjacent in time phase to the image to be replaced. For example, the electronic device may use at least one of the interpolated image of the previous time phase and the interpolated image of the subsequent time phase of the image to be replaced as the adjacent image.
And (a4) replacing the image to be replaced with an adjacent image.
Specifically, the interpolation image includes, for example, a first interpolation image, a second interpolation image, and a third interpolation image. And the second interpolation image is an image to be replaced, the first interpolation image is determined to be an adjacent image, and two first interpolation images and one third interpolation image are obtained after replacement.
And (a5) splicing the adjacent images, the residual interpolation images except the images to be replaced and the original images in the original image set to generate the target video.
Specifically, the electronic device splices the adjacent images, the remaining interpolated images other than the image to be replaced, and the original images in the original image set in time order to generate the target video. For example, for two original images, 298 time phases are divided equally according to the change degree of the foreground in the original images, one frame is interpolated at each phase, and the input images are added as the first frame and the last frame, thereby outputting 300 frames. An interpolated image among the 300 frames is then replaced with the interpolated image of the previous or next time phase, i.e., an adjacent image. The 300 frames are played within 10 seconds, with a playing interval of about 33 ms per frame. Because repeated images exist in the target video, the effect of the foreground moving or changing at a non-constant speed can be achieved.
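Steps (a3) to (a5), replacing an interpolated image with its time-phase neighbour and splicing the sequence with the originals, can be sketched as follows (function and variable names are illustrative):

```python
def splice_with_replacement(first, interpolated, last, replace_idx):
    """Replace interpolated[replace_idx] with its previous-phase neighbour
    (or the next one, if it has no predecessor), then splice the result
    between the two original frames in time order."""
    frames = list(interpolated)  # copy so the input is not mutated
    neighbour = replace_idx - 1 if replace_idx > 0 else replace_idx + 1
    frames[replace_idx] = frames[neighbour]
    return [first] + frames + [last]
```

The duplicated neighbour frame is what makes the foreground appear to pause briefly, producing the non-constant speed effect described above.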
In the video generation method in this embodiment, when the target foreground change state is the second non-uniform change state, the interpolation time phase corresponding to each target image group is determined, where the interpolation time phase may be a uniform or non-uniform interpolation time phase. Interpolated images are generated by frame interpolation, an image to be replaced and an adjacent image adjacent to it in time phase are determined, and the target video is generated after replacement. In this way the foreground in the target video can achieve a non-uniform change effect, improving the operability of the video.
In one embodiment, determining an interpolation time phase corresponding to an original image set according to a target foreground change state includes: grouping original images in an original image set to obtain a target image group; and determining an interpolation time phase corresponding to each target image group in the original image set according to the target foreground change state.
Performing frame interpolation processing on at least two original images in the original image set according to the interpolation time phase to obtain a target video, comprising: and performing frame interpolation processing on at least two original images in the corresponding target image group according to each interpolation time phase to obtain a target video.
Specifically, when the number of original images in the original image set is at least three, the electronic device groups the original images to obtain target image groups, each containing at least two original images. When acquiring the target foreground change state corresponding to the original image set, either a target foreground change state is acquired for each target image group (i.e., each target image group corresponds to one target foreground change state), or the whole original image set corresponds to a single target foreground change state. The electronic device then determines the interpolation time phases corresponding to each target image group according to the target foreground change state; the interpolation time phases of different target image groups may be the same or different.
In this embodiment, as shown in fig. 5, which is a schematic diagram of generating a target video from target image groups, the original image set includes original image A, original image B, and original image C. The target image groups include a first image group containing original images A and B and a second image group containing original images C and B. In the generated target video, the foreground moves from the position in image A to the position in image B, and then from the position in image C to the position in image B.
In the video generation method of this embodiment, when the number of original images in the original image set is at least three, the original images are grouped into target image groups, the interpolation time phases corresponding to each target image group are determined according to the target foreground change state, and frame interpolation is performed on each target image group according to its interpolation time phases to obtain the target video.
In one embodiment, grouping the original images in the original image set to obtain target image groups includes: when the number of original images in the original image set is at least three, selecting original images adjacent in time phase from the at least three original images to form target image groups, obtaining at least two target image groups.
Performing frame interpolation processing on at least two original images in the corresponding target image group according to each interpolation time phase to obtain a target video, wherein the frame interpolation processing comprises the following steps: performing frame interpolation processing on at least two original images in the corresponding target image group according to each interpolation time phase to obtain a reference video corresponding to each target image group; and splicing the reference videos into the target video according to the time sequence of the images in the target image group.
Specifically, the electronic device selects temporally adjacent original images from at least three original images to form target image groups, obtaining at least two target image groups. The electronic device performs frame interpolation on the at least two original images in each target image group according to the corresponding interpolation time phases to obtain a reference video for each target image group. The electronic device then splices the reference videos into the target video according to the time order of the images in the target image groups. For example, if the original image set includes a first original image, a second original image, and a third original image, the first and second original images are selected to form a first target image group, and the second and third original images are selected to form a second target image group, giving two target image groups. The electronic device performs frame interpolation on each target image group separately to obtain two reference videos: reference video A corresponding to the first target image group, and reference video B corresponding to the second target image group. Since the time order of the images is first original image, second original image, third original image, the target video is spliced in the order reference video A, reference video B.
In this embodiment, as shown in fig. 6, which is a schematic diagram of generating a target video from target image groups in another embodiment, the original image set includes original image A, original image B, and original image C. The target image groups include a first image group containing original images A and B and a second image group containing original images B and C. In the generated target video, the foreground moves from the position in image A to the position in image B, and then from the position in image B to the position in image C.
In the video generation method of this embodiment, temporally adjacent original images are selected from at least three original images to form target image groups, obtaining at least two target image groups; frame interpolation is performed on the at least two original images in each target image group according to the corresponding interpolation time phases to obtain a reference video for each target image group; and the reference videos are spliced into the target video according to the time order of the images in the target image groups. This achieves the effect of the foreground changing sequentially in the target video and improves the operability of the video.
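The sequential grouping and splicing above can be sketched as follows (the function and frame names are illustrative placeholders; `interpolate_pair` stands in for real frame interpolation and only emits labelled strings):

```python
# Sketch of sequential grouping: temporally adjacent originals form target
# image groups, each group is interpolated into a reference video, and the
# reference videos are spliced in time order.

def adjacent_groups(originals):
    """Pair each original image with its temporal successor."""
    return list(zip(originals, originals[1:]))

def interpolate_pair(a, b, n_phases):
    # placeholder for frame interpolation at n_phases time phases
    return [f"{a}->{b}@{k}" for k in range(1, n_phases + 1)]

def build_target_video(originals, n_phases):
    video = [originals[0]]
    for a, b in adjacent_groups(originals):
        # reference video for this group, then the next original image
        video += interpolate_pair(a, b, n_phases) + [b]
    return video

print(build_target_video(["A", "B", "C"], 2))
# ['A', 'A->B@1', 'A->B@2', 'B', 'B->C@1', 'B->C@2', 'C']
```

Splicing the per-group reference videos in this order yields the foreground moving A → B and then B → C, as in fig. 6.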
In one embodiment, grouping the original images in the original image set to obtain the corresponding target image group includes: when the number of original images in the original image set is at least three, selecting temporally adjacent original images from the at least three original images as image pairs, and connecting the image pairs pairwise to form a target image group in which the backward image of each image pair is the forward image of the next image pair.
The forward image is the image corresponding to the initial position of the foreground, and the backward image is the image corresponding to the end position of the foreground. For example, if the target foreground change state is that the foreground moves from left to right at a constant speed, the image with the foreground on the left is the forward image and the image with the foreground on the right is the backward image. The target image group here is a cyclic image group.
Specifically, temporally adjacent original images are selected from at least three original images to form image pairs, obtaining at least two image pairs. The image pairs are connected pairwise to form a target image group in which the backward image of each image pair is the forward image of the next image pair. For example, if the original image set includes a first original image, a second original image, and a third original image, the first and second original images form a first image pair, the second and third original images form a second image pair, and the third and first original images form a third image pair. Since the backward image of each image pair in the target image group is the forward image of the next image pair, the target image group cycles as first image pair → second image pair → third image pair → first image pair …. The electronic device then performs frame interpolation on the original images of each image pair in the target image group according to the corresponding interpolation time phases to obtain a reference video for each image pair, and splices the reference videos in the time order of the images in the target image group to obtain the target video.
In this embodiment, as shown in fig. 7, which is a schematic diagram of generating a target video from a target image group, the original image set includes original image A, original image B, and original image C. The target image group consists of a first image pair, a second image pair, and a third image pair, where the first image pair includes original images A and B, the second image pair includes original images B and C, and the third image pair includes original images C and A. Three reference videos are then generated; the numbers of image frames of the three videos may differ, but the three videos are played at the same frame rate. In the generated target video, the foreground moves from the position in image A to the position in image B, then from the position in image B to the position in image C, and then from the position in image C back to the position in image A.
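The cyclic pairing can be sketched in a few lines (names are illustrative placeholders): each original is paired with its successor, and the last original wraps around to the first, so the backward image of every pair is the forward image of the next.

```python
# Sketch of the cyclic image group: image pairs wrap around, so interpolating
# every pair and splicing the reference videos yields a looping video in
# which the foreground returns to its starting position.

def cyclic_pairs(originals):
    n = len(originals)
    return [(originals[i], originals[(i + 1) % n]) for i in range(n)]

print(cyclic_pairs(["A", "B", "C"]))
# [('A', 'B'), ('B', 'C'), ('C', 'A')]
```

Playing the spliced result repeatedly gives seamless cyclic foreground motion, since the last pair ends where the first pair begins.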
In the video generation method of this embodiment, temporally adjacent original images are selected from at least three original images to form image pairs, the image pairs are connected pairwise into a cyclic image group in which the backward image of each image pair is the forward image of the next image pair, and frame interpolation is performed on the cyclic image group according to each interpolation time phase to obtain the target video. This achieves the effect of cyclic foreground motion in the target video and improves the operability of the video.
In one embodiment, grouping the original images in the original image set to obtain the corresponding target image group includes: determining a forward image and a corresponding backward image from the original image set, and combining the forward image and the corresponding backward image into a target image group.
The target image group has a time order, namely the order of the forward image and the backward image. The number of forward images and the number of backward images are each at least one. The times corresponding to all forward images in the same target image group are regarded as the same time, as are the times corresponding to all backward images. For example, if the original image set includes forward image A and backward image B, then forward image A and backward image B constitute a target image group.
In the video generation method of this embodiment, a forward image and a corresponding backward image are determined from the original image set and combined into a target image group, so the direction of change of the foreground in the target video can be determined; that is, a video in which the foreground changes adaptively in motion, posture, and shape can be generated as the foreground changes from its position in the forward image to its position in the backward image.
In one embodiment, at least one of the number of images of the forward image and the number of images of the backward image in the target image group is at least two;
performing frame interpolation processing on the original images in the corresponding target image group according to each interpolation time phase to obtain a target video, wherein the frame interpolation processing comprises the following steps:
and (b1) performing frame interpolation on each image pair in the image group according to the interpolation time phase to obtain each interpolation image, wherein the image pair comprises a forward image and a corresponding backward image.
That is, in the target image group, the number of forward images is at least two, the number of backward images is at least two, or both are at least two.
Specifically, the electronic device performs frame interpolation on each image pair in the image group (an image pair comprising a forward image and a corresponding backward image) according to the interpolation time phases to obtain interpolated images. For example, when the forward images are original image A and original image B and the backward image is original image C, two forward images correspond to the one backward image, so there are two image pairs. Because frame interpolation is performed between a forward image and a backward image, the electronic device interpolates both the first image pair, composed of original images A and C, and the second image pair, composed of original images B and C, to obtain the interpolated images.
Step (b2): fuse the interpolated images whose time phases match to obtain first fused images.
Specifically, the electronic device fuses the interpolated images matching each time phase, that is, fuses the at least two phase-matched interpolated images, obtaining one first fused image per matched phase. For example, if the first image pair yields interpolated image A at time phase 1 and interpolated image B at time phase 2, and the second image pair yields interpolated image C at time phase 1 and interpolated image D at time phase 2, then interpolated images A and C (phase 1) are fused and interpolated images B and D (phase 2) are fused, giving two fused images.
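A minimal sketch of this phase-matched fusion, modelling images as flat lists of grey values and fusion as a plain per-pixel average (the disclosure does not fix the fusion rule; a real implementation could use weighted fusion):

```python
# Sketch of step (b2): interpolated images whose time phases match are fused
# pixel by pixel. Images are illustrative flat lists of grey values.

def fuse(images):
    """Per-pixel average of phase-matched interpolated images."""
    return [sum(px) / len(px) for px in zip(*images)]

# time phase 1: the interpolated image from the first image pair is fused
# with the phase-matched interpolated image from the second image pair
fused_phase1 = fuse([[10, 20, 30], [30, 40, 50]])
print(fused_phase1)  # [20.0, 30.0, 40.0]
```

Repeating this per phase yields exactly one first fused image per matched time phase, as described above.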
Step (b3): fuse the target images whose number is at least two to obtain a second fused image, where the target images are at least one of the forward images and the backward images.
Specifically, when the number of forward images is at least two, the at least two forward images are fused. When the number of backward images is at least two, the at least two backward images are fused. When both the number of forward images and the number of backward images are at least two, the forward images are fused and the backward images are fused, respectively.
Step (b4): splice the first fused images and the second fused image in the time order of the images to obtain the target video.
Specifically, the electronic device splices the first fused images and the second fused image in the time order of the images to obtain the target video. The time order of the images is forward image → interpolated image → backward image.
FIG. 8 is a diagram illustrating generation of a target video in one embodiment. The target image group includes original image A, original image B, and original image C, where the forward images are original images A and B and the backward image is original image C. In the generated target video, the foreground moves simultaneously from the position in image A and the position in image B toward the position in image C.
In the video generation method of this embodiment, each image pair in the image group is interpolated according to the interpolation time phases to obtain interpolated images, the interpolated images whose time phases match are fused into first fused images, the target images whose number is at least two are fused into a second fused image, and the first fused images and the second fused image are spliced in the time order of the images to obtain the target video. This achieves the effect of the foreground in one image moving or deforming toward the foreground positions in several images, or the foregrounds in several images moving or deforming toward the foreground position in one image, which enriches the presentation of the target video.
In one embodiment, the target foreground change state includes a target foreground motion trajectory. Grouping the original images in the original image set to obtain a target image group includes: acquiring, according to the target foreground motion trajectory, at least two target original images corresponding to the trajectory from the original image set to obtain a target image group.
The target foreground motion trajectory refers to the motion trajectory of the foreground in the target video. It may be, for example, "L"-shaped or "S"-shaped, but is not limited thereto.
Specifically, according to the target foreground motion trajectory, the electronic device acquires at least two target original images corresponding to the trajectory from the original image set to obtain a target image group. For example, if the trajectory is "L"-shaped, the electronic device selects target original images in which the foreground is at the upper left corner, the lower left corner, and the lower right corner, respectively, to obtain the target image group.
In the video generation method of this embodiment, at least two target original images corresponding to the foreground motion trajectory are acquired from the original image set according to the trajectory to obtain a target image group, so a target video conforming to the motion trajectory can be obtained, which improves the convenience of video production.
In one embodiment, the original image set includes a first image frame in a first video and a second image frame in a second video, the first image frame is a last frame in the first video, the second image frame is a first frame in the second video, and backgrounds of the first image frame and the second image frame are matched. According to the interpolation time phase, carrying out frame interpolation processing on the original image in the original image set to obtain a target video, and the method comprises the following steps: performing frame interpolation processing on the first image frame and the second image frame according to the interpolation time phase to obtain a third video; acquiring a first video and a second video; and splicing the first video, the third video and the second video according to the image frame sequence to obtain the target video.
Specifically, the first video and the second video are different videos. The first image frame is the last frame of the first video, the second image frame is the first frame of the second video, and the backgrounds of the two frames match. The electronic device performs frame interpolation on the first image frame and the second image frame according to the interpolation time phases to generate a third video, which includes neither the first image frame nor the second image frame. The electronic device acquires the first video and the second video, with the first image frame as the forward frame and the second image frame as the backward frame, and splices the first video, the third video, and the second video in image-frame order to obtain the target video. The splicing tool can be a video codec: the three videos are first decoded, and the decoded data are then encoded uniformly in playing order.
In this embodiment, the performing frame interpolation processing on the first image frame and the second image frame according to the interpolation time phase to obtain a third video includes: performing frame interpolation processing on the first image frame and the second image frame according to the interpolation time phase to obtain an interpolation image; determining an image to be replaced from the interpolation image, and determining an adjacent image adjacent to the time phase of the image to be replaced; replacing the image to be replaced with an adjacent image; and splicing the adjacent images, the residual interpolation images except the image to be replaced and the original images in the original image set to generate a third video.
In the video generation method of this embodiment, the first and second image frames are interpolated according to the interpolation time phases to obtain a third video, the first and second videos are acquired, and the first, third, and second videos are spliced in image-frame order to obtain the target video. A video with motion and pose change across the junction of the two source videos can thus be obtained, which improves the operability of the target video.
In one embodiment, as shown in fig. 9, which is a schematic flowchart of frame interpolation to obtain the target video, performing frame interpolation on the original images in the original image set according to the interpolation time phases to obtain the target video includes:
step 108A, a forward image and a backward image are determined from the original image set.
Step 108B: perform forward motion estimation on the forward image and the backward image to obtain forward motion vectors, and perform backward motion estimation on the forward image and the backward image to obtain backward motion vectors.
Specifically, the forward image and the backward image are partitioned into blocks, whose size can be user-defined. Traversing block by block, the best-matching block in the backward image is found for each block of the forward image, determining the motion vector of each block of the forward image relative to the backward image, i.e., the forward motion vectors. Likewise, traversing block by block, the best-matching block in the forward image is found for each block of the backward image, determining the motion vector of each block of the backward image relative to the forward image, i.e., the backward motion vectors. Fig. 10 is a diagram illustrating forward motion vectors and backward motion vectors in one embodiment. In one embodiment, the forward and backward motion vectors may be corrected, for example with reference to the motion vectors of neighboring blocks. Fig. 10A is a diagram illustrating the corrected forward and backward motion vectors in one embodiment.
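The block-matching step can be sketched as a full-search minimisation of the sum of absolute differences (SAD); this is one common matching criterion, not necessarily the one used here, and the images, block size, and search range below are illustrative toy values:

```python
# Sketch of block-based forward motion estimation: for each block of the
# forward image, find the best-matching block in the backward image by
# minimising the SAD over an exhaustive search of the whole frame.
# Images are lists of rows of grey values.

def sad(img_a, img_b, ya, xa, yb, xb, bs):
    return sum(abs(img_a[ya + dy][xa + dx] - img_b[yb + dy][xb + dx])
               for dy in range(bs) for dx in range(bs))

def forward_motion_vectors(fwd, bwd, bs):
    h, w = len(fwd), len(fwd[0])
    vectors = {}
    for ya in range(0, h, bs):
        for xa in range(0, w, bs):
            best = None
            for yb in range(0, h - bs + 1):
                for xb in range(0, w - bs + 1):
                    cost = sad(fwd, bwd, ya, xa, yb, xb, bs)
                    if best is None or cost < best[0]:
                        best = (cost, (yb - ya, xb - xa))
            vectors[(ya, xa)] = best[1]
    return vectors

# A 4x4 frame whose bright 2x2 patch shifts two columns to the right:
fwd = [[9, 9, 0, 0],
       [9, 9, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]]
bwd = [[0, 0, 9, 9],
       [0, 0, 9, 9],
       [0, 0, 0, 0],
       [0, 0, 0, 0]]
print(forward_motion_vectors(fwd, bwd, 2)[(0, 0)])  # (0, 2) — moved right by 2
```

Backward motion estimation is the same procedure with the roles of the two images swapped, i.e. `forward_motion_vectors(bwd, fwd, bs)`.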
Step 108C: map and correct the forward motion vectors and the backward motion vectors according to the target interpolation time phase to obtain a forward mapped motion vector and a backward mapped motion vector corresponding to each interpolation block in the interpolated image.
Specifically, the forward mapped motion vector represents the motion vector of an interpolation block relative to the forward image, and the backward mapped motion vector represents its motion vector relative to the backward image. The forward motion vector maps a block in the forward image to the backward image; if, during this mapping, the block corresponding to a first forward motion vector passes through a first interpolation block in the interpolated image, then the first forward motion vector is the target motion vector corresponding to the first interpolation block. Bidirectionally mapping this target motion vector according to the target interpolation time phase yields the forward mapped motion vector and backward mapped motion vector corresponding to the first interpolation block. For example, if the first forward motion vector is (3, -9) and the target interpolation time phase is 1/3, mapping and correction give a forward mapped motion vector of (1, -3) and a backward mapped motion vector of (-1, 3) for the first interpolation block.
Similarly, the backward motion vector maps a block in the backward image to the forward image; if, during this mapping, the block corresponding to a first backward motion vector passes through a second interpolation block in the interpolated image, then the first backward motion vector is the target motion vector corresponding to the second interpolation block, and bidirectionally mapping it according to the target interpolation time phase yields the forward mapped and backward mapped motion vectors corresponding to the second interpolation block. For example, if the first backward motion vector is (-3, 9) and the target interpolation time phase is 1/3, mapping and correction give a forward mapped motion vector of (1, -3) and a backward mapped motion vector of (-1, 3) for the second interpolation block. Fig. 11 is a schematic diagram of forward mapped and backward mapped motion vectors.
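The phase mapping in the forward-vector example can be sketched as follows. The symmetric convention below (forward mapped vector = phase × motion vector, backward mapped vector = its negation) reproduces the numbers given above; other schemes scale the backward vector by (1 − phase) instead, and this disclosure does not spell out the exact correction rule:

```python
# Sketch of mapping a passing forward motion vector onto an interpolation
# block at a given time phase, using the symmetric convention that matches
# the worked numbers in the text.
from fractions import Fraction

def map_motion_vector(mv, phase):
    fwd = (mv[0] * phase, mv[1] * phase)   # vector toward the forward image
    bwd = (-fwd[0], -fwd[1])               # vector toward the backward image
    return fwd, bwd

fwd_mv, bwd_mv = map_motion_vector((3, -9), Fraction(1, 3))
# fwd_mv equals (1, -3) and bwd_mv equals (-1, 3), as in the example above
```

`Fraction` keeps the phase arithmetic exact; with floats the same results hold up to rounding.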
If multiple forward motion vectors and backward motion vectors pass through an interpolation block, the forward mapped and backward mapped motion vectors obtained by mapping and correcting all the passing motion vectors are taken as candidate forward mapped and backward mapped motion vectors for that interpolation block. The target forward mapped and backward mapped motion vectors corresponding to the interpolation block are then determined from the candidates; the specific screening method can be customized, for example by calculating the matching error of the matching blocks corresponding to each candidate pair.
Step 108D: obtain the interpolation pixel value corresponding to each interpolation block according to its forward mapped and backward mapped motion vectors, and generate the interpolated image from the interpolation blocks.
Specifically, a first interpolation pixel value is fetched from the forward image using the forward mapped motion vector, a second interpolation pixel value is fetched from the backward image using the backward mapped motion vector, and the interpolation pixel value of the interpolation block is obtained by weighting the two; the weighting coefficients can be customized. The interpolated image is finally generated from the interpolation blocks.
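A minimal sketch of the weighted blend for a single pixel; the phase-based weighting below (source image closer in time weighted more heavily) is one common choice, whereas the disclosure leaves the coefficients customizable:

```python
# Sketch of the pixel blend in step 108D: the interpolation pixel value is a
# weighted combination of the value fetched from the forward image and the
# value fetched from the backward image.

def interpolate_pixel(p_forward, p_backward, phase):
    """phase in [0, 1]: 0 = at the forward image, 1 = at the backward image."""
    return (1 - phase) * p_forward + phase * p_backward

print(interpolate_pixel(100, 160, 0.5))   # 130.0
print(interpolate_pixel(100, 160, 0.25))  # 115.0
```

At phase 0.25 the interpolated frame lies closer in time to the forward image, so the forward pixel value dominates the blend.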
Step 108E: generate the target video from the interpolated image.
In the video generation method of this embodiment, the forward and backward motion vectors are obtained by forward and backward motion estimation respectively, and are mapped and corrected according to the target interpolation time phase to obtain the forward mapped and backward mapped motion vectors corresponding to each interpolation block. This improves the accuracy of the interpolation-block motion vectors, and hence the quality of the interpolated image and of the target video.
In one embodiment, after acquiring the original set of images, the video generation method further comprises: the resolutions of the images in the original image set are adjusted so that the resolutions of the images in the original image set are consistent.
Specifically, when the resolutions of the images in the original image set differ, one target image may be selected from the set, its resolution taken as the target resolution, and each image in the set up-sampled or down-sampled to the target resolution. Alternatively, a target resolution may be computed from the resolutions of the images in the set, ranging between the maximum and minimum resolutions in the set, or any preset target resolution may be used. When the resolutions of the images in the set are already the same, they may still be adjusted to a target resolution by up-sampling or down-sampling. The choice of target resolution may be influenced by network quality or terminal performance.
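A sketch of the "adjust every image to one target resolution" step, using the first image's resolution as the target and nearest-neighbour sampling (a real pipeline would use proper up/down-sampling filters; images are illustrative lists of rows of grey values):

```python
# Sketch of resolution normalisation: every image in the set is resampled to
# a single target resolution so that subsequent frame interpolation can
# operate on same-sized frames.

def resize_nearest(img, out_h, out_w):
    in_h, in_w = len(img), len(img[0])
    return [[img[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]

def normalize_resolutions(images):
    # target resolution: that of the first image (one of the options above)
    target_h, target_w = len(images[0]), len(images[0][0])
    return [resize_nearest(img, target_h, target_w) for img in images]

imgs = normalize_resolutions([[[1, 2], [3, 4]],          # 2x2 — the target
                              [[5]],                     # 1x1, up-sampled
                              [[6, 7, 8], [6, 7, 8]]])   # 2x3, down-sampled
print(imgs[1])  # [[5, 5], [5, 5]]
```

After normalisation every image is 2×2, so block matching and pixel blending can pair pixels one-to-one across frames.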
In the video generation method in this embodiment, the resolutions of the images in the original image set are adjusted to be consistent, so that subsequent interpolation is facilitated, and the image processing efficiency is improved.
In one embodiment, performing frame interpolation on the corresponding target image group according to the interpolation time phases to obtain the target video includes: performing frame interpolation on the corresponding target image group according to at least two interpolation time phases to generate an interpolated image for each interpolation time phase, and splicing the interpolated images in time-phase order to generate the target video. In this case the generated target video need not include the original images while still maintaining continuity between its images.
In one embodiment, performing frame interpolation on the corresponding target image group according to the interpolation time phases to obtain the target video includes: performing frame interpolation on the corresponding target image group according to at least two interpolation time phases to generate an interpolated image for each interpolation time phase, and splicing the original images and the interpolated images in the time order of the images to generate the target video. In this case the generated target video includes the original images, which improves the realism of the generated video.
In one embodiment, the method can be used for image editing when making a short video or an electronic photo album, forming a dynamic video with novel special effects from existing photos.
In one embodiment, in low-power video recording, images may be captured at fixed intervals and formed into a dynamic video by the method of the present application.
In a specific embodiment, as shown in fig. 12, a video generation system is provided, and a target video is generated by the system of fig. 12. At least two input original images have their resolutions adjusted by the up-down sampling module. The resolutions of the at least two original images may differ; when they differ, the original images are passed through the up-down sampling module, and the resolution of the sampled images may differ from that of the original images. The original input frames may or may not be included in the target video, and the number of frames in the target video may be any number. The target video is characterized by a changing foreground and a constant background, where the foreground is the region with large shape change across the two or more input images and the background is the region with small shape change. The resolution of the output target video may also be changed through the up-down sampling module.
It should be understood that although the various steps in the flowcharts of figs. 1, 10 and 12 are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated otherwise herein, the performance of these steps is not strictly limited to the order shown, and the steps may be performed in other orders. Moreover, at least some of the steps in figs. 1, 10 and 12 may include multiple sub-steps or stages that are not necessarily performed at the same time but may be performed at different times, and these sub-steps or stages are not necessarily performed sequentially but may be performed alternately with other steps or with at least some of the sub-steps or stages of other steps.
Fig. 13 is a block diagram showing a configuration of a video generating apparatus according to an embodiment. A video generating apparatus comprising a first obtaining module 1302, a second obtaining module 1304, a determining module 1306, and a frame inserting module 1308, wherein:
a first obtaining module 1302, configured to obtain an original image set, where the original image set includes at least two original images;
a second obtaining module 1304, configured to obtain a target foreground change state corresponding to the original image set;
a determining module 1306, configured to determine, according to the target foreground change state, an interpolation time phase corresponding to the original image set;
and a frame interpolation module 1308, configured to perform frame interpolation on the original images in the original image set according to the interpolation time phase, so as to obtain a target video.
The video generation device in this embodiment acquires an original image set, acquires a target foreground change state corresponding to the original image set, determines an interpolation time phase corresponding to the original image set according to the target foreground change state, performs frame interpolation processing on at least two original images in the original image set according to the interpolation time phase, and obtains a target video.
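A minimal sketch of how the four modules of fig. 13 might be wired into one pipeline. The class, its injected callables, and their signatures are hypothetical illustrations, not the patented apparatus:

```python
class VideoGenerator:
    """Toy composition of the four modules in Fig. 13. Each 'module' is an
    injected callable so the pieces stay independently replaceable."""

    def __init__(self, acquire, get_state, pick_phases, interpolate):
        self.acquire = acquire          # first obtaining module (1302)
        self.get_state = get_state      # second obtaining module (1304)
        self.pick_phases = pick_phases  # determining module (1306)
        self.interpolate = interpolate  # frame interpolation module (1308)

    def run(self):
        images = self.acquire()                      # original image set
        state = self.get_state(images)               # target foreground change state
        phases = self.pick_phases(images, state)     # interpolation time phases
        return self.interpolate(images, phases)      # target video (frame list)
```

A usage example with stub modules: two scalar "frames", one midpoint phase, and a linear-blend interpolator produce a three-frame "video".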
In one embodiment, the determining module 1306 is configured to determine a uniform interpolation time phase corresponding to the original image set when the target foreground change state is a uniform change state, where the uniform change state includes at least one of a deformation uniform change state, a motion uniform change state, and a pose uniform change state.
In the video generating device in this embodiment, when the target foreground change state is the uniform change state, the uniform interpolation time phase corresponding to the original image set is obtained, and frame interpolation processing is performed according to the uniform interpolation time phase, so that the foreground in the target video can achieve the effect of uniform change, and the authenticity of the generated target video is improved.
In one embodiment, the determining module 1306 is configured to determine a non-uniform interpolation time phase corresponding to the original image set when the target foreground changing state is a first non-uniform changing state.
In the video generating device in this embodiment, when the target foreground change state is the first non-uniform change state, the non-uniform interpolation time phase corresponding to the original image set is determined, and frame interpolation processing is performed according to the non-uniform interpolation time phase, so that the foreground in the target video can achieve the effect of non-uniform change, and the operability of the video is improved.
In one embodiment, the determining module 1306 is configured to determine an interpolation time phase corresponding to each target image group when the target foreground changing state is the second non-uniform changing state, where the interpolation time phase is a uniform interpolation time phase or a non-uniform interpolation time phase. The frame interpolation module 1308 is configured to perform frame interpolation on the original image in the original image set according to the interpolation time phase to obtain an interpolated image; determining an image to be replaced from the interpolation image, and determining an adjacent image adjacent to the time phase of the image to be replaced; replacing the image to be replaced with an adjacent image; and splicing the adjacent images, the residual interpolation images except the image to be replaced and the original images in the original image set to generate a target video.
In the video generating apparatus of this embodiment, when the target foreground change state is the second non-uniform change state, the interpolation time phase corresponding to each target image group is determined, where the interpolation time phase may be a uniform interpolation time phase or a non-uniform interpolation time phase; frame interpolation is performed to generate interpolated images, an image to be replaced and an adjacent image adjacent to it in time phase are determined, and the target video is generated accordingly, so that the foreground in the target video can achieve a non-uniform change effect and the operability of the video is improved.
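The replace-with-neighbour step above can be sketched in a few lines. Replacing a frame with its time-phase neighbour makes motion appear to stall and then jump, which is one way the described non-uniform effect could be realized; the function and the "previous frame" choice are assumptions for illustration:

```python
def replace_with_neighbour(frames, idx):
    """Replace the frame at index idx with the adjacent frame in time phase
    (the previous frame when one exists, otherwise the next), leaving the
    remaining frames untouched before splicing."""
    out = list(frames)
    out[idx] = out[idx - 1] if idx > 0 else out[idx + 1]
    return out
```

The returned list is then spliced with the original images, as the text describes, to form the target video.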
In one embodiment, the determining module 1306 is configured to group the original images in the original image set to obtain a target image group; and determining an interpolation time phase corresponding to each target image group in the original image set according to the target foreground change state. The frame interpolation module 1308 is configured to perform frame interpolation processing on at least two original images in the corresponding target image group according to each interpolation time phase, so as to obtain a target video.
In the video generating apparatus in this embodiment, when the number of the original images in the original image set is at least three, the original images in the original image set are grouped to obtain target image groups, an interpolation time phase corresponding to each target image group in the original image set is determined according to the target foreground change state, and frame interpolation processing is performed on the corresponding target image group according to each interpolation time phase to obtain a target video, so that the foreground in the original images can generate an effect of different state changes according to the target foreground change state, the operability of video generation according to images is improved, and the authenticity of the generated video is also improved.
In one embodiment, the determining module 1306 is configured to select original target images that are adjacent in time phase from the at least three original images to form a target image group when the number of the original images in the original image set is at least three, resulting in at least two target image groups. The frame interpolation module 1308 is configured to perform frame interpolation on at least two original images in the corresponding target image group according to each interpolation time phase, so as to obtain a reference video corresponding to each target image group; and splicing the reference videos into the target video according to the time sequence of the images in the target image group.
In the video generation device in this embodiment, original target images adjacent in time are selected from at least three original images to form a target image group, at least two target image groups are obtained, frame interpolation processing is performed on at least two original images in the corresponding target image group according to each interpolation time phase, a reference video corresponding to each target image group is obtained, the reference videos are spliced into the target videos according to the time sequence of the images in the target image group, the effect that the foreground in the target videos sequentially changes can be achieved, and the operability of the videos is improved.
In one embodiment, the determining module 1306 is configured to, when the number of original images in the original image set is at least three, select temporally adjacent original images from the at least three original images as target image groups and join the target image groups two by two to form a loop image group, where the backward image of a previous target image group in the loop image group is the forward image of the next target image group. The frame interpolation module 1308 is configured to perform frame interpolation on the original images corresponding to each target image group in the loop image group according to each interpolation time phase, so as to obtain a reference video corresponding to each target image group; and splice the reference videos according to the time sequence of the images in the loop image group to obtain the target video.
In the video generation apparatus of this embodiment, temporally adjacent original images are selected from at least three original images as target image groups, the target image groups are joined two by two to form a loop image group in which the backward image of a previous target image group is the forward image of the next target image group, and frame interpolation is performed on the loop image group according to each interpolation time phase to obtain the target video, so that the effect of cyclic motion of the foreground in the target video can be achieved and the operability of the video is improved.
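The chaining of target image groups can be sketched as follows. Pairing adjacent images so each group's backward image is the next group's forward image follows the text directly; closing the chain back to the first image is an assumption, offered as one reading of the "cyclic motion" effect:

```python
def chain_groups(images, close_loop=False):
    """Pair temporally adjacent images two by two so that the backward image
    of each target image group is the forward image of the next. With
    close_loop=True the last image is additionally paired back to the first
    (a hypothetical reading of the cyclic-motion effect)."""
    groups = [(images[i], images[i + 1]) for i in range(len(images) - 1)]
    if close_loop:
        groups.append((images[-1], images[0]))
    return groups
```

Each returned pair would then be interpolated into a reference video, and the reference videos spliced in order.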
In one embodiment, the determining module 1306 is configured to determine a forward image and a corresponding backward image from the original image set, and combine the forward image and the corresponding backward target image into a target image group.
In the video generating apparatus of this embodiment, a forward image and a corresponding backward image are determined from the original image set and combined into a target image group, so that the change direction of the foreground in the target video can be determined; that is, the foreground changes from its position in the forward image to its position in the backward image, and a video in which the foreground adaptively changes in motion, posture and shape can be generated.
In one embodiment, at least one of the number of forward images and the number of backward images in the target image group is at least two. The frame interpolation module 1308 is configured to perform frame interpolation on each image pair in the image group according to the interpolation time phase to obtain interpolated images, where an image pair includes a forward image and a corresponding backward image; fuse the interpolated images matched in time phase to obtain first fusion images; fuse the target images, of which there are at least two, to obtain a second fusion image, where the target images are at least one of the forward images and the backward images; and splice the first fusion images and the second fusion image according to the time sequence of the images to obtain the target video.
In the video generation apparatus of this embodiment, frame interpolation is performed on each image pair in the image group according to the interpolation time phase to obtain interpolated images, the interpolated images matched in time phase are fused to obtain first fusion images, the at least two target images are fused to obtain a second fusion image, and the first fusion images and the second fusion image are spliced according to the time sequence of the images to obtain the target video. In this way, the foreground in one image can move or deform toward the foreground positions in multiple images, or the foregrounds in multiple images can move or deform toward the foreground position in one image, enriching the representation forms of the target video.
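The first-fusion step above, where phase-matched interpolated images from several image pairs are merged, can be sketched with scalar frames and mean fusion. Both simplifications, and the function name, are assumptions for illustration; real fusion would operate on pixel arrays:

```python
def fused_interpolation(pairs, phases):
    """For each interpolation time phase, interpolate every
    (forward, backward) image pair with a linear blend, then average the
    phase-matched results into one first-fusion frame per phase."""
    return [
        sum(f * (1 - p) + b * p for f, b in pairs) / len(pairs)
        for p in phases
    ]
```

For example, with pairs `[(0.0, 2.0), (0.0, 4.0)]` at phase 0.5 the two per-pair interpolations are 1.0 and 2.0, and the fused frame is their mean, 1.5.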
In one embodiment, the target foreground changing state includes a target foreground motion trajectory. The determining module 1306 is configured to obtain at least two target original images corresponding to the foreground motion trajectory from the original image set according to the foreground motion trajectory, so as to obtain a target image group.
The video generation device in this embodiment obtains at least two target original images corresponding to the foreground motion trajectory from the original image set according to the foreground motion trajectory to obtain a target image group, and can obtain a target video conforming to the motion trajectory, thereby also improving convenience of video production.
In one embodiment, the original image set includes a first image frame in a first video and a second image frame in a second video, the first image frame is a last frame in the first video, the second image frame is a first frame in the second video, and backgrounds of the first image frame and the second image frame are matched. The frame interpolation module 1308 is configured to perform frame interpolation on the first image frame and the second image frame according to the interpolation time phase to obtain a third video; acquiring a first video and a second video; and splicing the first video, the third video and the second video according to the image frame sequence to obtain the target video.
The video generation apparatus of this embodiment performs frame interpolation processing on the first image frame and the second image frame according to the interpolation time phase to obtain a third video, acquires the first video and the second video, and splices the first video, the third video and the second video according to the image frame sequence to obtain the target video, so that a video with smoothly transitioning motion, movement and pose change can be obtained, improving the operability of the target video.
In one embodiment, the frame interpolation module 1308 is configured to determine a forward image and a backward image from the original image set; perform forward motion estimation on the forward image and the backward image to obtain a forward motion vector, and perform backward motion estimation on the forward image and the backward image to obtain a backward motion vector; map and correct the forward motion vector and the backward motion vector according to the target interpolation time phase to obtain a forward mapped motion vector and a backward mapped motion vector corresponding to each interpolation block in the interpolated image; obtain interpolated pixel values corresponding to the interpolation blocks according to the forward mapped motion vector and the backward mapped motion vector, and generate the interpolated image from the interpolation blocks; and generate the target video from the interpolated image.
The video generation device in the embodiment obtains the forward motion vector and the backward motion vector by respectively calculating the forward motion estimation and the backward motion estimation, and maps and corrects the forward motion vector and the backward motion vector according to the target interpolation time phase to obtain the forward mapping motion vector and the backward mapping motion vector corresponding to each interpolation block, so that the accuracy of determining the motion vector of the interpolation block can be improved, the generation quality of an interpolation image is improved, and the generation quality of a target video is improved.
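The motion-vector mapping and pixel blending steps above can be sketched as follows. The linear bilateral mapping shown is a common scheme assumed here for illustration; the patented correction step and block-level details are not reproduced:

```python
def map_motion_vectors(v_fwd, v_bwd, phase):
    """Map the full-span forward/backward motion vectors onto an
    interpolation block at time phase t in (0, 1): the block reaches back
    into the forward frame by t * v_fwd and ahead into the backward frame
    by (1 - t) * v_bwd."""
    fwd_mapped = tuple(phase * c for c in v_fwd)
    bwd_mapped = tuple((1 - phase) * c for c in v_bwd)
    return fwd_mapped, bwd_mapped

def interpolate_pixel(p_fwd, p_bwd, phase):
    """Blend the two motion-compensated pixel values, weighting each source
    frame by its temporal distance from the interpolated phase."""
    return (1 - phase) * p_fwd + phase * p_bwd
```

At phase 0.25, for instance, an interpolation block uses a quarter of the forward motion vector and three quarters of the backward one, and the interpolated pixel leans toward the temporally closer forward frame.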
In one embodiment, the video generation apparatus further comprises an adjustment module for adjusting the resolution of the images in the original image set so that the resolution of the images in the original image set is consistent.
The video generation device in this embodiment adjusts the resolutions of the images in the original image set to be consistent, so as to facilitate subsequent interpolation and improve the efficiency of image processing.
The division of the modules in the video generating apparatus is only for illustration, and in other embodiments, the video generating apparatus may be divided into different modules as needed to complete all or part of the functions of the video generating apparatus.
For specific limitations of the video generation apparatus, reference may be made to the above limitations of the video generation method, which is not described herein again. The modules in the video generating apparatus can be implemented in whole or in part by software, hardware, and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
Fig. 14 is a schematic diagram of the internal structure of an electronic device in one embodiment. As shown in fig. 14, the electronic device includes a processor and a memory connected by a system bus. The processor provides computing and control capability and supports the operation of the entire electronic device. The memory may include a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The computer program can be executed by the processor to implement the video generation method provided in the foregoing embodiments. The internal memory provides a cached execution environment for the operating system and the computer program in the non-volatile storage medium. The electronic device may be any terminal device such as a mobile phone, a tablet computer, a PDA (Personal Digital Assistant), a point-of-sale (POS) terminal, a vehicle-mounted computer, or a wearable device.
Each module in the video generation apparatus provided in the embodiments of the present application may be implemented in the form of a computer program. The computer program may run on a terminal or a server. The program modules constituted by the computer program may be stored in the memory of the electronic device. The computer program, when executed by a processor, implements the steps of the methods described in the embodiments of the present application.
The embodiment of the application also provides a computer readable storage medium. One or more non-transitory computer-readable storage media containing computer-executable instructions that, when executed by one or more processors, cause the processors to perform the steps of the video generation method.
A computer program product containing instructions which, when run on a computer, cause the computer to perform a video generation method.
Any reference to memory, storage, database, or other medium used herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM), which acts as external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The above embodiments express only several implementations of the present application, and their descriptions are specific and detailed, but they should not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art may make several variations and modifications without departing from the concept of the present application, and these fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (16)

1. A method of video generation, comprising:
acquiring an original image set, wherein the original image set comprises at least two original images;
acquiring a target foreground change state corresponding to the original image set;
determining an interpolation time phase corresponding to the original image set according to the target foreground change state;
and performing frame interpolation processing on the original image in the original image set according to the interpolation time phase to obtain a target video.
2. The method according to claim 1, wherein the determining the interpolation time phase corresponding to the original image set according to the target foreground variation state comprises:
and when the target foreground change state is a uniform change state, acquiring a uniform interpolation time phase corresponding to the original image set, wherein the uniform change state comprises at least one of a deformation uniform change state, a motion uniform change state and a pose uniform change state.
3. The method according to claim 1, wherein the determining the interpolation time phase corresponding to the original image set according to the target foreground variation state comprises:
and when the target foreground change state is a first non-uniform change state, determining a non-uniform interpolation time phase corresponding to the original image set.
4. The method according to claim 1, wherein the determining the interpolation time phase corresponding to the original image set according to the target foreground variation state comprises:
when the target foreground change state is a second non-uniform change state, determining an interpolation time phase corresponding to each target image group, wherein the interpolation time phase is a uniform interpolation time phase or a non-uniform interpolation time phase;
the frame interpolation processing is performed on the original image in the original image set according to the interpolation time phase to obtain a target video, and the frame interpolation processing includes:
performing frame interpolation processing on the original image in the original image set according to the interpolation time phase to obtain an interpolated image;
determining an image to be replaced from the interpolation images, and determining an adjacent image adjacent to the time phase of the image to be replaced;
replacing the image to be replaced with the adjacent image;
and splicing the adjacent images, the residual interpolation images except the image to be replaced and the original images in the original image set to generate a target video.
5. The method according to claim 1, wherein the determining the interpolation time phase corresponding to the original image set according to the target foreground variation state comprises:
grouping the original images in the original image set to obtain a target image group;
determining an interpolation time phase corresponding to each target image group in the original image set according to the target foreground change state;
the frame interpolation processing is performed on the original image in the original image set according to the interpolation time phase to obtain a target video, and the frame interpolation processing includes:
and performing frame interpolation processing on the original images in the corresponding target image group according to the interpolation time phases to obtain a target video.
6. The method of claim 5, wherein the grouping the raw images in the raw image set to obtain a target image group comprises:
when the number of original images in the original image set is at least three, selecting original target images adjacent in time from the at least three original images to form a target image group, and obtaining at least two target image groups;
the frame interpolation processing is performed on the original image in the corresponding target image group according to each interpolation time phase to obtain a target video, and the method comprises the following steps:
performing frame interpolation on the original images in the corresponding target image groups according to the interpolation time phases to obtain a reference video corresponding to each target image group;
and splicing the reference videos into the target video according to the time sequence of the images in the target image group.
7. The method of claim 5, wherein the grouping the original images in the original image set to obtain a corresponding target image group comprises:
when the number of the original images in the original image set is at least three, original target images adjacent in time are selected from the at least three original images to form image pairs, every two image pairs of each image pair are connected to form a target image group, and a backward image of a previous image pair in the target image group is a forward image of a next image pair.
8. The method of claim 5, wherein the grouping the raw images in the raw image set to obtain a target image group comprises:
determining a forward image and a corresponding backward image from the original image set, and combining the forward image and the corresponding backward image into a target image group.
9. The method according to claim 8, wherein at least one of the number of images of the forward image and the number of images of the backward image in the target image group is at least two;
the frame interpolation processing is performed on the original image in the corresponding target image group according to each interpolation time phase to obtain a target video, and the method comprises the following steps:
performing frame interpolation on each image pair in the image group according to the interpolation time phase to obtain each interpolation image, wherein the image pair comprises a forward image and a corresponding backward image;
respectively fusing the interpolation images matched with the time phases to obtain first fusion images;
fusing the target images, of which there are at least two, to obtain a second fused image, wherein the target image is at least one of the forward image and the backward image;
and splicing the first fusion image and the second fusion image according to the time sequence of the images to obtain a target video.
10. The method of claim 5, wherein the target foreground changing state comprises a target foreground motion trajectory;
the grouping of the original images in the original image set to obtain a target image group includes:
and acquiring at least two target original images corresponding to the foreground motion trail from the original image set according to the foreground motion trail to obtain a target image group.
11. The method according to any one of claims 1 to 3, wherein the original image set comprises a first image frame in a first video and a second image frame in a second video, the first image frame being a last frame in the first video, the second image frame being a first frame in the second video, and the backgrounds of the first image frame and the second image frame match;
the frame interpolation processing is performed on the original image in the original image set according to the interpolation time phase to obtain a target video, and the frame interpolation processing includes:
performing frame interpolation processing on the first image frame and the second image frame according to the interpolation time phase to obtain a third video;
acquiring the first video and the second video;
and splicing the first video, the third video and the second video according to the image frame sequence to obtain a target video.
12. The method according to any one of claims 1 to 3, wherein the performing frame interpolation processing on the original images in the original image set according to the interpolation time phase to obtain a target video comprises:
determining a forward image and a backward image from the original image set;
carrying out forward motion estimation on the forward image and the backward image to obtain a forward motion vector;
carrying out backward motion estimation on the forward image and the backward image to obtain a backward motion vector;
mapping and correcting the forward motion vector and the backward motion vector according to the interpolation time phase to obtain a forward mapping motion vector and a backward mapping motion vector corresponding to each interpolation block in an interpolation image;
obtaining interpolation pixel values corresponding to the interpolation blocks according to the forward mapping motion vector and the backward mapping motion vector, and generating an interpolation image according to the interpolation blocks;
and generating a target video according to the interpolation image.
13. The method of any of claims 1 to 3, wherein after said acquiring the set of original images, the method further comprises:
adjusting the resolution of the original images in the original image set so that the resolution of the original images in the original image set is consistent.
14. A video generation apparatus, comprising:
the device comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring an original image set, and the original image set comprises at least two original images;
the second acquisition module is used for acquiring a target foreground change state corresponding to the original image set;
the determining module is used for determining an interpolation time phase corresponding to the original image set according to the target foreground change state;
and the frame interpolation module is used for carrying out frame interpolation processing on the original images in the original image set according to the interpolation time phase to obtain a target video.
15. An electronic device comprising a memory and a processor, the memory having stored therein a computer program that, when executed by the processor, causes the processor to perform the steps of the video generation method of any of claims 1 to 13.
16. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 13.
CN202010381604.0A 2020-05-08 2020-05-08 Video generation method, device, electronic equipment and computer readable storage medium Active CN111586321B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010381604.0A CN111586321B (en) 2020-05-08 2020-05-08 Video generation method, device, electronic equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN111586321A true CN111586321A (en) 2020-08-25
CN111586321B CN111586321B (en) 2023-05-12

Family

ID=72120411

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010381604.0A Active CN111586321B (en) 2020-05-08 2020-05-08 Video generation method, device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111586321B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111951157A (en) * 2020-09-02 2020-11-17 Shenzhen Transsion Holdings Co., Ltd. Image processing method, apparatus and storage medium
CN112511859A (en) * 2020-11-12 2021-03-16 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Video processing method, device and storage medium
CN112995533A (en) * 2021-02-04 2021-06-18 Shanghai Bilibili Technology Co., Ltd. Video production method and device

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001045036A1 (en) * 1999-12-14 2001-06-21 Dynapel Systems, Inc. Slow motion system
CN101543065A (en) * 2007-02-20 2009-09-23 Sony Corporation Image display device, video signal processing device, and video signal processing method
CN101808205A (en) * 2009-02-18 2010-08-18 Sony Ericsson Mobile Communications AB Moving image output method and moving image output apparatus
CN102360514A (en) * 2011-10-20 2012-02-22 China University of Geosciences (Wuhan) Dynamic frame interpolation technology-based curved surface process time and space simulation method
CN103402098A (en) * 2013-08-19 2013-11-20 Wuhan University Video frame interpolation method based on image interpolation
US20140294368A1 (en) * 2013-03-29 2014-10-02 Kabushiki Kaisha Toshiba Moving-image playback device
CN105120337A (en) * 2015-08-28 2015-12-02 Xiaomi Technology Co., Ltd. Video special effect processing method, video special effect processing device and terminal equipment
CN106791279A (en) * 2016-12-30 2017-05-31 Institute of Automation, Chinese Academy of Sciences Motion compensation method and system based on occlusion detection
CN108040217A (en) * 2017-12-20 2018-05-15 Shenzhen Arashi Vision Co., Ltd. Video decoding method, apparatus, and camera
CN109922372A (en) * 2019-02-26 2019-06-21 Shenzhen SenseTime Technology Co., Ltd. Video data processing method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN111586321B (en) 2023-05-12

Similar Documents

Publication Publication Date Title
KR102281017B1 (en) Neural network model training method, apparatus and storage medium for image processing
CN111586321B (en) Video generation method, device, electronic equipment and computer readable storage medium
CN110324664B (en) Video frame supplementing method based on neural network and training method of model thereof
CN111586409B (en) Method and device for generating interpolation frame, electronic equipment and storage medium
CN109598744B (en) Video tracking method, device, equipment and storage medium
CN111629262A (en) Video image processing method and device, electronic equipment and storage medium
CN111614911B (en) Image generation method and device, electronic device and storage medium
CN111402139B (en) Image processing method, apparatus, electronic device, and computer-readable storage medium
JP4564564B2 (en) Moving picture reproducing apparatus, moving picture reproducing method, and moving picture reproducing program
US20210097650A1 (en) Image processing method, storage medium, image processing apparatus, learned model manufacturing method, and image processing system
CN111491204B (en) Video repair method, video repair device, electronic equipment and computer-readable storage medium
CN111754429B (en) Motion vector post-processing method and device, electronic equipment and storage medium
CN113724155B (en) Self-lifting learning method, device and equipment for self-supervision monocular depth estimation
Han et al. Hybrid high dynamic range imaging fusing neuromorphic and conventional images
CN114339409A (en) Video processing method, video processing device, computer equipment and storage medium
KR20210089737A (en) Image depth estimation method and apparatus, electronic device, storage medium
CN112906609A (en) Video important area prediction method and device based on two-way cross attention network
CN114640885B (en) Video frame inserting method, training device and electronic equipment
CN112884657B (en) Face super-resolution reconstruction method and system
CN111462021B (en) Image processing method, apparatus, electronic device, and computer-readable storage medium
CN111711823B (en) Motion vector processing method and apparatus, electronic device, and storage medium
CN111726526B (en) Image processing method and device, electronic equipment and storage medium
US9106926B1 (en) Using double confirmation of motion vectors to determine occluded regions in images
CN114885144B (en) High frame rate 3D video generation method and device based on data fusion
Gao et al. Real-time image enhancement with attention aggregation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant