CN113269086A - Vlog editing method and system - Google Patents

Vlog editing method and system

Info

Publication number
CN113269086A
CN113269086A (application CN202110564551.0A)
Authority
CN
China
Prior art keywords
frame
nth
motion vector
vlog
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110564551.0A
Other languages
Chinese (zh)
Inventor
文振海 (Wen Zhenhai)
叶飞 (Ye Fei)
陈欢 (Chen Huan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Ruidong Technology Development Co ltd
Original Assignee
Suzhou Ruidong Technology Development Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Ruidong Technology Development Co ltd filed Critical Suzhou Ruidong Technology Development Co ltd
Priority to CN202110564551.0A
Publication of CN113269086A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/49 Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • G06F 18/232 Non-hierarchical techniques
    • G06F 18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4007 Scaling of whole images or parts thereof, e.g. expanding or contracting based on interpolation, e.g. bilinear interpolation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/48 Matching video sequences

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computing Systems (AREA)
  • Television Systems (AREA)

Abstract

The invention relates to the technical field of video clipping, and discloses a vlog clipping method and clipping system. The vlog clipping method comprises the following steps. S1: selecting an Nth first frame, an Nth tail frame, an (N+1)th first frame and an (N+1)th tail frame in the vlog video according to the input of a user; S2: determining the image similarity of the Nth tail frame and the (N+1)th first frame; S3: judging whether the image similarity falls into a preset similarity interval; if so, executing S4, and if not, executing S5; S4: connecting the Nth tail frame with the (N+1)th first frame; S5: judging whether the image similarity is smaller than the minimum value of the preset similarity interval; if so, executing S6, and if not, executing S7; S6: generating a transition frame, and connecting the Nth tail frame with the (N+1)th first frame by using the transition frame; S7: deleting the (N+1)th first frame and setting the first video frame after it as a new (N+1)th first frame, or deleting the Nth tail frame and setting the first video frame before it as a new Nth tail frame, and re-executing S2. The invention ensures that the vlog has no obvious editing trace at the connecting point, thereby improving the viewing experience.

Description

Vlog editing method and system
Technical Field
The invention relates to the technical field of video editing, in particular to a vlog editing method and a vlog editing system.
Background
A vlog (video log, or video blog in Chinese usage) is a video form created by shooting and editing the creator's daily life, with the actions and behavior of people as its main content. A vlog usually cannot be recorded in a single take; instead, the actions of people or objects in the same scene are recorded in multiple segments, suitable clipping points are then located in these segments, and the segments are clipped and merged into one complete long video. Because the two frames on either side of a clipping point were not shot continuously, the change in motion of people or objects across the clipping point tends to differ markedly from the change across adjacent, continuously shot frames: if the change is too large, the motion appears discontinuous; if it is too small, the motion appears sluggish and unnatural. Either way, the segment around the clipping point falls out of step with the rhythm of the rest of the video, leaving obvious editing traces in the final vlog and degrading the viewing experience.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a vlog clipping method and a vlog clipping system, which address the problems of discontinuous pictures and sluggish rhythm at the clipping point of a vlog video.
In order to achieve the above purpose, the invention provides the following technical scheme:
a vlog clipping method, comprising: s1: selecting an Nth first frame, an Nth tail frame, an N +1 th first frame and an N +1 th tail frame in a vlog video according to input of a user, and extracting a plurality of video frames from the Nth first frame to the Nth tail frame and a plurality of video frames from the N +1 th first frame to the N +1 th tail frame; s2: determining the image similarity of the Nth tail frame and the (N + 1) th frame; s3: judging whether the image similarity falls into a preset similarity interval, if so, executing S4, and if not, executing S5; s4: connecting the Nth tail frame with the (N + 1) th frame to generate a merged vlog video; s5: judging whether the image similarity is smaller than the minimum value of a preset similarity interval, if so, executing S6, otherwise, executing S7; s6: generating a transition frame according to the Nth tail frame and the (N + 1) th first frame, and connecting the Nth tail frame and the (N + 1) th first frame by using the transition frame to generate a merged vlog video; s7: deleting the N +1 th first frame, setting the first video frame after the N +1 th first frame as a new N +1 th first frame or deleting the nth last frame, setting the first video frame before the nth last frame as a new nth last frame, and then re-executing S2.
In the present invention, preferably, generating the transition frame according to the Nth tail frame and the (N+1)th first frame in S6 includes: S601: determining a bidirectional motion vector using the Nth tail frame and the (N+1)th first frame; S602: interpolating and constructing the transition frame according to the bidirectional motion vector.
In the present invention, preferably, S601 includes: S6011: determining an initial motion vector from the Nth tail frame to the (N+1)th first frame by a forward motion vector estimation method; S6012: determining the bidirectional motion vector by a bidirectional motion vector estimation method using the initial motion vector.
In the present invention, preferably, S602 includes: S6021: detecting whether an abnormal portion exists in the bidirectional motion vector and, if so, correcting it; S6022: performing interpolation calculation by an overlapped block motion compensation technique according to the detected and corrected bidirectional motion vector to obtain the transition frame.
In the present invention, preferably, S2 includes: S201: generating grayscale images of the Nth tail frame and the (N+1)th first frame; S202: calculating the similarity between the grayscale image of the Nth tail frame and the grayscale image of the (N+1)th first frame, namely the image similarity.
In the present invention, preferably, the image similarity is obtained by using mutual information between the grayscale image of the Nth tail frame and the grayscale image of the (N+1)th first frame.
A vlog clipping system comprising: a selecting and extracting module for determining an Nth first frame, an Nth tail frame, an (N+1)th first frame and an (N+1)th tail frame in the vlog video according to the input of a user, and extracting the video frames from the Nth first frame to the Nth tail frame and from the (N+1)th first frame to the (N+1)th tail frame; a similarity determining module for determining the image similarity of the Nth tail frame and the (N+1)th first frame; a first judgment module for judging whether the image similarity falls into a preset similarity interval; a first generation module for connecting the Nth tail frame with the (N+1)th first frame to generate a merged vlog video; a second judgment module for judging whether the image similarity is smaller than the minimum value of the similarity interval; a second generation module for generating a transition frame according to the Nth tail frame and the (N+1)th first frame, and connecting the Nth tail frame and the (N+1)th first frame with the transition frame to generate a merged vlog video; and a reselection module for deleting the (N+1)th first frame and setting the first video frame after it as the new (N+1)th first frame.
In the present invention, preferably, the second generation module includes: a bidirectional motion vector determination submodule for determining a bidirectional motion vector using the Nth tail frame and the (N+1)th first frame; and a construction submodule for interpolating and constructing the transition frame according to the bidirectional motion vector.
In the present invention, preferably, the bidirectional motion vector determination submodule includes: an initial motion vector determination unit for determining an initial motion vector from the Nth tail frame to the (N+1)th first frame by a forward motion vector estimation method; and a bidirectional motion vector determination unit for determining the bidirectional motion vector by a bidirectional motion vector estimation method using the initial motion vector. The construction submodule includes: a detection and correction unit for detecting whether an abnormal portion exists in the bidirectional motion vector and, if so, correcting it; and an interpolation calculation unit for performing interpolation calculation by an overlapped block motion compensation technique according to the detected and corrected bidirectional motion vector to obtain the transition frame.
In the present invention, preferably, the similarity determining module includes: a conversion submodule for generating grayscale images of the Nth tail frame and the (N+1)th first frame; and a similarity calculation submodule for calculating the similarity between the grayscale image of the Nth tail frame and the grayscale image of the (N+1)th first frame, namely the image similarity.
Compared with the prior art, the invention has the beneficial effects that:
the invention discloses a vlog clipping method and a clipping system, which are used for calculating the image similarity of two video frames before and after a connection point of two video segments to be connected, and then determining a video merging strategy according to the position relation between the image similarity and a preset similarity interval: if the image similarity falls into a preset similarity interval, directly connecting the two video clips; if the image similarity is smaller than the minimum value of the preset similarity interval, generating a transition frame according to two video frames before and after the connection point, and connecting the two video clips by using the transition frame; and if the image similarity is larger than the maximum value of the preset similarity interval, two video frames before and after the connection point are too similar, and the next video frame is reselected as the video frame after the connection point to carry out similarity calculation, connection and other work. Through the distinguishing processing process, the generated complete vlog has no obvious editing trace at the connecting point, so that the picture of the vlog video can be more coherent, the video playing is smoother, and the watching experience is improved; the transition frame is constructed by adopting the bidirectional motion vector, and the bidirectional motion vector is corrected to a certain extent, so that the image quality of the constructed transition frame is higher, and the transition frame is more smoothly and naturally linked with the video frames before and after the clipping point.
Drawings
Fig. 1 is a flow chart of the vlog clipping method of the present invention.
Fig. 2 is a flowchart of S6 in the vlog clipping method of the present invention.
Fig. 3 is a flowchart of S601 in the vlog clipping method of the present invention.
Fig. 4 is a flowchart of S602 in the vlog clipping method of the present invention.
Fig. 5 is a flowchart of S2 in the vlog clipping method of the present invention.
Fig. 6 is a schematic diagram of the structure of the vlog clipping system of the present invention.
Fig. 7 is a schematic structural diagram of a second generation module in the vlog clipping system of the present invention.
Fig. 8 is a schematic structural diagram of a bidirectional motion vector determination sub-module and a construction sub-module in the vlog clipping system of the present invention.
Fig. 9 is a schematic structural diagram of a similarity determination module in the vlog clipping system of the present invention.
In the drawings: 1 - selecting and extracting module; 2 - similarity determining module; 21 - conversion submodule; 22 - similarity calculation submodule; 3 - first judgment module; 4 - first generation module; 5 - second judgment module; 6 - second generation module; 61 - bidirectional motion vector determination submodule; 611 - initial motion vector determination unit; 612 - bidirectional motion vector determination unit; 62 - construction submodule; 621 - detection and correction unit; 622 - interpolation calculation unit; 7 - reselection module.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that when an element is referred to as being "secured to" another element, it can be directly on the other element or intervening elements may also be present. When a component is referred to as being "connected" to another component, it can be directly connected to the other component or intervening components may also be present. When a component is referred to as being "disposed on" another component, it can be directly on the other component or intervening components may also be present. The terms "vertical," "horizontal," "left," "right," and the like as used herein are for illustrative purposes only.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
Referring to fig. 1 to 5, a preferred embodiment of the present invention provides a vlog clipping method, including:
s1: the method comprises the steps of selecting an Nth first frame, an Nth tail frame, an (N + 1) th first frame and an (N + 1) th tail frame in a vlog video according to input of a user, and extracting a plurality of video frames from the Nth first frame to the Nth tail frame and a plurality of video frames from the (N + 1) th first frame to the (N + 1) th tail frame.
The vlog video can be in any common video format. It is imported into video processing software (such as a video clipping tool or a player), which decomposes it into a sequence of video frames; each video frame can be displayed as a thumbnail or at full size. The user confirms the content of each video frame by viewing its thumbnail and full-size picture, and then selects the video segments to be extracted and merged by entering the corresponding input into the video processing software.
Let the extracted video segments be N and N+1, with the tail of segment N to be connected to the head of segment N+1 to compose the final desired vlog video. The first video frame of segment N is the Nth first frame and its last video frame is the Nth tail frame; the first video frame of segment N+1 is the (N+1)th first frame and its last video frame is the (N+1)th tail frame. By selecting the Nth first frame, the Nth tail frame, the (N+1)th first frame and the (N+1)th tail frame through this input, the user determines the extracted video as the sequence of video frames from the Nth first frame to the Nth tail frame and the sequence from the (N+1)th first frame to the (N+1)th tail frame, i.e. segments N and N+1. Extracting segments N and N+1 means copying these two sequences for subsequent processing.
S2: and determining the image similarity of the Nth tail frame and the (N + 1) th head frame.
At editing time, the (N+1)th first frame selected by the user may or may not form a coherent picture with the Nth tail frame. When the pictures are not coherent, the two frames are simply joined directly, a scenario to which this method does not apply. When the pictures are coherent, the two frames have high image similarity; but because they were not shot continuously, their similarity can still differ considerably from that of two genuinely consecutive video frames. The two frames can therefore be compared, their image similarity determined, and this index used to judge whether the Nth tail frame and the (N+1)th first frame connect smoothly, thereby deciding the subsequent measures. The similarity can be expressed by means such as gray-level variance, mutual information, correlation coefficient, or joint entropy.
Specifically, as shown in fig. 5, S2 includes:
s201: and generating gray level images of the Nth tail frame and the (N + 1) th first frame.
Video frames are usually RGB three-channel color images, which are relatively complex and inconvenient to process, so grayscale images of the Nth tail frame and the (N+1)th first frame are generated first. The RGB-to-gray conversion may use the floating-point method (Gray = 0.3R + 0.59G + 0.11B), the integer method (Gray = (30R + 59G + 11B)/100), the average method (Gray = (R + G + B)/3), or the like.
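For illustration (this sketch is not part of the patent text; the function name and the use of NumPy are this edition's assumptions), the floating-point conversion is a one-line weighted sum:

```python
import numpy as np

def to_gray(frame_rgb: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 RGB frame to grayscale using the
    floating-point weights quoted above (0.3R + 0.59G + 0.11B)."""
    weights = np.array([0.3, 0.59, 0.11])
    return (frame_rgb.astype(np.float64) @ weights).astype(np.uint8)
```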
S202: and calculating the similarity between the gray level image of the Nth tail frame and the gray level image of the (N + 1) th head frame, namely the image similarity.
After conversion to grayscale, the similarity between the grayscale image of the Nth tail frame and that of the (N+1)th first frame is taken as the image similarity between the two frames.
Among the various ways of expressing similarity, mutual information offers high accuracy and robustness, so this embodiment preferably uses mutual information to express the similarity between the grayscale image of the Nth tail frame and that of the (N+1)th first frame. Mutual information reflects the statistical dependence between two systems, indicating how much information one system contains about the other.
Mutual information between two random variables A and B can be expressed as:

I(A,B) = Σ_a Σ_b p_AB(a,b) log( p_AB(a,b) / (p_A(a) p_B(b)) )    (1)

where p_AB(a,b) is the joint probability density distribution of the random variables A and B, and p_A(a) and p_B(b) are the marginal probability density distributions of A and B, respectively.
Mutual information can also be described by information entropy. According to the definition of information entropy:

H(A) = -Σ_a p_A(a) log p_A(a)    (2)

H(B) = -Σ_b p_B(b) log p_B(b)    (3)

H(A,B) = -Σ_a Σ_b p_AB(a,b) log p_AB(a,b)    (4)
where H(A) and H(B) are the information entropies of random variables A and B, H(A,B) is the joint entropy of A and B, and H(A|B) and H(B|A) are the conditional entropy of A given B and of B given A, respectively. Information entropy measures the average uncertainty, or information content, of a system: H(A) represents the average uncertainty of random variable A; H(A|B) represents the uncertainty remaining in A after B is known; and the difference between H(A) and H(A|B) indicates how much the uncertainty of A decreases once B is known, i.e. the amount of information about A contained in B. Therefore, the mutual information between two random variables A and B can be expressed as:
I(A,B)=H(A)-H(A|B)=H(B)-H(B|A)=H(A)+H(B)-H(A,B) (5)
If A and B are independent of each other, p_AB(a,b) = p_A(a) p_B(b) and I(A,B) = 0; if A and B are completely dependent, then H(A,B) = H(A) = H(B) and I(A,B) attains its maximum.
The Nth tail frame and the (N+1)th first frame can be regarded as two random variables A and B over the image gray levels, whose joint distribution is obtained by normalizing the joint gray-level histogram:

p_AB(i,j) = h(i,j) / Σ h(i,j)    (6)

where i and j are the gray values of pixels in images A and B, h(i,j) is the number of pixel pairs with gray values (i,j) in the overlapping portion of the two images, and Σ h(i,j) is the total number of pixels in that overlap. The marginal probability distributions are then:

p_A(i) = Σ_j p_AB(i,j)    (7)

p_B(j) = Σ_i p_AB(i,j)    (8)

The mutual information I(A,B) of A and B can be calculated by substituting formulas (6), (7) and (8) into formula (1).
Since I(A,B) itself is not an intuitive expression of similarity, a relative value S(A,B), here taken as the mutual information normalized by the entropy H(B), can be used to represent the similarity of B with respect to A:

S(A,B) = I(A,B) / H(B)    (9)

The value of S(A,B) lies between 0 and 1. In the invention, A is the Nth tail frame and B is the (N+1)th first frame; because the two frames are highly similar, S(A,B) should fall roughly between 0.5 and 1.
S3: and judging whether the image similarity falls into a preset similarity interval, if so, executing S4, and if not, executing S5.
The preset similarity interval is the range of values the similarity between two normal consecutive video frames can take, typically between 0.75 and 0.95. As described above, if S(A,B) falls into the preset similarity interval, the image similarity between the Nth tail frame and the (N+1)th first frame does not differ from that of two normal consecutive video frames; after the two frames are connected, neither discontinuous pictures nor a sluggish rhythm will occur and no obvious editing trace remains, so S4 can be executed directly.
S4: and connecting the Nth tail frame with the (N + 1) th frame to generate a merged vlog video.
At this point, all video frames from the Nth first frame up to the Nth tail frame are connected with all video frames from the (N+1)th first frame up to the (N+1)th tail frame, and the merged vlog video can be generated through subsequent operations such as video compression.
S5: and judging whether the image similarity is smaller than the minimum value of the preset similarity interval, if so, executing S6, and otherwise, executing S7.
When the image similarity does not fall into the preset similarity interval, two cases arise: the image similarity is smaller than the minimum value of the interval, or larger than its maximum value.
When the image similarity is smaller than the minimum value of the preset similarity interval, the similarity between the Nth tail frame and the (N+1)th first frame is below that of normal consecutive frames: the two frames differ too much, and connecting them directly would easily produce a discontinuous video. S6 therefore needs to be executed to generate a transition frame that fills the gap between the Nth tail frame and the (N+1)th first frame and connects them, so that the merged video plays smoothly.
S6: and generating a transition frame according to the Nth tail frame and the (N + 1) th first frame, and connecting the Nth tail frame and the (N + 1) th first frame by using the transition frame to generate a merged vlog video.
From the Nth tail frame and the (N+1)th first frame, a transition frame may be generated by motion-compensated frame interpolation. The basic idea is: establish a motion model from the Nth tail frame and the (N+1)th first frame, define the change from the Nth tail frame to the (N+1)th first frame as a motion vector, estimate that motion vector from the difference between the two frames, and construct the transition frame from the two frames according to the motion vector. Motion vectors can be estimated by methods based on the optical-flow equation (an optical-flow field is estimated from the spatio-temporal gray-value gradients combined with suitable spatio-temporal smoothness conditions), on a block motion model (the image is divided into fixed-size blocks, all pixels within a block are assumed to share one motion vector, the best-matching block is searched within a given range of the (N+1)th first frame and/or the Nth tail frame, and the motion vector is obtained from the relative displacement), or on pixel recursion (the predicted value of a pixel is a linear combination of the motion-vector estimate at the corresponding position of the Nth tail frame and of pixels in the neighborhood of the current pixel, and the prediction model is iteratively refined to minimize the difference between predicted and actual values), among others.
After the transition frame is obtained, it is inserted between the Nth tail frame and the (N+1)th first frame; all video frames from the Nth first frame up to the Nth tail frame are thereby connected, through the transition frame, with all video frames from the (N+1)th first frame up to the (N+1)th tail frame, and the complete merged vlog video desired by the creator is generated through subsequent operations such as video compression.
S7: deleting the N +1 th first frame, setting the first video frame after the N +1 th first frame as a new N +1 th first frame or deleting the nth last frame, setting the first video frame before the nth last frame as a new nth last frame, and then re-executing S2.
When the image similarity of the Nth tail frame and the (N+1)th first frame is larger than the maximum value of the preset similarity interval, the two frames are extremely similar; a video formed by connecting them directly shows a frozen picture, slowing the rhythm of the video at that point and leaving an obvious editing trace. One of the two video frames can therefore be discarded and the connection of the frame sequences reorganized. Taking discarding the (N+1)th first frame as an example: the (N+1)th first frame is deleted, the first video frame after it within the selected segment N+1 becomes the new (N+1)th first frame, and execution resumes from S2 to judge whether the Nth tail frame and the new (N+1)th first frame meet the similarity requirement. If they do, they are connected directly to generate the merged vlog video; if the similarity is too small, a transition frame is generated and used to connect them; if the similarity is still too large, the (N+1)th first frame is discarded again and a new one selected, repeating the process until a merged vlog video without obvious editing traces can be generated.
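A compact sketch of the S2-S7 decision loop just described (illustrative only: mi_similarity is the earlier sketch, the interval bounds come from the 0.75-0.95 range stated above, and make_transition_frames is a hypothetical stand-in for the transition-frame generation of S6):

```python
SIM_LO, SIM_HI = 0.75, 0.95  # preset similarity interval

def merge_clips(clip_n, clip_n1, make_transition_frames):
    """clip_n / clip_n1: lists of grayscale frames (segments N and N+1)."""
    while clip_n and clip_n1:
        s = mi_similarity(clip_n[-1], clip_n1[0])        # S2
        if SIM_LO <= s <= SIM_HI:                        # S3 -> S4: connect directly
            return clip_n + clip_n1
        if s < SIM_LO:                                   # S5 -> S6: bridge the gap
            return clip_n + make_transition_frames(clip_n[-1], clip_n1[0]) + clip_n1
        clip_n1 = clip_n1[1:]                            # S7: drop too-similar first frame
    return clip_n + clip_n1
```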
In the present embodiment, as shown in fig. 2, S6 preferably includes:
s601: and determining a bidirectional motion vector by using the Nth tail frame and the (N + 1) th head frame.
S602: and (4) interpolating and constructing a transition frame according to the bidirectional motion vector.
The basic principle of block-matching motion estimation is to divide each frame of the video sequence into several non-overlapping blocks, assume that all pixels within a block share the same displacement, and then, for each block, search a given range of the reference frame for the most similar block (the matching block) under some block-matching criterion; the relative displacement between the matching block and the current block is the motion vector. The block-matching criterion may be SAD (sum of absolute differences), computed as:

SAD(d_x, d_y) = Σ_{m=0}^{M-1} Σ_{n=0}^{M-1} | F_t(m, n) - F_{t-1}(m + d_x, n + d_y) |    (10)

where the matched block is of size M x M, (d_x, d_y) is the candidate motion vector, F_t(m, n) is the gray value at pixel (m, n) of the current frame F_t, and F_{t-1} is the reference frame; in this embodiment the Nth tail frame or the (N+1)th first frame may serve as the reference frame.
To avoid the problems of holes and overlaps, this embodiment uses bidirectional motion vectors. The transition frame is divided into blocks, each with two motion vectors: one pointing to the Nth tail frame and one to the (N+1)th first frame; the pixels of each block of the transition frame are interpolated from the two. Motion-compensated frame interpolation based on bidirectional motion vector estimation recovers the transition frame as:

F_t(x, y) = ( F_{t-1}(x - v_x/2, y - v_y/2) + F_{t+1}(x + v_x/2, y + v_y/2) ) / 2    (11)

where (v_x, v_y) is the bidirectional motion vector estimated between the previous frame F_{t-1} and the following frame F_{t+1} in the x and y directions; the Nth tail frame and the (N+1)th first frame serve as F_{t-1} and F_{t+1}, respectively.
In this method, for a given block of the transition frame, the displacement that best matches a block shifted one way in the Nth tail frame against a block shifted the opposite way in the (N+1)th first frame is taken as that block's motion vector. Let B_i denote a block in the transition frame and let the motion-vector search range be [-a, a]. For each candidate motion vector v = (v_x, v_y) in the search range, the block is shifted by -v/2 in the Nth tail frame and by +v/2 in the (N+1)th first frame, and the sum of absolute differences over the two corresponding blocks is computed:

SAD(v) = Σ_{p ∈ B_i} | F_{t-1}(p - v/2) - F_{t+1}(p + v/2) |    (12)

where p denotes a pixel position in the block. The candidate in [-a, a] with the smallest sum of absolute differences is the motion-vector estimate for that block of the transition frame.
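A sketch of this bidirectional search (the block size, search radius, grayscale 2-D frames, and boundary handling are this sketch's assumptions; odd candidate vectors are halved by integer division):

```python
import numpy as np

def bidirectional_mv(prev, nxt, y, x, block=16, a=8):
    """Bidirectional search of equation (12) for the transition-frame block
    with top-left corner (y, x): shift -v/2 into the Nth tail frame (prev)
    and +v/2 into the (N+1)th first frame (nxt)."""
    h, w = prev.shape
    best_sad, best_v = np.inf, (0, 0)
    for vy in range(-a, a + 1):
        for vx in range(-a, a + 1):
            y0, x0 = y - vy // 2, x - vx // 2      # block position in prev
            y1, x1 = y + vy // 2, x + vx // 2      # block position in nxt
            if (min(y0, x0, y1, x1) < 0 or max(y0, y1) + block > h
                    or max(x0, x1) + block > w):
                continue  # candidate shifts the block outside the frame
            sad = np.abs(prev[y0:y0+block, x0:x0+block].astype(np.int32)
                         - nxt[y1:y1+block, x1:x1+block]).sum()
            if sad < best_sad:
                best_sad, best_v = sad, (vx, vy)
    return best_v
```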
In the present embodiment, as shown in fig. 3, S601 preferably includes:
s6011: and determining an initial motion vector from the Nth tail frame to the (N + 1) th head frame by a forward motion vector estimation method.
Dividing the Nth tail frame into M multiplied by M blocks, selecting blocks at corresponding positions of the (N + 1) th first frame and a certain number of pixel points at the upper, lower, left and right sides as search ranges, using SAD as a criterion, finding out a matching block and a block with the minimum SAD value from the search range from each block of the Nth tail frame to the (N + 1) th first frame, and then calculating the relative displacement of the current block and the matching block to obtain the forward motion vector of the Nth tail frame
Figure BDA0003080445160000127
Dividing by 2 to obtain the initial motion vector of the transition frame between the Nth end frame and the (N + 1) th first frame, and recording as
Figure BDA0003080445160000128
The initial forward motion vector is
Figure BDA0003080445160000129
The backward motion vector is
Figure BDA00030804451600001210
S6012: and determining a bidirectional motion vector by a bidirectional motion vector estimation method by using the initial motion vector.
After the initial bidirectional motion vector v_0 has been determined and a first adjustment applied, it serves as the starting point for a small-range refinement by bidirectional motion vector estimation. The basic procedure is as follows. Let p be a pixel of block B_i in the transition frame. Using the corrected initial motion vector v_0, the matching blocks B_1 and B_2 corresponding to B_i are located in the Nth tail frame and the (N+1)th first frame, respectively, and a range of 1 to 3 layers of pixels around B_1 and B_2 is taken as the search range. The positions of pixel p within B_1 and B_2, denoted p_1 and p_2, are computed from the initial motion vector as:

p_1 = p - v_0 / 2    (13)

p_2 = p + v_0 / 2    (14)

Assuming the motion of objects in the image is uniform, the two matching blocks B_1 and B_2 move in synchronous symmetry: if B_1 moves a number of steps to the left, B_2 moves the corresponding number of steps to the right. For each candidate adjustment u in the search range [-2, 2], the sum of absolute differences over the corresponding blocks in the Nth tail frame and the (N+1)th first frame is computed following the bidirectional estimation idea:

SAD(u) = Σ_{p ∈ B_i} | F_{t-1}(p - v_0/2 - u) - F_{t+1}(p + v_0/2 + u) |    (15)

The candidate with the minimum SAD, i.e. the best match, adjusts the initial motion vector and yields the bidirectional motion vector v_b.
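A sketch of this refinement step under the same assumptions as the search sketch above, with the [-2, 2] window taken from the text and v_0 supplied by the halved forward vector of S6011:

```python
import numpy as np

def refine_mv(prev, nxt, y, x, v0, block=16):
    """Small-range refinement of equation (15): adjust the initial
    bidirectional vector v0 = (vx, vy) within the window [-2, 2]."""
    h, w = prev.shape
    best_sad, best_v = np.inf, v0
    for uy in range(-2, 3):
        for ux in range(-2, 3):
            vx, vy = v0[0] + ux, v0[1] + uy        # candidate adjusted vector
            y0, x0 = y - vy // 2, x - vx // 2
            y1, x1 = y + vy // 2, x + vx // 2
            if (min(y0, x0, y1, x1) < 0 or max(y0, y1) + block > h
                    or max(x0, x1) + block > w):
                continue
            sad = np.abs(prev[y0:y0+block, x0:x0+block].astype(np.int32)
                         - nxt[y1:y1+block, x1:x1+block]).sum()
            if sad < best_sad:
                best_sad, best_v = sad, (vx, vy)
    return best_v
```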
In the present embodiment, as shown in fig. 4, S602 preferably includes:
s6021: and detecting whether an abnormal part exists in the bidirectional motion vector, and if so, correcting the abnormal part.
In some cases an object exists only in the Nth tail frame or only in the (N+1)th first frame, which typically happens when an old object disappears or a new object appears. In such cases the transition frame should not be recovered by weighting the two frames; the value should be taken, via the corresponding motion vector, only from the frame in which the object exists. To handle this, the bidirectional motion vector needs to be corrected.
In a region where an object disappears or appears, the residual SAD of the corresponding matching blocks between the two frames is large, because the content exists in one frame but not the other. Based on this feature, abnormal portions containing isolated motion vectors can be detected from the correlation between neighboring vectors (a block's motion vector is considered isolated if it is inconsistent with the motion vectors of its adjacent blocks), and the isolated vectors then corrected.
To detect an isolated motion vector, the absolute differences between the motion vector of the block under test and the motion vectors of its eight neighboring blocks are first computed:

d_{x,i} = | v_x - v_{x,i} |,  d_{y,i} = | v_y - v_{y,i} |    (16)
when an isolated motion vector is detected, the motion vector sizes of eight blocks around the current block are sorted to find two intermediate values, and then the average value of the two intermediate values is used to replace the original motion vector.
The motion vectors may then be further smoothed. For a block B of the transition frame and its several adjacent blocks B_i, each candidate bidirectional motion vector v_i (the vector of B itself and those of its neighbors) is scored by the pixel-value difference it produces between the blocks corresponding to B in the Nth tail frame and the (N+1)th first frame:

SAD_B(v_i) = Σ_{p ∈ B} | F_{t-1}(p - v_i/2) - F_{t+1}(p + v_i/2) |    (17)

The adjusted bidirectional motion vector of block B is then the candidate with the smallest score:

v_B = argmin_{v_i} SAD_B(v_i)    (18)
s6022: and carrying out interpolation calculation by an overlapped block motion compensation technology according to the detected and corrected bidirectional motion vector to obtain a transition frame.
Given an M x M block B and a small overlap width w, the block is expanded to (M+2w) x (M+2w), and its eight neighboring blocks N_i, i = 1, 2, ..., 8, are expanded to the same size. This produces three kinds of regions: R1 (no overlap), R2 (overlap of two expanded blocks) and R3 (overlap of four expanded blocks). Let F̂_k(p) denote the motion-compensated value of point p obtained from bidirectional motion vector v_k according to equation (11), where k ranges over the expanded blocks covering p. The OBMC (overlapped block motion compensation) result for the different regions of block B is the weighted combination:

in region R1: F(p) = F̂_B(p)    (19)

in region R2: F(p) = Σ_{k=1}^{2} S_k F̂_k(p)    (20)

in region R3: F(p) = Σ_{k=1}^{4} S_k F̂_k(p)    (21)

where the weights S_k depend on the position of p within the overlap, decrease with the distance from p to the center of block k, and satisfy Σ_k S_k = 1.
through the calculation process, the transition frame can be obtained.
Referring to fig. 6 to 9, a preferred embodiment of the present invention further provides a vlog clipping system, which includes: a selecting and extracting module for determining an Nth first frame, an Nth tail frame, an (N+1)th first frame and an (N+1)th tail frame in the vlog video according to the input of a user, and extracting the video frames from the Nth first frame to the Nth tail frame and from the (N+1)th first frame to the (N+1)th tail frame; a similarity determining module for determining the image similarity of the Nth tail frame and the (N+1)th first frame; a first judgment module for judging whether the image similarity falls into a preset similarity interval; a first generation module for connecting the Nth tail frame with the (N+1)th first frame to generate a merged vlog video; a second judgment module for judging whether the image similarity is smaller than the minimum value of the similarity interval; a second generation module for generating a transition frame according to the Nth tail frame and the (N+1)th first frame, and connecting the Nth tail frame and the (N+1)th first frame with the transition frame to generate a merged vlog video; and a reselection module for deleting the (N+1)th first frame and setting the first video frame after it as the new (N+1)th first frame.
As shown in fig. 7, the second generation module includes: a bidirectional motion vector determination submodule for determining a bidirectional motion vector using the Nth tail frame and the (N+1)th first frame; and a construction submodule for interpolating and constructing the transition frame according to the bidirectional motion vector.
The bidirectional motion vector determination submodule includes: an initial motion vector determination unit, configured to determine an initial motion vector from the Nth tail frame to the (N+1)th first frame by a forward motion vector estimation method; and a bidirectional motion vector determination unit, configured to determine the bidirectional motion vector by a bidirectional motion vector estimation method using the initial motion vector, as shown in fig. 8.
The construction submodule includes: a detection and correction unit, configured to detect whether an abnormal portion exists in the bidirectional motion vector and, if so, correct it; and an interpolation calculation unit, configured to perform interpolation calculation by an overlapped block motion compensation technique according to the detected and corrected bidirectional motion vector to obtain the transition frame, as shown in fig. 8.
The similarity determining module includes: a conversion submodule for generating grayscale images of the Nth tail frame and the (N+1)th first frame; and a similarity calculation submodule for calculating the similarity between the grayscale image of the Nth tail frame and the grayscale image of the (N+1)th first frame, namely the image similarity, as shown in fig. 9.
The modules and units above are modular functional entities realized by computer equipment. The computer equipment comprises a processor, a memory, input and output devices and a bus; the bus connects the processor, the memory and the input and output devices; the processor performs the functions described above. There may be one or more processors and memories, and the memory may be volatile or persistent.
The above description is intended to describe in detail the preferred embodiments of the present invention, but the embodiments are not intended to limit the scope of the claims of the present invention, and all equivalent changes and modifications made within the technical spirit of the present invention should fall within the scope of the claims of the present invention.

Claims (10)

1. A vlog clipping method, comprising:
S1: selecting an Nth first frame, an Nth tail frame, an (N+1)th first frame and an (N+1)th tail frame in a vlog video according to the input of a user, and extracting the video frames from the Nth first frame to the Nth tail frame and from the (N+1)th first frame to the (N+1)th tail frame;
S2: determining the image similarity of the Nth tail frame and the (N+1)th first frame;
S3: judging whether the image similarity falls into a preset similarity interval; if so, executing S4, and if not, executing S5;
S4: connecting the Nth tail frame with the (N+1)th first frame to generate a merged vlog video;
S5: judging whether the image similarity is smaller than the minimum value of the preset similarity interval; if so, executing S6; otherwise, executing S7;
S6: generating a transition frame according to the Nth tail frame and the (N+1)th first frame, and connecting the Nth tail frame and the (N+1)th first frame with the transition frame to generate a merged vlog video;
S7: deleting the (N+1)th first frame and setting the first video frame after it as a new (N+1)th first frame, or deleting the Nth tail frame and setting the first video frame before it as a new Nth tail frame, and then re-executing S2.
2. The vlog clipping method according to claim 1, wherein generating the transition frame according to the Nth tail frame and the (N+1)th first frame in S6 comprises:
S601: determining a bidirectional motion vector using the Nth tail frame and the (N+1)th first frame;
S602: interpolating and constructing the transition frame according to the bidirectional motion vector.
3. The vlog clipping method according to claim 2, wherein S601 comprises:
S6011: determining an initial motion vector from the Nth tail frame to the (N+1)th first frame by a forward motion vector estimation method;
S6012: determining the bidirectional motion vector by a bidirectional motion vector estimation method using the initial motion vector.
4. The vlog clipping method according to claim 3, wherein S602 comprises:
S6021: detecting whether an abnormal portion exists in the bidirectional motion vector and, if so, correcting the abnormal portion;
S6022: performing interpolation calculation by an overlapped block motion compensation technique according to the detected and corrected bidirectional motion vector to obtain the transition frame.
5. The vlog clipping method according to claim 1, wherein S2 comprises:
S201: generating grayscale images of the Nth tail frame and the (N+1)th first frame;
S202: calculating the similarity between the grayscale image of the Nth tail frame and the grayscale image of the (N+1)th first frame, namely the image similarity.
6. The vlog clipping method according to claim 5, wherein the image similarity is obtained by using mutual information between the grayscale image of the Nth tail frame and the grayscale image of the (N+1)th first frame.
7. A vlog clipping system, comprising:
a selecting and extracting module for determining an Nth first frame, an Nth tail frame, an (N+1)th first frame and an (N+1)th tail frame in the vlog video according to the input of a user, and extracting the video frames from the Nth first frame to the Nth tail frame and from the (N+1)th first frame to the (N+1)th tail frame;
a similarity determining module for determining the image similarity of the Nth tail frame and the (N+1)th first frame;
a first judgment module for judging whether the image similarity falls into a preset similarity interval;
a first generation module for connecting the Nth tail frame with the (N+1)th first frame to generate a merged vlog video;
a second judgment module for judging whether the image similarity is smaller than the minimum value of the similarity interval;
a second generation module for generating a transition frame according to the Nth tail frame and the (N+1)th first frame, and connecting the Nth tail frame and the (N+1)th first frame with the transition frame to generate a merged vlog video;
and a reselection module for deleting the (N+1)th first frame and setting the first video frame after it as the new (N+1)th first frame.
8. The vlog clipping system of claim 7, wherein the second generation module comprises:
a bidirectional motion vector determination submodule for determining a bidirectional motion vector using the Nth tail frame and the (N+1)th first frame;
and a construction submodule for interpolating and constructing the transition frame according to the bidirectional motion vector.
9. The vlog clipping system of claim 8, wherein the bidirectional motion vector determination submodule comprises:
an initial motion vector determination unit, configured to determine an initial motion vector from the Nth tail frame to the (N+1)th first frame by a forward motion vector estimation method;
and a bidirectional motion vector determination unit, configured to determine the bidirectional motion vector by a bidirectional motion vector estimation method using the initial motion vector;
and wherein the construction submodule comprises:
a detection and correction unit, configured to detect whether an abnormal portion exists in the bidirectional motion vector and, if so, correct the abnormal portion;
and an interpolation calculation unit, configured to perform interpolation calculation by an overlapped block motion compensation technique according to the detected and corrected bidirectional motion vector to obtain the transition frame.
10. The vlog clipping system according to claim 7, wherein the similarity determining module comprises:
a conversion submodule for generating grayscale images of the Nth tail frame and the (N+1)th first frame;
and a similarity calculation submodule for calculating the similarity between the grayscale image of the Nth tail frame and the grayscale image of the (N+1)th first frame, namely the image similarity.
CN202110564551.0A 2021-05-24 2021-05-24 Vlog editing method and system Pending CN113269086A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110564551.0A 2021-05-24 2021-05-24 Vlog editing method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110564551.0A 2021-05-24 2021-05-24 Vlog editing method and system

Publications (1)

Publication Number Publication Date
CN113269086A 2021-08-17

Family

ID=77232413

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110564551.0A Pending CN113269086A (en) Vlog editing method and system

Country Status (1)

Country Link
CN (1) CN113269086A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114679605A (en) * 2022-03-25 2022-06-28 腾讯科技(深圳)有限公司 Video transition method and device, computer equipment and storage medium
CN114866839A (en) * 2022-07-11 2022-08-05 深圳市鼎合丰科技有限公司 Video editing software system based on repeated frame image merging

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103152566A (en) * 2013-02-22 2013-06-12 华中科技大学 Video frame rate promoting method
WO2016187776A1 (en) * 2015-05-25 2016-12-01 北京大学深圳研究生院 Video frame interpolation method and system based on optical flow method
CN108509917A (en) * 2018-03-30 2018-09-07 北京影谱科技股份有限公司 Video scene dividing method and device based on shot cluster correlation analysis
CN111294644A (en) * 2018-12-07 2020-06-16 腾讯科技(深圳)有限公司 Video splicing method and device, electronic equipment and computer storage medium
CN111327945A (en) * 2018-12-14 2020-06-23 北京沃东天骏信息技术有限公司 Method and apparatus for segmenting video
CN112584196A (en) * 2019-09-30 2021-03-30 北京金山云网络技术有限公司 Video frame insertion method and device and server
CN112700516A (en) * 2020-12-23 2021-04-23 杭州群核信息技术有限公司 Video rendering method and device based on deep learning, computer equipment and storage medium

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103152566A (en) * 2013-02-22 2013-06-12 华中科技大学 Video frame rate promoting method
WO2016187776A1 (en) * 2015-05-25 2016-12-01 北京大学深圳研究生院 Video frame interpolation method and system based on optical flow method
CN108509917A (en) * 2018-03-30 2018-09-07 北京影谱科技股份有限公司 Video scene dividing method and device based on shot cluster correlation analysis
CN111294644A (en) * 2018-12-07 2020-06-16 腾讯科技(深圳)有限公司 Video splicing method and device, electronic equipment and computer storage medium
CN111327945A (en) * 2018-12-14 2020-06-23 北京沃东天骏信息技术有限公司 Method and apparatus for segmenting video
CN112584196A (en) * 2019-09-30 2021-03-30 北京金山云网络技术有限公司 Video frame insertion method and device and server
CN112700516A (en) * 2020-12-23 2021-04-23 杭州群核信息技术有限公司 Video rendering method and device based on deep learning, computer equipment and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114679605A (en) * 2022-03-25 2022-06-28 腾讯科技(深圳)有限公司 Video transition method and device, computer equipment and storage medium
CN114679605B (en) * 2022-03-25 2023-07-18 腾讯科技(深圳)有限公司 Video transition method, device, computer equipment and storage medium
CN114866839A (en) * 2022-07-11 2022-08-05 深圳市鼎合丰科技有限公司 Video editing software system based on repeated frame image merging

Similar Documents

Publication Publication Date Title
US8045620B2 (en) Image processing apparatus, image processing method and computer readable medium
US8928813B2 (en) Methods and apparatus for reducing structured noise in video
CN106210449B (en) Multi-information fusion frame rate up-conversion motion estimation method and system
Yu et al. Multi-level video frame interpolation: Exploiting the interaction among different levels
US20060262853A1 (en) Low complexity motion compensated frame interpolation method
US8274602B2 (en) Image processing apparatus and image processing method with redundant frame detection
JPH08205177A (en) Movement vector detection device
US20030007667A1 (en) Methods of and units for motion or depth estimation and image processing apparatus provided with such motion estimation unit
US20140126818A1 (en) Method of occlusion-based background motion estimation
CN113269086A (en) Vilog editing method and system
JP2005176381A (en) Adaptive motion compensated interpolating method and apparatus
Philip et al. A comparative study of block matching and optical flow motion estimation algorithms
JP2014110020A (en) Image processor, image processing method and image processing program
JP2003203237A (en) Image matching method and device, and image coding method and device
EP2237560A1 (en) Halo reducing motion-compensated interpolation
CN109788297B (en) Video frame rate up-conversion method based on cellular automaton
KR102066012B1 (en) Motion prediction method for generating interpolation frame and apparatus
CN111340101A (en) Stability evaluation method and device, electronic equipment and computer readable storage medium
JP4378801B2 (en) Image processing method and image processing apparatus
JP5334241B2 (en) Frame image motion vector estimation apparatus and program
Bae et al. Census transform-based static caption detection for frame rate up-conversion
KR100343780B1 (en) Method of Camera Motion Detection in Compressed Domain for Content-Based Indexing of Compressed Video
JP2007287006A (en) Moving object tracking device and program
Lu et al. An artifact information based motion vector processing method for motion compensated frame interpolation
Lin et al. Key-frame-based depth propagation for semi-automatic stereoscopic video conversion

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination