CN113269086A - Vlog editing method and system - Google Patents

Vlog editing method and system

Info

Publication number
CN113269086A
CN113269086A (application CN202110564551.0A)
Authority
CN
China
Prior art keywords
frame
nth
motion vector
vlog
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110564551.0A
Other languages
Chinese (zh)
Inventor
文振海 (Wen Zhenhai)
叶飞 (Ye Fei)
陈欢 (Chen Huan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Ruidong Technology Development Co ltd
Original Assignee
Suzhou Ruidong Technology Development Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Ruidong Technology Development Co ltd filed Critical Suzhou Ruidong Technology Development Co ltd
Priority to CN202110564551.0A
Publication of CN113269086A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/49 Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • G06F 18/232 Non-hierarchical techniques
    • G06F 18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4007 Scaling of whole images or parts thereof, e.g. expanding or contracting based on interpolation, e.g. bilinear interpolation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/48 Matching video sequences

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computing Systems (AREA)
  • Television Systems (AREA)

Abstract

The invention relates to the technical field of video clipping, and discloses a vlog clipping method and clipping system. The vlog clipping method comprises the following steps. S1: selecting an Nth first frame, an Nth tail frame, an (N+1)th first frame and an (N+1)th tail frame in the vlog video according to the input of a user; S2: determining the image similarity of the Nth tail frame and the (N+1)th first frame; S3: judging whether the image similarity falls into a preset similarity interval; if so, executing S4, and if not, executing S5; S4: connecting the Nth tail frame with the (N+1)th first frame; S5: judging whether the image similarity is smaller than the minimum value of the preset similarity interval; if so, executing S6, and if not, executing S7; S6: generating a transition frame, and connecting the Nth tail frame with the (N+1)th first frame by using the transition frame; S7: deleting the (N+1)th first frame and setting the first video frame after it as a new (N+1)th first frame, or deleting the Nth tail frame and setting the first video frame before it as a new Nth tail frame, and re-executing S2. The invention ensures that the vlog has no obvious editing trace at the connecting point, thereby improving the viewing experience.

Description

Vlog editing method and system
Technical Field
The invention relates to the technical field of video editing, in particular to a vlog editing method and a vlog editing system.
Background
A vlog (video log, or video blog in Chinese usage) is a video form created by shooting and editing the creator's daily life, with the actions and behavior of people as its main content. A vlog usually cannot be recorded in a single take; instead, the actions of people or objects in the same scene are recorded in multiple segments, suitable clipping points are then located in these segments, and the segments are clipped and merged into one complete long video. Because the two frames on either side of a clipping point were not shot continuously, the change in motion of people or objects across the clipping point tends to differ markedly from the change across adjacent, continuously shot frames: if the change is too large, the motion appears discontinuous; if it is too small, the motion appears sluggish and unnatural. Either way, the segment around the clipping point falls out of step with the rhythm of the rest of the video, leaving obvious editing traces in the final vlog and degrading the viewing experience.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a vlog clipping method and a vlog clipping system, which address the problems of discontinuous pictures and sluggish rhythm at the clipping point of a vlog video.
In order to achieve the above purpose, the invention provides the following technical scheme:
a vlog clipping method, comprising: s1: selecting an Nth first frame, an Nth tail frame, an N +1 th first frame and an N +1 th tail frame in a vlog video according to input of a user, and extracting a plurality of video frames from the Nth first frame to the Nth tail frame and a plurality of video frames from the N +1 th first frame to the N +1 th tail frame; s2: determining the image similarity of the Nth tail frame and the (N + 1) th frame; s3: judging whether the image similarity falls into a preset similarity interval, if so, executing S4, and if not, executing S5; s4: connecting the Nth tail frame with the (N + 1) th frame to generate a merged vlog video; s5: judging whether the image similarity is smaller than the minimum value of a preset similarity interval, if so, executing S6, otherwise, executing S7; s6: generating a transition frame according to the Nth tail frame and the (N + 1) th first frame, and connecting the Nth tail frame and the (N + 1) th first frame by using the transition frame to generate a merged vlog video; s7: deleting the N +1 th first frame, setting the first video frame after the N +1 th first frame as a new N +1 th first frame or deleting the nth last frame, setting the first video frame before the nth last frame as a new nth last frame, and then re-executing S2.
In the present invention, preferably, generating the transition frame according to the Nth tail frame and the (N+1)th first frame in S6 includes: S601: determining a bidirectional motion vector using the Nth tail frame and the (N+1)th first frame; S602: interpolating and constructing the transition frame according to the bidirectional motion vector.
In the present invention, preferably, S601 includes: S6011: determining an initial motion vector from the Nth tail frame to the (N+1)th first frame by a forward motion vector estimation method; S6012: determining the bidirectional motion vector by a bidirectional motion vector estimation method using the initial motion vector.
In the present invention, preferably, S602 includes: S6021: detecting whether an abnormal portion exists in the bidirectional motion vector and, if so, correcting it; S6022: performing interpolation calculation by an overlapped block motion compensation technique according to the detected and corrected bidirectional motion vector to obtain the transition frame.
In the present invention, preferably, S2 includes: S201: generating grayscale images of the Nth tail frame and the (N+1)th first frame; S202: calculating the similarity between the grayscale image of the Nth tail frame and the grayscale image of the (N+1)th first frame, namely the image similarity.
In the present invention, preferably, the image similarity is obtained by using mutual information between the grayscale image of the Nth tail frame and the grayscale image of the (N+1)th first frame.
A vlog clipping system comprising: a selecting and extracting module for determining an Nth first frame, an Nth tail frame, an (N+1)th first frame and an (N+1)th tail frame in the vlog video according to the input of a user, and extracting the video frames from the Nth first frame to the Nth tail frame and from the (N+1)th first frame to the (N+1)th tail frame; a similarity determining module for determining the image similarity of the Nth tail frame and the (N+1)th first frame; a first judgment module for judging whether the image similarity falls into a preset similarity interval; a first generation module for connecting the Nth tail frame with the (N+1)th first frame to generate a merged vlog video; a second judgment module for judging whether the image similarity is smaller than the minimum value of the similarity interval; a second generation module for generating a transition frame according to the Nth tail frame and the (N+1)th first frame, and connecting the Nth tail frame and the (N+1)th first frame with the transition frame to generate a merged vlog video; and a reselection module for deleting the (N+1)th first frame and setting the first video frame after it as the new (N+1)th first frame.
In the present invention, preferably, the second generation module includes: a bidirectional motion vector determination submodule for determining a bidirectional motion vector using the Nth tail frame and the (N+1)th first frame; and a construction submodule for interpolating and constructing the transition frame according to the bidirectional motion vector.
In the present invention, preferably, the bidirectional motion vector determination submodule includes: an initial motion vector determination unit for determining an initial motion vector from the Nth tail frame to the (N+1)th first frame by a forward motion vector estimation method; and a bidirectional motion vector determination unit for determining the bidirectional motion vector by a bidirectional motion vector estimation method using the initial motion vector. The construction submodule includes: a detection and correction unit for detecting whether an abnormal portion exists in the bidirectional motion vector and, if so, correcting it; and an interpolation calculation unit for performing interpolation calculation by an overlapped block motion compensation technique according to the detected and corrected bidirectional motion vector to obtain the transition frame.
In the present invention, preferably, the similarity determining module includes: a conversion submodule for generating grayscale images of the Nth tail frame and the (N+1)th first frame; and a similarity calculation submodule for calculating the similarity between the grayscale image of the Nth tail frame and the grayscale image of the (N+1)th first frame, namely the image similarity.
Compared with the prior art, the invention has the beneficial effects that:
the invention discloses a vlog clipping method and a clipping system, which are used for calculating the image similarity of two video frames before and after a connection point of two video segments to be connected, and then determining a video merging strategy according to the position relation between the image similarity and a preset similarity interval: if the image similarity falls into a preset similarity interval, directly connecting the two video clips; if the image similarity is smaller than the minimum value of the preset similarity interval, generating a transition frame according to two video frames before and after the connection point, and connecting the two video clips by using the transition frame; and if the image similarity is larger than the maximum value of the preset similarity interval, two video frames before and after the connection point are too similar, and the next video frame is reselected as the video frame after the connection point to carry out similarity calculation, connection and other work. Through the distinguishing processing process, the generated complete vlog has no obvious editing trace at the connecting point, so that the picture of the vlog video can be more coherent, the video playing is smoother, and the watching experience is improved; the transition frame is constructed by adopting the bidirectional motion vector, and the bidirectional motion vector is corrected to a certain extent, so that the image quality of the constructed transition frame is higher, and the transition frame is more smoothly and naturally linked with the video frames before and after the clipping point.
Drawings
Fig. 1 is a flow chart of the vlog clipping method of the present invention.
Fig. 2 is a flowchart of S6 in the vlog clipping method of the present invention.
Fig. 3 is a flowchart of S601 in the vlog clipping method of the present invention.
Fig. 4 is a flowchart of S602 in the vlog clipping method of the present invention.
Fig. 5 is a flowchart of S2 in the vlog clipping method of the present invention.
Fig. 6 is a schematic diagram of the structure of the vlog clipping system of the present invention.
Fig. 7 is a schematic structural diagram of a second generation module in the vlog clipping system of the present invention.
Fig. 8 is a schematic structural diagram of a bidirectional motion vector determination sub-module and a construction sub-module in the vlog clipping system of the present invention.
Fig. 9 is a schematic structural diagram of a similarity determination module in the vlog clipping system of the present invention.
In the drawings: 1 - selecting and extracting module; 2 - similarity determining module; 21 - conversion submodule; 22 - similarity calculation submodule; 3 - first judgment module; 4 - first generation module; 5 - second judgment module; 6 - second generation module; 61 - bidirectional motion vector determination submodule; 611 - initial motion vector determination unit; 612 - bidirectional motion vector determination unit; 62 - construction submodule; 621 - detection and correction unit; 622 - interpolation calculation unit; 7 - reselection module.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that when an element is referred to as being "secured to" another element, it can be directly on the other element or intervening elements may also be present. When a component is referred to as being "connected" to another component, it can be directly connected to the other component or intervening components may also be present. When a component is referred to as being "disposed on" another component, it can be directly on the other component or intervening components may also be present. The terms "vertical," "horizontal," "left," "right," and the like as used herein are for illustrative purposes only.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
Referring to fig. 1 to 5, a preferred embodiment of the present invention provides a vlog clipping method, including:
s1: the method comprises the steps of selecting an Nth first frame, an Nth tail frame, an (N + 1) th first frame and an (N + 1) th tail frame in a vlog video according to input of a user, and extracting a plurality of video frames from the Nth first frame to the Nth tail frame and a plurality of video frames from the (N + 1) th first frame to the (N + 1) th tail frame.
The vlog video can be in any common video format. It is imported into video processing software (such as a video clipping tool or a player), which decomposes it into a sequence of video frames; each video frame can be displayed as a thumbnail or at full size. The user confirms the content of each video frame by viewing its thumbnail and full-size picture, and then selects the video segments to be extracted and merged by entering the corresponding input into the video processing software.
Let the extracted video segments be N and N+1, with the tail of segment N to be connected to the head of segment N+1 to compose the final desired vlog video. The first video frame of segment N is the Nth first frame and its last video frame is the Nth tail frame; the first video frame of segment N+1 is the (N+1)th first frame and its last video frame is the (N+1)th tail frame. By selecting the Nth first frame, the Nth tail frame, the (N+1)th first frame and the (N+1)th tail frame through this input, the user determines the extracted video as the sequence of video frames from the Nth first frame to the Nth tail frame and the sequence from the (N+1)th first frame to the (N+1)th tail frame, i.e. segments N and N+1. Extracting segments N and N+1 means copying these two sequences for subsequent processing.
S2: and determining the image similarity of the Nth tail frame and the (N + 1) th head frame.
At editing time, the (N+1)th first frame selected by the user may or may not form a coherent picture with the Nth tail frame. When the pictures are not coherent, the two frames are simply joined directly, a scenario to which this method does not apply. When the pictures are coherent, the two frames have high image similarity; but because they were not shot continuously, their similarity can still differ considerably from that of two genuinely consecutive video frames. The two frames can therefore be compared, their image similarity determined, and this index used to judge whether the Nth tail frame and the (N+1)th first frame connect smoothly, thereby deciding the subsequent measures. The similarity can be expressed by means such as gray-level variance, mutual information, correlation coefficient, or joint entropy.
Specifically, as shown in fig. 5, S2 includes:
s201: and generating gray level images of the Nth tail frame and the (N + 1) th first frame.
Video frames are usually RGB three-channel color images, which are relatively complex and inconvenient to process, so grayscale images of the Nth tail frame and the (N+1)th first frame are generated first. The RGB-to-gray conversion may use the floating-point method (Gray = 0.3R + 0.59G + 0.11B), the integer method (Gray = (30R + 59G + 11B)/100), the average method (Gray = (R + G + B)/3), or the like.
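For illustration (this sketch is not part of the patent text; the function name and the use of NumPy are this edition's assumptions), the floating-point conversion is a one-line weighted sum:

```python
import numpy as np

def to_gray(frame_rgb: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 RGB frame to grayscale using the
    floating-point weights quoted above (0.3R + 0.59G + 0.11B)."""
    weights = np.array([0.3, 0.59, 0.11])
    return (frame_rgb.astype(np.float64) @ weights).astype(np.uint8)
```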
S202: and calculating the similarity between the gray level image of the Nth tail frame and the gray level image of the (N + 1) th head frame, namely the image similarity.
After conversion to grayscale, the similarity between the grayscale image of the Nth tail frame and that of the (N+1)th first frame is taken as the image similarity between the two frames.
Among the various ways of expressing similarity, mutual information offers high accuracy and robustness, so this embodiment preferably uses mutual information to express the similarity between the grayscale image of the Nth tail frame and that of the (N+1)th first frame. Mutual information reflects the statistical dependence between two systems, indicating how much information one system contains about the other.
Mutual information between two random variables A and B can be expressed as:

I(A,B) = Σ_a Σ_b p_AB(a,b) log( p_AB(a,b) / (p_A(a) p_B(b)) )    (1)

where p_AB(a,b) is the joint probability density distribution of the random variables A and B, and p_A(a) and p_B(b) are the marginal probability density distributions of A and B, respectively.
Mutual information can also be described by information entropy. According to the definition of information entropy:

H(A) = -Σ_a p_A(a) log p_A(a)    (2)

H(B) = -Σ_b p_B(b) log p_B(b)    (3)

H(A,B) = -Σ_a Σ_b p_AB(a,b) log p_AB(a,b)    (4)
where H(A) and H(B) are the information entropies of random variables A and B, H(A,B) is the joint entropy of A and B, and H(A|B) and H(B|A) are the conditional entropy of A given B and of B given A, respectively. Information entropy measures the average uncertainty, or information content, of a system: H(A) represents the average uncertainty of random variable A; H(A|B) represents the uncertainty remaining in A after B is known; and the difference between H(A) and H(A|B) indicates how much the uncertainty of A decreases once B is known, i.e. the amount of information about A contained in B. Therefore, the mutual information between two random variables A and B can be expressed as:
I(A,B)=H(A)-H(A|B)=H(B)-H(B|A)=H(A)+H(B)-H(A,B) (5)
If A and B are independent of each other, p_AB(a,b) = p_A(a) p_B(b) and I(A,B) = 0; if A and B are completely dependent, then H(A,B) = H(A) = H(B) and I(A,B) attains its maximum.
The Nth tail frame and the (N+1)th first frame can be regarded as two random variables A and B over the image gray levels, whose joint distribution is obtained by normalizing the joint gray-level histogram:

p_AB(i,j) = h(i,j) / Σ h(i,j)    (6)

where i and j are the gray values of pixels in images A and B, h(i,j) is the number of pixel pairs with gray values (i,j) in the overlapping portion of the two images, and Σ h(i,j) is the total number of pixels in that overlap. The marginal probability distributions are then:

p_A(i) = Σ_j p_AB(i,j)    (7)

p_B(j) = Σ_i p_AB(i,j)    (8)

The mutual information I(A,B) of A and B can be calculated by substituting formulas (6), (7) and (8) into formula (1).
Since I(A,B) itself is not an intuitive expression of similarity, a relative value S(A,B), here taken as the mutual information normalized by the entropy H(B), can be used to represent the similarity of B with respect to A:

S(A,B) = I(A,B) / H(B)    (9)

The value of S(A,B) lies between 0 and 1. In the invention, A is the Nth tail frame and B is the (N+1)th first frame; because the two frames are highly similar, S(A,B) should fall roughly between 0.5 and 1.
S3: and judging whether the image similarity falls into a preset similarity interval, if so, executing S4, and if not, executing S5.
The preset similarity interval is the range of values the similarity between two normal consecutive video frames can take, typically between 0.75 and 0.95. As described above, if S(A,B) falls into the preset similarity interval, the image similarity between the Nth tail frame and the (N+1)th first frame does not differ from that of two normal consecutive video frames; after the two frames are connected, neither discontinuous pictures nor a sluggish rhythm will occur and no obvious editing trace remains, so S4 can be executed directly.
S4: and connecting the Nth tail frame with the (N + 1) th frame to generate a merged vlog video.
At this point, all video frames from the Nth first frame up to the Nth tail frame are connected with all video frames from the (N+1)th first frame up to the (N+1)th tail frame, and the merged vlog video can be generated through subsequent operations such as video compression.
S5: and judging whether the image similarity is smaller than the minimum value of the preset similarity interval, if so, executing S6, and otherwise, executing S7.
When the image similarity does not fall into the preset similarity interval, two cases arise: the image similarity is smaller than the minimum value of the interval, or larger than its maximum value.
When the image similarity is smaller than the minimum value of the preset similarity interval, the similarity between the Nth tail frame and the (N+1)th first frame is below that of normal consecutive frames: the two frames differ too much, and connecting them directly would easily produce a discontinuous video. S6 therefore needs to be executed to generate a transition frame that fills the gap between the Nth tail frame and the (N+1)th first frame and connects them, so that the merged video plays smoothly.
S6: and generating a transition frame according to the Nth tail frame and the (N + 1) th first frame, and connecting the Nth tail frame and the (N + 1) th first frame by using the transition frame to generate a merged vlog video.
From the Nth tail frame and the (N+1)th first frame, a transition frame may be generated by motion-compensated frame interpolation. The basic idea is: establish a motion model from the Nth tail frame and the (N+1)th first frame, define the change from the Nth tail frame to the (N+1)th first frame as a motion vector, estimate that motion vector from the difference between the two frames, and construct the transition frame from the two frames according to the motion vector. Motion vectors can be estimated by methods based on the optical-flow equation (an optical-flow field is estimated from the spatio-temporal gray-value gradients combined with suitable spatio-temporal smoothness conditions), on a block motion model (the image is divided into fixed-size blocks, all pixels within a block are assumed to share one motion vector, the best-matching block is searched within a given range of the (N+1)th first frame and/or the Nth tail frame, and the motion vector is obtained from the relative displacement), or on pixel recursion (the predicted value of a pixel is a linear combination of the motion-vector estimate at the corresponding position of the Nth tail frame and of pixels in the neighborhood of the current pixel, and the prediction model is iteratively refined to minimize the difference between predicted and actual values), among others.
After the transition frame is obtained, it is inserted between the Nth tail frame and the (N+1)th first frame; all video frames from the Nth first frame up to the Nth tail frame are thereby connected, through the transition frame, with all video frames from the (N+1)th first frame up to the (N+1)th tail frame, and the complete merged vlog video desired by the creator is generated through subsequent operations such as video compression.
S7: deleting the N +1 th first frame, setting the first video frame after the N +1 th first frame as a new N +1 th first frame or deleting the nth last frame, setting the first video frame before the nth last frame as a new nth last frame, and then re-executing S2.
When the image similarity of the Nth tail frame and the (N+1)th first frame is larger than the maximum value of the preset similarity interval, the two frames are extremely similar; a video formed by connecting them directly shows a frozen picture, slowing the rhythm of the video at that point and leaving an obvious editing trace. One of the two video frames can therefore be discarded and the connection of the frame sequences reorganized. Taking discarding the (N+1)th first frame as an example: the (N+1)th first frame is deleted, the first video frame after it within the selected segment N+1 becomes the new (N+1)th first frame, and execution resumes from S2 to judge whether the Nth tail frame and the new (N+1)th first frame meet the similarity requirement. If they do, they are connected directly to generate the merged vlog video; if the similarity is too small, a transition frame is generated and used to connect them; if the similarity is still too large, the (N+1)th first frame is discarded again and a new one selected, repeating the process until a merged vlog video without obvious editing traces can be generated.
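A compact sketch of the S2-S7 decision loop just described (illustrative only: mi_similarity is the earlier sketch, the interval bounds come from the 0.75-0.95 range stated above, and make_transition_frames is a hypothetical stand-in for the transition-frame generation of S6):

```python
SIM_LO, SIM_HI = 0.75, 0.95  # preset similarity interval

def merge_clips(clip_n, clip_n1, make_transition_frames):
    """clip_n / clip_n1: lists of grayscale frames (segments N and N+1)."""
    while clip_n and clip_n1:
        s = mi_similarity(clip_n[-1], clip_n1[0])        # S2
        if SIM_LO <= s <= SIM_HI:                        # S3 -> S4: connect directly
            return clip_n + clip_n1
        if s < SIM_LO:                                   # S5 -> S6: bridge the gap
            return clip_n + make_transition_frames(clip_n[-1], clip_n1[0]) + clip_n1
        clip_n1 = clip_n1[1:]                            # S7: drop too-similar first frame
    return clip_n + clip_n1
```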
In the present embodiment, as shown in fig. 2, S6 preferably includes:
s601: and determining a bidirectional motion vector by using the Nth tail frame and the (N + 1) th head frame.
S602: and (4) interpolating and constructing a transition frame according to the bidirectional motion vector.
The basic principle of block-matching motion estimation is to divide each frame of the video sequence into several non-overlapping blocks, assume that all pixels within a block share the same displacement, and then, for each block, search a given range of the reference frame for the most similar block (the matching block) under some block-matching criterion; the relative displacement between the matching block and the current block is the motion vector. The block-matching criterion may be SAD (sum of absolute differences), computed as:

SAD(d_x, d_y) = Σ_{m=0}^{M-1} Σ_{n=0}^{M-1} | F_t(m, n) - F_{t-1}(m + d_x, n + d_y) |    (10)

where the matched block is of size M x M, (d_x, d_y) is the candidate motion vector, F_t(m, n) is the gray value at pixel (m, n) of the current frame F_t, and F_{t-1} is the reference frame; in this embodiment the Nth tail frame or the (N+1)th first frame may serve as the reference frame.
To avoid the problems of holes and overlaps, this embodiment uses bidirectional motion vectors. The transition frame is divided into blocks, each with two motion vectors: one pointing to the Nth tail frame and one to the (N+1)th first frame; the pixels of each block of the transition frame are interpolated from the two. Motion-compensated frame interpolation based on bidirectional motion vector estimation recovers the transition frame as:

F_t(x, y) = ( F_{t-1}(x - v_x/2, y - v_y/2) + F_{t+1}(x + v_x/2, y + v_y/2) ) / 2    (11)

where (v_x, v_y) is the bidirectional motion vector estimated between the previous frame F_{t-1} and the following frame F_{t+1} in the x and y directions; the Nth tail frame and the (N+1)th first frame serve as F_{t-1} and F_{t+1}, respectively.
In this method, for a given block of the transition frame, the displacement that best matches a block shifted one way in the Nth tail frame against a block shifted the opposite way in the (N+1)th first frame is taken as that block's motion vector. Let B_i denote a block in the transition frame and let the motion-vector search range be [-a, a]. For each candidate motion vector v = (v_x, v_y) in the search range, the block is shifted by -v/2 in the Nth tail frame and by +v/2 in the (N+1)th first frame, and the sum of absolute differences over the two corresponding blocks is computed:

SAD(v) = Σ_{p ∈ B_i} | F_{t-1}(p - v/2) - F_{t+1}(p + v/2) |    (12)

where p denotes a pixel position in the block. The candidate in [-a, a] with the smallest sum of absolute differences is the motion-vector estimate for that block of the transition frame.
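A sketch of this bidirectional search (the block size, search radius, grayscale 2-D frames, and boundary handling are this sketch's assumptions; odd candidate vectors are halved by integer division):

```python
import numpy as np

def bidirectional_mv(prev, nxt, y, x, block=16, a=8):
    """Bidirectional search of equation (12) for the transition-frame block
    with top-left corner (y, x): shift -v/2 into the Nth tail frame (prev)
    and +v/2 into the (N+1)th first frame (nxt)."""
    h, w = prev.shape
    best_sad, best_v = np.inf, (0, 0)
    for vy in range(-a, a + 1):
        for vx in range(-a, a + 1):
            y0, x0 = y - vy // 2, x - vx // 2      # block position in prev
            y1, x1 = y + vy // 2, x + vx // 2      # block position in nxt
            if (min(y0, x0, y1, x1) < 0 or max(y0, y1) + block > h
                    or max(x0, x1) + block > w):
                continue  # candidate shifts the block outside the frame
            sad = np.abs(prev[y0:y0+block, x0:x0+block].astype(np.int32)
                         - nxt[y1:y1+block, x1:x1+block]).sum()
            if sad < best_sad:
                best_sad, best_v = sad, (vx, vy)
    return best_v
```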
In the present embodiment, as shown in fig. 3, S601 preferably includes:
s6011: and determining an initial motion vector from the Nth tail frame to the (N + 1) th head frame by a forward motion vector estimation method.
Dividing the Nth tail frame into M multiplied by M blocks, selecting blocks at corresponding positions of the (N + 1) th first frame and a certain number of pixel points at the upper, lower, left and right sides as search ranges, using SAD as a criterion, finding out a matching block and a block with the minimum SAD value from the search range from each block of the Nth tail frame to the (N + 1) th first frame, and then calculating the relative displacement of the current block and the matching block to obtain the forward motion vector of the Nth tail frame
Figure BDA0003080445160000127
Dividing by 2 to obtain the initial motion vector of the transition frame between the Nth end frame and the (N + 1) th first frame, and recording as
Figure BDA0003080445160000128
The initial forward motion vector is
Figure BDA0003080445160000129
The backward motion vector is
Figure BDA00030804451600001210
S6012: and determining a bidirectional motion vector by a bidirectional motion vector estimation method by using the initial motion vector.
After the initial bidirectional motion vector v_0 has been determined and a first adjustment applied, it serves as the starting point for a small-range refinement by bidirectional motion vector estimation. The basic procedure is as follows. Let p be a pixel of block B_i in the transition frame. Using the corrected initial motion vector v_0, the matching blocks B_1 and B_2 corresponding to B_i are located in the Nth tail frame and the (N+1)th first frame, respectively, and a range of 1 to 3 layers of pixels around B_1 and B_2 is taken as the search range. The positions of pixel p within B_1 and B_2, denoted p_1 and p_2, are computed from the initial motion vector as:

p_1 = p - v_0 / 2    (13)

p_2 = p + v_0 / 2    (14)

Assuming the motion of objects in the image is uniform, the two matching blocks B_1 and B_2 move in synchronous symmetry: if B_1 moves a number of steps to the left, B_2 moves the corresponding number of steps to the right. For each candidate adjustment u in the search range [-2, 2], the sum of absolute differences over the corresponding blocks in the Nth tail frame and the (N+1)th first frame is computed following the bidirectional estimation idea:

SAD(u) = Σ_{p ∈ B_i} | F_{t-1}(p - v_0/2 - u) - F_{t+1}(p + v_0/2 + u) |    (15)

The candidate with the minimum SAD, i.e. the best match, adjusts the initial motion vector and yields the bidirectional motion vector v_b.
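A sketch of this refinement step under the same assumptions as the search sketch above, with the [-2, 2] window taken from the text and v_0 supplied by the halved forward vector of S6011:

```python
import numpy as np

def refine_mv(prev, nxt, y, x, v0, block=16):
    """Small-range refinement of equation (15): adjust the initial
    bidirectional vector v0 = (vx, vy) within the window [-2, 2]."""
    h, w = prev.shape
    best_sad, best_v = np.inf, v0
    for uy in range(-2, 3):
        for ux in range(-2, 3):
            vx, vy = v0[0] + ux, v0[1] + uy        # candidate adjusted vector
            y0, x0 = y - vy // 2, x - vx // 2
            y1, x1 = y + vy // 2, x + vx // 2
            if (min(y0, x0, y1, x1) < 0 or max(y0, y1) + block > h
                    or max(x0, x1) + block > w):
                continue
            sad = np.abs(prev[y0:y0+block, x0:x0+block].astype(np.int32)
                         - nxt[y1:y1+block, x1:x1+block]).sum()
            if sad < best_sad:
                best_sad, best_v = sad, (vx, vy)
    return best_v
```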
In the present embodiment, as shown in fig. 4, S602 preferably includes:
s6021: and detecting whether an abnormal part exists in the bidirectional motion vector, and if so, correcting the abnormal part.
In some cases an object exists only in the Nth tail frame or only in the (N+1)th first frame, which typically happens when an old object disappears or a new object appears. In such cases the transition frame should not be recovered by weighting the two frames; the value should be taken, via the corresponding motion vector, only from the frame in which the object exists. To handle this, the bidirectional motion vector needs to be corrected.
In a region where an object disappears or appears, the residual SAD of the corresponding matching blocks between the two frames is large, because the content exists in one frame but not the other. Based on this feature, abnormal portions containing isolated motion vectors can be detected from the correlation between neighboring vectors (a block's motion vector is considered isolated if it is inconsistent with the motion vectors of its adjacent blocks), and the isolated vectors then corrected.
To detect an isolated motion vector, the absolute differences between the motion vector of the block under test and the motion vectors of its eight neighboring blocks are first computed:

d_{x,i} = | v_x - v_{x,i} |,  d_{y,i} = | v_y - v_{y,i} |    (16)
when an isolated motion vector is detected, the motion vector sizes of eight blocks around the current block are sorted to find two intermediate values, and then the average value of the two intermediate values is used to replace the original motion vector.
The motion vectors may then be further smoothed. For a block B of the transition frame and its several adjacent blocks B_i, each candidate bidirectional motion vector v_i (the vector of B itself and those of its neighbors) is scored by the pixel-value difference it produces between the blocks corresponding to B in the Nth tail frame and the (N+1)th first frame:

SAD_B(v_i) = Σ_{p ∈ B} | F_{t-1}(p - v_i/2) - F_{t+1}(p + v_i/2) |    (17)

The adjusted bidirectional motion vector of block B is then the candidate with the smallest score:

v_B = argmin_{v_i} SAD_B(v_i)    (18)
s6022: and carrying out interpolation calculation by an overlapped block motion compensation technology according to the detected and corrected bidirectional motion vector to obtain a transition frame.
Given an M x M block B and a small overlap width w, the block is expanded to (M+2w) x (M+2w), and its eight neighboring blocks N_i, i = 1, 2, ..., 8, are expanded to the same size. This produces three kinds of regions: R1 (no overlap), R2 (overlap of two expanded blocks) and R3 (overlap of four expanded blocks). Let F̂_k(p) denote the motion-compensated value of point p obtained from bidirectional motion vector v_k according to equation (11), where k ranges over the expanded blocks covering p. The OBMC (overlapped block motion compensation) result for the different regions of block B is the weighted combination:

in region R1: F(p) = F̂_B(p)    (19)

in region R2: F(p) = Σ_{k=1}^{2} S_k F̂_k(p)    (20)

in region R3: F(p) = Σ_{k=1}^{4} S_k F̂_k(p)    (21)

where the weights S_k depend on the position of p within the overlap, decrease with the distance from p to the center of block k, and satisfy Σ_k S_k = 1.
through the calculation process, the transition frame can be obtained.
Referring to fig. 6 to 9, a preferred embodiment of the present invention further provides a vlog clipping system, which includes: a selecting and extracting module for determining an Nth first frame, an Nth tail frame, an (N+1)th first frame and an (N+1)th tail frame in the vlog video according to the input of a user, and extracting the video frames from the Nth first frame to the Nth tail frame and from the (N+1)th first frame to the (N+1)th tail frame; a similarity determining module for determining the image similarity of the Nth tail frame and the (N+1)th first frame; a first judgment module for judging whether the image similarity falls into a preset similarity interval; a first generation module for connecting the Nth tail frame with the (N+1)th first frame to generate a merged vlog video; a second judgment module for judging whether the image similarity is smaller than the minimum value of the similarity interval; a second generation module for generating a transition frame according to the Nth tail frame and the (N+1)th first frame, and connecting the Nth tail frame and the (N+1)th first frame with the transition frame to generate a merged vlog video; and a reselection module for deleting the (N+1)th first frame and setting the first video frame after it as the new (N+1)th first frame.
As shown in fig. 7, the second generation module includes: a bidirectional motion vector determination submodule for determining a bidirectional motion vector using the Nth tail frame and the (N+1)th first frame; and a construction submodule for interpolating and constructing the transition frame according to the bidirectional motion vector.
The bidirectional motion vector determination submodule includes: an initial motion vector determination unit, configured to determine an initial motion vector from the Nth tail frame to the (N+1)th first frame by a forward motion vector estimation method; and a bidirectional motion vector determination unit, configured to determine the bidirectional motion vector by a bidirectional motion vector estimation method using the initial motion vector, as shown in fig. 8.
The construction submodule includes: a detection and correction unit, configured to detect whether an abnormal portion exists in the bidirectional motion vector and, if so, correct it; and an interpolation calculation unit, configured to perform interpolation calculation by an overlapped block motion compensation technique according to the detected and corrected bidirectional motion vector to obtain the transition frame, as shown in fig. 8.
The similarity determining module includes: a conversion submodule for generating grayscale images of the Nth tail frame and the (N+1)th first frame; and a similarity calculation submodule for calculating the similarity between the grayscale image of the Nth tail frame and the grayscale image of the (N+1)th first frame, namely the image similarity, as shown in fig. 9.
The modules and units above are modular functional entities realized by computer equipment. The computer equipment comprises a processor, a memory, input and output devices and a bus; the bus connects the processor, the memory and the input and output devices; the processor performs the functions described above. There may be one or more processors and memories, and the memory may be volatile or persistent.
The above description is intended to describe in detail the preferred embodiments of the present invention, but the embodiments are not intended to limit the scope of the claims of the present invention, and all equivalent changes and modifications made within the technical spirit of the present invention should fall within the scope of the claims of the present invention.

Claims (10)

1. A vlog clipping method, comprising:
S1: selecting an Nth first frame, an Nth tail frame, an (N+1)th first frame and an (N+1)th tail frame in a vlog video according to the input of a user, and extracting the video frames from the Nth first frame to the Nth tail frame and from the (N+1)th first frame to the (N+1)th tail frame;
S2: determining the image similarity of the Nth tail frame and the (N+1)th first frame;
S3: judging whether the image similarity falls into a preset similarity interval; if so, executing S4, and if not, executing S5;
S4: connecting the Nth tail frame with the (N+1)th first frame to generate a merged vlog video;
S5: judging whether the image similarity is smaller than the minimum value of the preset similarity interval; if so, executing S6; otherwise, executing S7;
S6: generating a transition frame according to the Nth tail frame and the (N+1)th first frame, and connecting the Nth tail frame and the (N+1)th first frame with the transition frame to generate a merged vlog video;
S7: deleting the (N+1)th first frame and setting the first video frame after it as a new (N+1)th first frame, or deleting the Nth tail frame and setting the first video frame before it as a new Nth tail frame, and then re-executing S2.
2. The vlog clipping method according to claim 1, wherein generating the transition frame according to the Nth tail frame and the (N+1)th first frame in S6 comprises:
S601: determining a bidirectional motion vector using the Nth tail frame and the (N+1)th first frame;
S602: interpolating and constructing the transition frame according to the bidirectional motion vector.
3. The vlog clipping method according to claim 2, wherein S601 comprises:
S6011: determining an initial motion vector from the Nth tail frame to the (N+1)th first frame by a forward motion vector estimation method;
S6012: determining the bidirectional motion vector by a bidirectional motion vector estimation method using the initial motion vector.
4. The vlog clipping method according to claim 3, wherein S602 comprises:
S6021: detecting whether an abnormal portion exists in the bidirectional motion vector and, if so, correcting the abnormal portion;
S6022: performing interpolation calculation by an overlapped block motion compensation technique according to the detected and corrected bidirectional motion vector to obtain the transition frame.
5. The vlog clipping method according to claim 1, wherein S2 comprises:
S201: generating grayscale images of the Nth tail frame and the (N+1)th first frame;
S202: calculating the similarity between the grayscale image of the Nth tail frame and the grayscale image of the (N+1)th first frame, namely the image similarity.
6. The vlog clipping method according to claim 5, wherein the image similarity is obtained by using mutual information between the grayscale image of the Nth tail frame and the grayscale image of the (N+1)th first frame.
7. A vlog clipping system, comprising:
a selecting and extracting module for determining an Nth first frame, an Nth tail frame, an (N+1)th first frame and an (N+1)th tail frame in the vlog video according to the input of a user, and extracting the video frames from the Nth first frame to the Nth tail frame and from the (N+1)th first frame to the (N+1)th tail frame;
a similarity determining module for determining the image similarity of the Nth tail frame and the (N+1)th first frame;
a first judgment module for judging whether the image similarity falls into a preset similarity interval;
a first generation module for connecting the Nth tail frame with the (N+1)th first frame to generate a merged vlog video;
a second judgment module for judging whether the image similarity is smaller than the minimum value of the similarity interval;
a second generation module for generating a transition frame according to the Nth tail frame and the (N+1)th first frame, and connecting the Nth tail frame and the (N+1)th first frame with the transition frame to generate a merged vlog video;
and a reselection module for deleting the (N+1)th first frame and setting the first video frame after it as the new (N+1)th first frame.
8. The vlog clipping system of claim 7, wherein the second generation module comprises:
a bidirectional motion vector determination submodule for determining a bidirectional motion vector using the Nth tail frame and the (N+1)th first frame;
and a construction submodule for interpolating and constructing the transition frame according to the bidirectional motion vector.
9. The vlog clipping system of claim 8, wherein the bidirectional motion vector determination submodule comprises:
an initial motion vector determination unit, configured to determine an initial motion vector from the Nth tail frame to the (N+1)th first frame by a forward motion vector estimation method;
and a bidirectional motion vector determination unit, configured to determine the bidirectional motion vector by a bidirectional motion vector estimation method using the initial motion vector;
and wherein the construction submodule comprises:
a detection and correction unit, configured to detect whether an abnormal portion exists in the bidirectional motion vector and, if so, correct the abnormal portion;
and an interpolation calculation unit, configured to perform interpolation calculation by an overlapped block motion compensation technique according to the detected and corrected bidirectional motion vector to obtain the transition frame.
10. The vlog clipping system according to claim 7, wherein the similarity determining module comprises:
a conversion submodule for generating grayscale images of the Nth tail frame and the (N+1)th first frame;
and a similarity calculation submodule for calculating the similarity between the grayscale image of the Nth tail frame and the grayscale image of the (N+1)th first frame, namely the image similarity.
CN202110564551.0A 2021-05-24 2021-05-24 Vlog editing method and system Pending CN113269086A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110564551.0A 2021-05-24 2021-05-24 Vlog editing method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110564551.0A 2021-05-24 2021-05-24 Vlog editing method and system

Publications (1)

Publication Number Publication Date
CN113269086A 2021-08-17

Family

ID=77232413

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110564551.0A Pending CN113269086A (en) Vlog editing method and system

Country Status (1)

Country Link
CN (1) CN113269086A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114679605A (en) * 2022-03-25 2022-06-28 腾讯科技(深圳)有限公司 Video transition method and device, computer equipment and storage medium
CN114866839A (en) * 2022-07-11 2022-08-05 深圳市鼎合丰科技有限公司 Video editing software system based on repeated frame image merging

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103152566A (en) * 2013-02-22 2013-06-12 华中科技大学 Video frame rate promoting method
WO2016187776A1 (en) * 2015-05-25 2016-12-01 北京大学深圳研究生院 Video frame interpolation method and system based on optical flow method
CN108509917A (en) * 2018-03-30 2018-09-07 北京影谱科技股份有限公司 Video scene dividing method and device based on shot cluster correlation analysis
CN111294644A (en) * 2018-12-07 2020-06-16 腾讯科技(深圳)有限公司 Video splicing method and device, electronic equipment and computer storage medium
CN111327945A (en) * 2018-12-14 2020-06-23 北京沃东天骏信息技术有限公司 Method and apparatus for segmenting video
CN112584196A (en) * 2019-09-30 2021-03-30 北京金山云网络技术有限公司 Video frame insertion method and device and server
CN112700516A (en) * 2020-12-23 2021-04-23 杭州群核信息技术有限公司 Video rendering method and device based on deep learning, computer equipment and storage medium

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103152566A (en) * 2013-02-22 2013-06-12 华中科技大学 Video frame rate promoting method
WO2016187776A1 (en) * 2015-05-25 2016-12-01 北京大学深圳研究生院 Video frame interpolation method and system based on optical flow method
CN108509917A (en) * 2018-03-30 2018-09-07 北京影谱科技股份有限公司 Video scene dividing method and device based on shot cluster correlation analysis
CN111294644A (en) * 2018-12-07 2020-06-16 腾讯科技(深圳)有限公司 Video splicing method and device, electronic equipment and computer storage medium
CN111327945A (en) * 2018-12-14 2020-06-23 北京沃东天骏信息技术有限公司 Method and apparatus for segmenting video
CN112584196A (en) * 2019-09-30 2021-03-30 北京金山云网络技术有限公司 Video frame insertion method and device and server
CN112700516A (en) * 2020-12-23 2021-04-23 杭州群核信息技术有限公司 Video rendering method and device based on deep learning, computer equipment and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114679605A (en) * 2022-03-25 2022-06-28 腾讯科技(深圳)有限公司 Video transition method and device, computer equipment and storage medium
CN114679605B (en) * 2022-03-25 2023-07-18 腾讯科技(深圳)有限公司 Video transition method, device, computer equipment and storage medium
CN114866839A (en) * 2022-07-11 2022-08-05 深圳市鼎合丰科技有限公司 Video editing software system based on repeated frame image merging

Similar Documents

Publication Publication Date Title
US8045620B2 (en) Image processing apparatus, image processing method and computer readable medium
US8928813B2 (en) Methods and apparatus for reducing structured noise in video
CN106210449B (en) Multi-information fusion frame rate up-conversion motion estimation method and system
Yu et al. Multi-level video frame interpolation: Exploiting the interaction among different levels
US20060262853A1 (en) Low complexity motion compensated frame interpolation method
US8274602B2 (en) Image processing apparatus and image processing method with redundant frame detection
JPH08205177A (en) Movement vector detection device
US20030007667A1 (en) Methods of and units for motion or depth estimation and image processing apparatus provided with such motion estimation unit
US20140126818A1 (en) Method of occlusion-based background motion estimation
CN113269086A (en) Vilog editing method and system
JP2005176381A (en) Adaptive motion compensated interpolating method and apparatus
Philip et al. A comparative study of block matching and optical flow motion estimation algorithms
JP2014110020A (en) Image processor, image processing method and image processing program
JP2003203237A (en) Image matching method and device, and image coding method and device
EP2237560A1 (en) Halo reducing motion-compensated interpolation
CN109788297B (en) Video frame rate up-conversion method based on cellular automaton
KR102066012B1 (en) Motion prediction method for generating interpolation frame and apparatus
CN111340101A (en) Stability evaluation method and device, electronic equipment and computer readable storage medium
JP4378801B2 (en) Image processing method and image processing apparatus
JP5334241B2 (en) Frame image motion vector estimation apparatus and program
Bae et al. Census transform-based static caption detection for frame rate up-conversion
KR100343780B1 (en) Method of Camera Motion Detection in Compressed Domain for Content-Based Indexing of Compressed Video
JP2007287006A (en) Moving object tracking device and program
Lu et al. An artifact information based motion vector processing method for motion compensated frame interpolation
Lin et al. Key-frame-based depth propagation for semi-automatic stereoscopic video conversion

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination