CN111640187B

CN111640187B - Video stitching method and system based on interpolation transition

Info

Publication number: CN111640187B
Application number: CN202010310346.7A
Authority: CN
Inventors: 邢云冰; 陈益强; 戴连君; 张钧
Original assignee: Institute of Computing Technology of CAS
Current assignee: Institute of Computing Technology of CAS
Priority date: 2020-04-20
Filing date: 2020-04-20
Publication date: 2023-05-02
Anticipated expiration: 2040-04-20
Also published as: CN111640187A

Abstract

The invention provides a video stitching method and a system based on interpolation transition, comprising the following steps: unifying the object sizes before and after the video, searching the position of the optimal splicing point, unifying the illumination brightness before and after the video and the object position, calculating the number of interpolation transition images and generating an interpolation transition image sequence. The technical scheme provided by the invention has the technical effects of smooth and fluent video transition, high speed and strong instantaneity.

Description

Video stitching method and system based on interpolation transition

Technical Field

The invention relates to the technical field of computer vision and video processing, in particular to a method and a system for splicing videos by utilizing interpolation transition frames.

Background

Video stitching is an important item of video processing. In video communication, when the network bandwidth fluctuates, the variable code rate video communication system can dynamically adjust the code rate in a scalable coding mode (for example, adjusting resolution, frame rate or image quality), but for a fixed code rate video communication system, a receiving end can generate obvious blocking phenomenon, and the video communication system is particularly characterized by video time-out. For the first case, the received image sequence can be uniformly interpolated to improve the frame rate and smoothness of the video, and for the second case, the pause image sequence segments need to be supplemented to maintain the duration and continuity of the video. The uniform interpolation of the first case can be regarded as a special case of the second case.

Unlike image stitching at a spatial scale, video stitching is a process at a temporal scale that improves the smoothness of video playback by generating interpolated transition images between adjacent videos. Video stitching is also widely used in video synthesis and video editing, for example, in sign language video synthesis, each sign language word corresponds to a video segment, if video segments corresponding to different sign language words are directly connected, the connected sign language videos may have problems of hand movement and angle transfer deviation, and adjacent videos may generate obvious visual abrupt switching phenomenon during playing, so that interpolation transition between adjacent videos is required.

Currently, there are two main methods for splicing video clips: traditional optical flow-based methods and deep learning-based methods.

In the optical flow-based method, the optical flow of each pixel point in the image is calculated, the optical flow can represent the motion change information of the object in space, and the expression form of the optical flow of the interpolation image can be obtained through the linear relation between the optical flow of the interpolation image and the optical flow of the adjacent image so as to generate the interpolation transition image.

In the method based on deep learning, a mode of training and predicting is generally adopted, a generating model of an interpolation image is obtained through known continuous image data in a training stage, and an interpolation image with a transitional effect is finally generated by calculating an input model of an adjacent image in a predicting stage. In order to achieve a more natural and smooth playing effect, the spliced video can be subjected to re-smooth transition treatment.

In the optical flow based method, the quality of the interpolated image is very sensitive to the accuracy of the optical flow, and if the optical flow of a certain pixel point is wrong, all areas between the correct position and the wrong position in the interpolated image can be significantly affected. On the one hand, due to the complexity of searching, optical flow algorithms generally detect motion only in a small range, but the motion amplitude between non-continuous video segments (e.g., two different sign language videos) is generally large, so that it is difficult to accurately detect optical flow between adjacent images. On the other hand, the optical flow algorithm can accurately track pixel points (such as corner points) with obvious features, but is relatively easy to lose for objects with less textures (such as hands).

In the deep learning-based method, a large amount of data is generally required for training a model, and obtaining training data is a time-consuming and labor-consuming task, for example, in sign language video synthesis, continuous videos expressing a plurality of sign language words do not exist. Furthermore, the time complexity of deep learning is generally high, i.e. not suitable for real-time stitching situations.

The video image may be divided into a foreground and a background, the foreground being composed of one or more target objects, assuming P is used ⁽ⁱ⁾ Representing the ith target object, then an image containing L target objects

Wherein P is ⁽⁰⁾ Representing the background. Each target object comprises a plurality of key nodes, and part of the key nodes belong to anchor nodes. The key nodes refer to nodes capable of acquiring corresponding relations between adjacent images, and the anchor nodes are nodes with unchanged relative positions in the adjacent images. The anchor node only moves in a translation way, and the movement of the non-anchor key node consists of two parts of translation and rotation, and the related key node is required to be used as a fulcrum during rotation, namely the non-anchor key node rotates around the related key node.

Disclosure of Invention

In view of the above problems, the present invention provides a video stitching method based on interpolation transition, which includes:

step 1, acquiring adjacent video clips to be spliced, which comprise a first video and a tail video, and adjusting the sizes of target objects of all images in the tail video according to the average distance value of anchor nodes of each target object in the first video;

step 2, taking the average position value of the anchor node of each target object of each frame of image in the adjacent video segment as the origin of coordinates, obtaining the position of the non-anchor key node of the same target object in each frame of image and the direction between the key nodes, obtaining the splicing cost of the front and rear images of the first video and the tail video according to the position of the image in the video segment, and taking the image pair with the minimum splicing cost as the optimal splicing point image of the first video and the tail video respectively;

step 3, adjusting the brightness of the target objects of all images in the tail video by taking the average brightness value of each target object in the head video as a reference, calculating the average position value of the anchor node of each target object of the best splicing point image in the head video, and adjusting the positions of the target objects of all images in the tail video by taking the position value as a reference;

step 4, calculating the motion trail and the speed of all non-anchor key nodes of all target objects between two optimal splicing point images, and obtaining the number of interpolation transition images through the motion trail and the speed;

step 5, according to the position values of all key nodes of all target objects of two optimal splicing point images, interpolating the number of transitional images and the motion track of the key nodes to obtain the positions of the key nodes in each frame of interpolation transitional images, triangulating the boundary nodes of the key nodes and the images in each frame of interpolation transitional images to form triangular grids, respectively taking the vertexes of each triangle as target vertexes, respectively taking the corresponding nodes of two optimal splicing point images as source vertexes, calculating affine transformation of each pair of triangles to obtain transformation matrixes, respectively mapping triangular images formed by the source vertexes to triangular areas formed by the target vertexes, calculating transformation matrixes of all triangles in the triangular grids to form two mapped image areas, respectively corresponding to the two optimal splicing point images;

and 6, classifying the areas of the interpolation transition images according to the environment types of the areas in the two optimal splicing point images, respectively mixing the two optimal splicing point images according to the types of the areas in the interpolation transition images and the preset proportion to obtain an interpolation transition image, and splicing an interpolation transition image sequence containing the interpolation transition image between the head video and the tail video to obtain a final spliced video.

In the video stitching method based on interpolation transition, the classifying the region of the interpolation transition image in the step 6 specifically includes:

the regions of the interpolated transition image are divided into three classes: the first type of region is that the corresponding region in the two best splice point images is foreground or background, the second type of region is that the corresponding region of the previous best splice point image is foreground and the corresponding region of the next best splice point image is background, and the third type of region is that the corresponding region of the previous best splice point image is background and the corresponding region of the next best splice point image is foreground.

The video stitching method based on interpolation transition, wherein the step 6 comprises the following steps:

according to the position k of the interpolation transition image in the interpolation transition image sequence, the previous best splicing point image P is arranged in the first type area _T And the last best splice point image Q _S Is scaled to the pixels of (2)

And->

Mixing, in the second mapping region, P _T Is proportional to +.>

Reduction, Q _S Is proportional to +.>

Increasing, in the third mapping region, Q _S Is proportional to +.>

Increase, P _T Is proportional to +.>

Reduction;

wherein, k is more than or equal to 1 and less than or equal to L, and L is the number of frames of the interpolation transition image sequence.

The video stitching method based on interpolation transition, wherein the stitching cost of the front and rear images of the first video and the tail video obtained in the step 2 specifically includes:

where alpha and beta are the balance parameters,

for image Q in tail video _j Is provided for the location of the non-anchor critical node,

for image Q in tail video _j Direction between related critical nodes of (a), a ∈>

Image P in the first video _i Positions of non-anchor critical nodes, +.>

Image P in the first video _i I and j are the positions of the image in the first video and the last video, respectively.

In the video stitching method based on interpolation transition, in the step 4, the motion track is a straight line or a curve.

The invention also provides a video splicing system based on interpolation transition, which comprises the following steps:

the method comprises the steps that a module 1, adjacent video clips to be spliced, comprising a first video and a tail video, are obtained, and the sizes of target objects of all images in the tail video are adjusted according to the average distance value of anchor nodes of each target object in the first video;

the module 2 takes the average position value of the anchor node of each target object of each frame of image in the adjacent video segment as the origin of coordinates to obtain the position of the non-anchor key node of the same target object in each frame of image and the direction between the key nodes, and obtains the splicing cost of the front and rear images of the head video and the tail video according to the position of the image in the video segment, and takes the image pair with the minimum splicing cost as the optimal splicing point image of the head video and the tail video respectively;

the module 3 adjusts the brightness of the target objects of all the images in the tail video by taking the average brightness value of each target object in the head video as a reference, calculates the average position value of the anchor node of each target object of the best splicing point image in the head video, and adjusts the positions of the target objects of all the images in the tail video by taking the position value as a reference;

the module 4 calculates the motion trail and the speed of all non-anchor key nodes of all target objects between two optimal splicing point images, and obtains the number of interpolation transition images through the motion trail and the speed;

the module 5 interpolates the number of transition images and the motion track of the key nodes according to the position values of all key nodes of all target objects of two optimal splicing point images to obtain the positions of the key nodes in each frame of interpolation transition image, triangulates the boundary nodes of the key nodes and the images in each frame of interpolation transition image as a scattered point set to form triangular grids, calculates affine transformation of each pair of triangles by taking the vertexes of each triangle as the destination vertexes and the corresponding nodes of the two optimal splicing point images as the source vertexes to obtain transformation matrixes, maps the triangle images formed by the source vertexes to triangle areas formed by the destination vertexes, calculates transformation matrixes of all triangles in the triangular grids to form two mapped image areas, and corresponds to the two optimal splicing point images respectively;

and a module 6, classifying the areas of the interpolation transition images according to the environment types of the areas in the two optimal splicing point images, respectively mixing the two optimal splicing point images according to the types of the areas in the interpolation transition images and the preset proportion to obtain the interpolation transition images, and splicing the interpolation transition image sequence containing the interpolation transition images between the head video and the tail video to obtain the final spliced video.

The video stitching system based on interpolation transition, wherein the classifying the region of the interpolation transition image in the module 6 specifically includes:

The video stitching system based on interpolation transition, wherein the module 6 comprises:

And->

Mixing, in the second mapping region, P _T Is proportional to +.>

Reduction, Q _S Is proportional to +.>

Increasing, in the third mapping region, Q _S Is proportional to +.>

Increase, P _T Is proportional to +.>

Reduction;

The video stitching system based on interpolation transition, wherein the stitching cost of the front and rear images of the first video and the tail video obtained in the module 2 specifically includes:

where alpha and beta are the balance parameters,

Image P in the first video _i Positions of non-anchor critical nodes, +.>

The video stitching system based on interpolation transition, wherein the motion track in the module 4 is a straight line or a curve.

Compared with the prior art, the technical scheme provided by the invention has the following advantages:

(1) The video transition is smooth and has a gradual transition effect. The invention divides the video image into a background, a rigid foreground and a flexible foreground, different areas have different transition strategies, the pixel values of the overlapped areas are mixed in proportion, the pixels of the non-overlapped areas are increased or decreased in proportion, and the invention accords with the reaction process of human eyes for observing moving objects.

(2) The speed is high, and the instantaneity is high. No search analysis or network recursion is required for the entire frame of images. The method is characterized in that all calculation processing is based on a single pixel, the main performance bottleneck calculates the shortest distance in the image mixing stage, but the distance comparison is only limited in a triangle grid, the calculation complexity is inversely proportional to the number of key nodes, and the result can be obtained in advance by adopting a proper distance measurement and searching mode.

Drawings

FIG. 1 is a diagram of the location and orientation of key nodes of a human body (head-to-tail is a stitching point image);

FIG. 2 is a triangulation (intermediate image of FIG. 1);

FIG. 3 is a triangle area and corresponding stitching point image;

fig. 4 is an interpolated transition image.

Detailed Description

The invention provides a video stitching method based on interpolation transition, which comprises the following steps:

step one, size consistency.

In the adjacent video segments, the average distance value of the anchor node of each target object in the previous video segment (the first video segment) is calculated, and the size of the same target object of all images in the next video segment (the last video segment) is adjusted by taking the distance value as a reference. The sizes of all target objects of all images in the subsequent video segment are adjusted.

And step two, searching the position of the optimal splicing point.

And calculating the average position value of the anchor node of each target object of each frame of image in the adjacent video segment, and recalculating the position of the non-anchor key node of the same target object in the image and the direction between the related key nodes by taking the position value as the origin of coordinates.

In the adjacent video segments, the splicing cost of the front and rear images is calculated, wherein the splicing cost comprises the positions of the images in the video segments, the distances of the corresponding non-anchor key nodes of all target objects and the direction difference between the related key nodes of all target objects. And taking the image pair with the minimum splicing cost as the optimal splicing point image of the adjacent video clips respectively. All images following the best splice point in the previous video and all images preceding the best splice point in the next video are deleted.

And thirdly, illumination consistency and position consistency.

In the adjacent video segments, calculating the average brightness value of each target object in the previous video segment, and adjusting the brightness of the same target object of all images in the next video segment by taking the brightness value as a reference. The brightness of all target objects of all images in the subsequent video is adjusted.

In the adjacent video segments, calculating the average position value of the anchor node of each target object of the splicing point image in the previous video segment, and adjusting the positions of the same target objects of all images in the subsequent video segment by taking the position value as a reference. The positions of all target objects of all images in the subsequent video are adjusted.

And step four, calculating the number of interpolation transition images.

And calculating the longest motion trail of all non-anchor key nodes of all target objects of the splice point images in the adjacent video segments, wherein the longest motion trail can be a straight line or a curve. And respectively calculating the speed of the corresponding non-anchor key nodes of the splicing point image, namely the distance between the splicing point image and the corresponding nodes of the adjacent images. And obtaining the number of interpolation transition images through the motion trail and the speed.

And fifthly, generating an interpolation transition image sequence.

And interpolating the number of the transition images and the motion track of the key nodes according to the position values of all the key nodes of all the target objects of the splicing point images in the adjacent video segments to obtain the positions of the corresponding key nodes in each frame of the interpolation transition images.

In each frame of interpolation transition image, triangulation is carried out by taking key nodes and boundary nodes of the image as a scattered point set, so as to form a triangular grid. And taking the vertex of each triangle as a target vertex, respectively taking the corresponding nodes of the two stitching point images as source vertices, and calculating affine transformation of each pair of triangles. And mapping the triangle image formed by the source vertexes to the triangle area formed by the destination vertexes according to the transformation matrix. And calculating transformation matrixes of all triangles in the triangle mesh to form two mapped image areas, wherein the two mapped image areas correspond to the two splicing point images respectively.

The regions of the interpolated transition image are divided into three classes: the corresponding areas in the two splicing point images are foreground or background, the corresponding area of the previous splicing point image is foreground and the corresponding area of the next splicing point image is background, and the corresponding area of the previous splicing point image is background and the corresponding area of the next splicing point image is foreground. According to the position of the interpolation image in the whole image sequence, in a first type region, the pixels of two splicing point images are mixed proportionally, in a second type region, the foreground pixels of the previous splicing point image are reduced proportionally, the background pixels of the next splicing point image are correspondingly increased, and in a third type region, the foreground pixels of the next splicing point image are increased proportionally, and the background pixels of the previous splicing point image are correspondingly reduced. The order of increasing and decreasing is measured as the shortest distance between the foreground pixel and the foreground of the other splice point image.

In order to make the above features and effects of the present invention more clearly understood, the following specific examples are given with reference to the accompanying drawings.

For ease of understanding, a possible application scenario of the method according to the invention is first given before a detailed description of the method according to the invention is given. In sign language video synthesis, video segments corresponding to different sign language words need to be spliced, and a previous sign language video is formed by N frames of images and is marked as { P } ₁ ,P ₂ …P _N The later sign language video consists of M frames of images, which is marked as { Q } _M ,Q _M-1 …Q ₁ And (3)

Wherein->

And

respectively represent the images P _i And Q _j Is of a single color, ++>

And->

Representing the human body and being the only target object in the foreground. In sign language movements, the main moving objects are hands and arms, and the human torso can be considered to be approximately rigid. Thus, skeletal nodes of the human body are key nodes such as left and right hands, left and right elbows, left and right five fingers, etc., wherein the neck, left and right shoulders and left and right hips are anchor nodes whose coordinate values are expressed as (X) ^joint ,Y ^joint ) For example, the coordinate value of the left hand is (X ^LHand ,Y ^LHand )。

For the above application scenario, an embodiment of the interpolation transition based video stitching of the present invention is given below. The basic steps are as follows:

step one, size consistency.

Respectively calculating the previous video { P } ₁ ,P ₂ …P _N Sum-next video { Q } _M ,Q _M-1 …Q ₁ Average distance value (W of human anchor node in } _P ,H _P ) And (W) _Q ,H _Q )。

Respectively in W _P /W _Q And H _P /H _Q To scale { Q } _M ,Q _M-1 …Q ₁ Width and height.

Step two, searching the position of the optimal splicing point

Calculating the previous video { P } ₁ ,P ₂ …P _N Every frame of picture P _i Average position value of human body anchor node

/>

To be used for

Recalculating the image P for the origin of coordinates _i Is not the location of the anchor critical node of the human body

And the direction between the relevant key nodes +.>

Wherein the method comprises the steps of

Is the location of the relevant key node, for example for the left hand, which is the left elbow.

Similarly, calculate the next video { Q } _M ,Q _M-1 …Q ₁ Each frame of image Q _j Is not the location of the anchor critical node of the human body

And the direction between the relevant key nodes +.>

Calculating the front and rear images P _i And Q _j Is a concatenation cost (P) _i ,Q _j )。

Where α and β are balance parameters, α determines the relative importance of the image in the video segment position relative to the distance corresponding to the human non-anchor key node, and β balances the relative importance of the directional difference between the relevant key nodes.

Taking the image pair P with minimum splicing cost _T And Q _S Respectively as { P ] ₁ ,P ₂ …P _N Sum { Q } _M ,Q _M-1 …Q ₁ The best splice point image. Delete { P ] ₁ ,P ₂ …P _N All images after number T in { Q }, Q _M ,Q _M-1 …Q ₁ All images preceding number S in }.

Step three, illumination consistency and position consistency

Respectively calculating the previous video { P } ₁ ,P ₂ …P _T Sum of the lastSegment video { Q _S ,Q _S-1 …Q ₁ Average brightness value L of human body in } _P And L _Q Will { Q _S ,Q _S-1 …Q ₁ Brightness of human body of each frame image in } is heightened by L _P -L _Q 。

Separately computing splice point images P _T And Q _S Average position value of human body anchor node

And->

Respectively by

And->

For moving scale, adjust { Q _S ,Q _S-1 …Q ₁ Location of }. />

Step four, calculating the number of interpolation transition images

Computing a splice point image P _T And Q _S The longest motion track of the human body non-anchor key node is in the embodiment of the inventionWherein the motion trail is parabolic.

The general trajectory equation for a parabola is:

where a, b, c, θ are parameters of a parabola and t is an intermediate variable of the equation. To be used for

Solving a, b, c and theta to obtain a motion track for the known points.

Calculation of P _T And Q _S Track length of human body non-anchor key node

The longest trace is marked as D _ST Respectively calculate D _ST Speed of corresponding node

And->

In the embodiment of the invention, if the non-anchor key nodes of the human body move at a constant speed along the parabolic track, the number of the interpolation transition images is as follows

Step five, generating an interpolation transition image sequence

The generated interpolation transition image sequence consists of L frame images and is marked as { R } ₁ ,R ₂ …R _L -computing the splice point image P in a step four manner _T And Q _S Uniformly dividing each motion track into L+1 parts, wherein the dividing points are interpolation transition images R of each frame _k The position of the key node corresponding to the human body, wherein k is more than or equal to 1 and less than or equal to L. As shown in fig. 1.

At R _k In the method, 8 boundary nodes (upper left, upper right, lower left, lower right, middle left, middle right, middle upper and lower middle) of the human body key nodes and the images are used as a scattered point set to perform triangulation to form a triangular grid, and the triangular grid is shown in figure 2. The vertex of each triangle is used as the target vertex, and P is respectively used as the vertex _T And Q _S Is used as a source vertex, and affine transformation of each pair of triangles is calculated. According to the transformation matrix, the triangle image composed of the source vertexes is mapped to the triangle region composed of the destination vertexes, respectively, as shown in fig. 3. Calculating transformation matrixes of all triangles in the triangle mesh to form two mapped image areas, wherein the two mapped image areas correspond to P respectively _T And Q _S 。

R is R _k Is divided into three categories: first is P _T And Q _S The corresponding areas in the two are foreground or background, and P _T Corresponding region of (1) is foreground and Q _S The corresponding region of (2) is background, thirdly P _T Corresponding region of (1) is background and Q _S Is foreground. According to R _k Position k in the interpolated transition image sequence, P in the first type region _T And Q _S Is scaled to the pixels of (2)

And->

Mixing, in the second mapping region, P _T Is proportional to +.>

Reduction, Q _S Correspondingly increasing background pixels of (2), in the third class of mapping region, Q _S Is proportional to +.>

Increase, P _T Correspondingly reduced background pixels. The order of increasing and decreasing is measured by the shortest distance of the foreground pixel and the foreground of another splice point image, as shown in fig. 4. Preferably, to calculate the shortest distance with minimal temporal and spatial complexity, manhattan distance is used as a metric and the search is spread around from the current position of the foreground pixel. Where the order refers to, taking the second type of region as an example, assume that the region has 100 pixels in total, 30 of which are foreground, each corresponding to one of the shortest distances. If k=1, l=9, it is indicated that the 1 st frame interpolation image only retains P _T 90% of the foreground pixels, i.e. proportionally reduced +.>

The 10% decrease is those pixels with the greatest shortest distance, and the 10% decrease is those areas, then Q _S I.e. corresponding increase.

It should be understood that, according to an embodiment of the present invention, any method in the prior art may be used to obtain key nodes (including anchor nodes) of a human body, for example, the key nodes may be obtained directly by using a Kinect camera, or may be obtained by calculating by using a human body gesture recognition (e.g. deep learning) method.

The following is a system example corresponding to the above method example, and this embodiment mode may be implemented in cooperation with the above embodiment mode. The related technical details mentioned in the above embodiments are still valid in this embodiment, and in order to reduce repetition, they are not repeated here. Accordingly, the related technical details mentioned in the present embodiment can also be applied to the above-described embodiments.

And->

Mixing, in the second mapping region, P _T Is proportional to +.>

Reduction, Q _S Is proportional to +.>

Increasing, in the third mapping region, Q _S Is proportional to +.>

Increase, P _T Is proportional to +.>

Reduction;

where alpha and beta are the balance parameters,

Image P in the first video _i Positions of non-anchor critical nodes, +.>

Image P in the first video _i I and j are the positions of the image in the first video and the last video, respectively. />

Claims

1. A method of video stitching based on interpolation transitions, comprising:

2. The method for video stitching based on interpolation transition according to claim 1, wherein classifying the region of the interpolation transition image in the step 6 specifically includes:

3. The interpolation transition based video stitching method as recited in claim 2, wherein the step 6 includes:

And->

Mixing, in the second mapping region, P _T Foreground pixels of (a)Proportional->

Reduction, Q _S Is proportional to +.>

Increasing, in the third mapping region, Q _S Is proportional to +.>

Increase, P _T Is proportional to +.>

Reduction;

4. The method for video stitching based on interpolation transition according to claim 1, 2 or 3, wherein the stitching cost of the front and rear images of the first video and the tail video obtained in the step 2 specifically includes:

where alpha and beta are the balance parameters,

for image Q in tail video _j Positions of non-anchor critical nodes, +.>

Image P in the first video _i Positions of non-anchor critical nodes, +.>

5. The interpolation transition-based video stitching method according to claim 1, wherein the motion trajectory in step 4 is a straight line or a curved line.

6. A video stitching system based on interpolation transitions, comprising:

7. The interpolation transition-based video stitching system of claim 6 wherein the module 6 classifies regions of the interpolation transition image in particular comprising:

8. The interpolation transition based video stitching system of claim 7, wherein the module 6 includes:

interpolation is carried out according to the interpolation transition imagePosition k in the transition image sequence, in the first type of region, the previous best splice point image P _T And the last best splice point image Q _S Is scaled to the pixels of (2)

And->

Mixing, in the second mapping region, P _T Is proportional to +.>

Reduction, Q _S Is proportional to +.>

Increasing, in the third mapping region, Q _S Is proportional to +.>

Increase, P _T Is proportional to +.>

Reduction;

9. The interpolation transition-based video stitching system according to claim 6, 7 or 8, wherein the cost of obtaining the front and rear images of the front video and the rear video in the module 2 specifically includes:

where alpha and beta are the balance parameters,

for image Q in tail video _j Positions of non-anchor critical nodes, +.>

Image P in the first video _i Positions of non-anchor critical nodes, +.>

10. The interpolation transition-based video stitching system according to claim 6, wherein the motion profile in block 4 is a straight line or a curved line.