CN117596483B - Video anti-shake splicing method, device, equipment and storage medium - Google Patents

Video anti-shake splicing method, device, equipment and storage medium

Info

Publication number: CN117596483B
Application number: CN202410074451.3A
Authority: CN (China)
Prior art keywords: video, data, pressure distribution, acceleration, target
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN117596483A (en)
Inventors: 王虹林, 蒋燕, 韩畅, 叶威, 丁金善
Current Assignee: Shenzhen Hankvision Technology Co ltd
Application filed by Shenzhen Hankvision Technology Co ltd
Priority to CN202410074451.3A
Publication of CN117596483A
Application granted
Publication of CN117596483B

Classifications

    • H04N23/6812 Motion detection based on additional sensors, e.g. acceleration sensors
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/253 Fusion techniques of extracted features
    • G06T7/60 Analysis of geometric attributes
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • H04N23/6811 Motion detection based on the image signal
    • H04N23/682 Vibration or motion blur correction
    • G06T2207/10016 Video; Image sequence

Abstract

The application relates to the technical field of deep learning, and discloses a video anti-shake splicing method, device, equipment, and storage medium. The method comprises the following steps: shooting video through a handheld device to obtain first video data, and collecting hand-held part pressure distribution data and multi-axis acceleration signal data; calculating a plurality of pixel displacement data; performing distribution intensity and feature extraction to obtain a pressure distribution feature set, and performing motion feature extraction to obtain an acceleration motion feature set; performing feature fusion to obtain a target fusion feature set; performing time sequence association analysis and matrix conversion to obtain a displacement feature relation matrix; and performing video anti-shake compensation parameter analysis through a video anti-shake compensation model to obtain target video anti-shake compensation parameters, then performing anti-shake processing and video stitching according to those parameters to obtain second video data.

Description

Video anti-shake splicing method, device, equipment and storage medium
Technical Field
The application relates to the technical field of deep learning, in particular to a video anti-shake splicing method, device and equipment and a storage medium.
Background
Video shot with handheld devices often suffers from shaking and jitter, which degrades the viewing experience and reduces video quality. Furthermore, achieving stable video stitching under different scenes and shooting conditions remains a challenge. Research into video anti-shake and stitching methods is therefore critical to improving video quality and enhancing the user experience.
However, current video anti-shake and stitching techniques still have problems. First, existing anti-shake methods often depend on software post-processing, which requires additional computing resources and time and is neither real-time nor efficient enough. Second, for video stitching, handheld shooting under different scenes and conditions leads to incoherence between videos, making natural switching and transitions difficult to achieve.
Disclosure of Invention
The application provides a video anti-shake splicing method, device, equipment and storage medium, which are used for realizing intelligent video anti-shake splicing and improving the anti-shake display effect of videos.
In a first aspect, the present application provides a video anti-shake stitching method, where the video anti-shake stitching method includes:
video shooting is carried out through preset handheld equipment to obtain first video data, and handheld position pressure distribution acquisition and multi-axis acceleration signal acquisition are carried out on the handheld equipment to obtain handheld position pressure distribution data and multi-axis acceleration signal data;
dividing the first video data into a plurality of initial video frames, and calculating pixel displacement between two adjacent frames in the plurality of initial video frames to obtain a plurality of pixel displacement data;
extracting distribution intensity and characteristics of the pressure distribution data of the handheld part to obtain a pressure distribution characteristic set, and extracting motion characteristics of the multi-axis acceleration signal data to obtain an acceleration motion characteristic set;
calculating correlation coefficients of the pressure distribution feature set and the acceleration motion feature set to obtain a target correlation coefficient, and carrying out feature fusion on the pressure distribution feature set and the acceleration motion feature set according to the target correlation coefficient to obtain a target fusion feature set;
performing time sequence association analysis and matrix conversion on the pixel displacement data and the target fusion feature set to obtain a displacement feature relation matrix;
inputting the displacement characteristic relation matrix into a preset video anti-shake compensation model to perform video anti-shake compensation parameter analysis to obtain target video anti-shake compensation parameters, and performing anti-shake processing and video splicing on the plurality of initial video frames according to the target video anti-shake compensation parameters to obtain second video data.
In a second aspect, the present application provides a video anti-shake stitching device, the video anti-shake stitching device includes:
the acquisition module is used for carrying out video shooting through preset handheld equipment to obtain first video data, and carrying out handheld position pressure distribution acquisition and multi-axis acceleration signal acquisition on the handheld equipment to obtain handheld position pressure distribution data and multi-axis acceleration signal data;
the computing module is used for carrying out video frame segmentation on the first video data to obtain a plurality of initial video frames, and computing pixel displacement between two adjacent frames in the plurality of initial video frames to obtain a plurality of pixel displacement data;
the characteristic extraction module is used for carrying out distribution intensity and characteristic extraction on the pressure distribution data of the handheld part to obtain a pressure distribution characteristic set, and carrying out motion characteristic extraction on the multi-axis acceleration signal data to obtain an acceleration motion characteristic set;
the feature fusion module is used for calculating the correlation coefficient of the pressure distribution feature set and the acceleration motion feature set to obtain a target correlation coefficient, and carrying out feature fusion on the pressure distribution feature set and the acceleration motion feature set according to the target correlation coefficient to obtain a target fusion feature set;
the conversion module is used for carrying out time sequence association analysis and matrix conversion on the pixel displacement data and the target fusion feature set to obtain a displacement characteristic relation matrix;
the processing module is used for inputting the displacement characteristic relation matrix into a preset video anti-shake compensation model to perform video anti-shake compensation parameter analysis to obtain target video anti-shake compensation parameters, and performing anti-shake processing and video splicing on the plurality of initial video frames according to the target video anti-shake compensation parameters to obtain second video data.
A third aspect of the present application provides a video anti-shake stitching device, comprising: a memory and at least one processor, the memory having instructions stored therein; the at least one processor invokes the instructions in the memory to enable the video anti-shake stitching device to execute the video anti-shake stitching method described above.
A fourth aspect of the present application provides a computer readable storage medium having instructions stored therein that, when executed on a computer, cause the computer to perform the video anti-shake stitching method described above.
In the technical scheme provided by the application, camera motion can be analyzed and compensated more accurately by collecting the multi-axis acceleration signal data and pressure distribution data of the handheld device, achieving efficient video anti-shake. This helps to reduce shaking and jitter in the video, improving its stability and viewing experience. Using correlation coefficient calculation and feature fusion, the video stitching process can be adaptively adjusted to different situations, and the stitching can be optimized according to the shooting scene and conditions to obtain a more natural and consistent video stream. By analyzing the temporal correlation between the pixel displacement data and the target features, the relationship between video frames can be better understood, which helps to ensure the continuity and smoothness of the stitching and reduces incoherent transitions. By comprehensively using multiple data sources, including acceleration data, pressure distribution data, and pixel displacement data, video quality can be improved: blurring, jitter, color bias, and the like are reduced, providing a clearer, more stable, and more natural video output. Performing deep feature extraction and analysis with deep neural network techniques such as the convolutional long short-term memory network allows the association between visual and sensed data to be understood more accurately, so that not only can shake be reduced, but more complex scenes and dynamic conditions can also be handled. By using advanced deep learning techniques, video content can be automatically analyzed, edited, and optimized; the user no longer needs to edit and adjust the video manually, but obtains highly automated and intelligent video post-processing, thereby realizing intelligent video anti-shake stitching and improving the anti-shake display effect of the video.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained based on these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic diagram of an embodiment of a video anti-shake stitching method according to an embodiment of the present application;
fig. 2 is a schematic diagram of an embodiment of a video anti-shake stitching device according to an embodiment of the present application.
Detailed Description
The embodiment of the application provides a video anti-shake splicing method, device and equipment and a storage medium. The terms "first," "second," "third," "fourth" and the like in the description and in the claims of this application and in the above-described figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments described herein may be implemented in other sequences than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.
For easy understanding, the following describes a specific flow of an embodiment of the present application, referring to fig. 1, and one embodiment of a video anti-shake stitching method in the embodiment of the present application includes:
step S101, video shooting is carried out through preset handheld equipment to obtain first video data, and handheld position pressure distribution acquisition and multi-axis acceleration signal acquisition are carried out on the handheld equipment to obtain handheld position pressure distribution data and multi-axis acceleration signal data;
it can be understood that the execution body of the application may be a video anti-shake splicing device, and may also be a terminal or a server, which is not limited herein. The embodiment of the present application will be described by taking a server as an execution body.
Specifically, video shooting is performed through a preset handheld device to obtain original video stream data. In order to improve video quality and ensure color accuracy and consistency, color correction is performed on the raw video stream data, resulting in more accurate and stable first video data. Color correction may be implemented by various algorithms, such as white balance adjustment and color space conversion, to accommodate different lighting conditions and device characteristics. Next, the pressure distribution of the hand-held part is collected. A pressure sensor integrated in the hand-held part of the device continuously records pressure changes during use. The collected initial pressure distribution data are processed through average pressure calculation to obtain more stable and reliable hand-held part pressure distribution data. This data not only reflects the way the user holds the device, but can also be used to infer stability during video capture and the user's movement pattern. Multi-axis acceleration signals of the device are then acquired, including first acceleration data of the x axis, second acceleration data of the y axis, and third acceleration data of the z axis. These data provide important information about the direction and speed of movement of the device in three dimensions. The acceleration data in the three directions are synthesized through a preset multi-axis acceleration calculation function to form initial multi-axis acceleration data. The multi-axis acceleration calculation function is an algorithm based on a physical model that accurately computes the overall acceleration vector of the device in space. Finally, Fourier signal transformation is applied to the initial multi-axis acceleration data, converting the time-domain signal into a frequency-domain signal and revealing its frequency components. The resulting multi-axis acceleration signal data thus provide both a detailed description of the device's movement in space and its frequency information.
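As a minimal illustration of the average pressure calculation described above, the following sketch assumes the pressure sensor reports each sample as a small two-dimensional grid of readings; the grid shape and the numpy pipeline are illustrative assumptions rather than the patented implementation:

```python
import numpy as np

def average_pressure(samples: np.ndarray) -> np.ndarray:
    """Average a stack of pressure-grid samples of shape (T, H, W) over
    time to obtain a stable hand-held part pressure distribution."""
    return samples.mean(axis=0)

# Example: 100 samples from a hypothetical 4 x 8 pressure-sensor grid.
pressure_distribution = average_pressure(np.random.rand(100, 4, 8))  # shape (4, 8)
```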
Step S102, video frame segmentation is carried out on the first video data to obtain a plurality of initial video frames, and pixel displacement between two adjacent frames in the plurality of initial video frames is calculated to obtain a plurality of pixel displacement data;
specifically, image change rate calculation is performed on the first video data, and the change degree of each frame of image and the previous frame of image in the video is evaluated to obtain image change rate data. These data reflect the speed and magnitude of the image changes in the video. The first video data is subjected to a smooth transition process to reduce abrupt or drastic changes in video capture, which helps to stabilize the video content, making it more suitable for analysis and processing, and thus to obtain standard video data. The standard video data is subjected to video frame segmentation, and the video is decomposed into a plurality of initial video frames, and the frames are the basis of anti-shake and splicing processing. After the segmentation, the picture center recognition is carried out on each initial video frame, and the picture center pixel point of each frame is determined. Based on the picture center pixel point, an initial pixel cloud image of each initial video frame is constructed, and the pixel cloud images describe the pixel distribution condition of each frame in detail. And identifying adjacent points of the picture center pixel points in each initial pixel cloud picture by using a K-time neighbor algorithm, and identifying K nearest adjacent points of each initial video frame. And respectively calculating the distances between the K nearest neighbors and the picture center pixel point of each initial video frame, namely, a first point distance, and carrying out average value operation on the first point distances to obtain a second point distance of each initial video frame. This second point distance is an important index for measuring the distribution stability and consistency of the pixels near the center of the picture. And based on the obtained second point distance data, performing pixel displacement calculation on two adjacent frames in the plurality of initial video frames to determine the relative movement condition between one frame and the next frame. These pixel displacement data provide the necessary information to adjust the position and orientation of each frame to reduce or eliminate video jitter and thereby achieve a smoother and consistent video effect.
Step S103, carrying out distribution intensity and feature extraction on pressure distribution data of the handheld part to obtain a pressure distribution feature set, and carrying out motion feature extraction on multi-axis acceleration signal data to obtain an acceleration motion feature set;
specifically, the pressure distribution data of the hand-held part is subjected to pressure integration through a preset pressure integration function, and the accumulated pressure distribution intensity in a period of time is calculated through an integration process by the function. This process reflects the pressure change when the user holds the device based on an integral model of the pressure change over time. The resulting cumulative pressure distribution intensity not only provides an overall profile of pressure changes, but also reveals a trend in hand-held stability during shooting. The cumulative pressure distribution intensity is curve fitted to generate a pressure distribution intensity curve that is smoother and easier to analyze. This curve reflects the trend and pattern of pressure changes. By further identifying the curve characteristics of the pressure distribution intensity curve, a plurality of first pressure distribution characteristics can be obtained, wherein the characteristics comprise peak values, fluctuation frequencies, change trends and the like. To ensure that these features have universal comparative and analytical value, they are feature normalized, converting features of different magnitudes or units into a unified standard form, and then performing set conversion, integrating into a comprehensive pressure distribution feature set. Likewise, the multiaxial acceleration signal data are analyzed. And obtaining an acceleration change curve reflecting the movement change of the equipment through curve fitting. These curves depict the variation of the acceleration of the device in various directions. And performing curve characteristic identification on the acceleration change curve, and extracting a plurality of first acceleration motion characteristics from the acceleration change curve. These characteristics include extremum, rate of change, periodicity of acceleration, etc. The first acceleration motion features are then feature normalized to ensure that the different features can be compared and analyzed under the same criteria. And performing set conversion on the second acceleration motion characteristics, and integrating the second acceleration motion characteristics into a comprehensive acceleration motion characteristic set. The feature set not only reflects the motion characteristics of the device in three-dimensional space, but also can be combined with the pressure distribution feature set to provide an omnibearing visual angle for understanding and analyzing the stability problem in the handheld shooting process.
Step S104, calculating correlation coefficients of the pressure distribution feature set and the acceleration motion feature set to obtain a target correlation coefficient, and carrying out feature fusion on the pressure distribution feature set and the acceleration motion feature set according to the target correlation coefficient to obtain a target fusion feature set;
specifically, a mean value of the pressure distribution feature set is calculated, which reflects the average level of the pressure distribution when the user holds the device. Meanwhile, standard deviation calculation is carried out to obtain standard deviation of pressure distribution characteristics, and the standard deviation reveals fluctuation degree of the pressure distribution characteristics around the mean value and reflects stability and consistency of the pressure distribution. Also, by calculating the mean and standard deviation of the acceleration motion characteristics, a basic statistic reflecting the motion state of the device can be obtained. The mean value of the acceleration motion features provides the average level of the motion intensity of the device, and the standard deviation of the acceleration motion features reflects the fluctuation degree of the motion features. And calculating a correlation coefficient according to the obtained pressure distribution characteristic mean value, the pressure distribution characteristic standard deviation, the acceleration motion characteristic mean value and the acceleration motion characteristic standard deviation. The correlation coefficient is a statistic for measuring the linear correlation degree between two groups of data, and can reveal the relation strength and direction between the pressure distribution characteristic and the acceleration motion characteristic. Through the calculation, a target correlation coefficient can be obtained, and the target correlation coefficient directly influences the subsequent feature fusion process. And performing feature conversion on the pressure distribution feature set and the acceleration motion feature set according to the obtained target correlation coefficient, so as to obtain a plurality of target pressure distribution features and a plurality of target acceleration motion features. The feature set is refined and optimized by operations such as adjusting the specific gravity of the features, combining similar features or excluding noise features, so that the feature set is more suitable for subsequent fusion and analysis. And carrying out feature set fusion on the multiple target pressure distribution features and the multiple target acceleration motion features. By comprehensively considering the correlation between pressure distribution and acceleration movement and integrating the information of the pressure distribution and the acceleration movement, a target fusion feature set which comprehensively reflects the handheld stability and the movement state of the equipment is formed.
Step S105, performing time sequence association analysis and matrix conversion on the plurality of pixel displacement data and the target fusion feature set to obtain a displacement feature relation matrix;
specifically, first time stamp data of a plurality of pixel displacement data are acquired, and second time stamp data of a target fusion feature set are acquired. The time stamp data ensures that the subsequent processing steps can synchronize and correlate different types of data at the correct point in time. And carrying out time sequence association analysis and corresponding matching on the plurality of pixel displacement data and the target fusion feature set according to the time stamps to form a clear corresponding relation, and converting the time sequence data into useful information. Next, a matrix conversion is performed on the plurality of pixel displacement data and the target fusion feature set. And organizing the time sequence associated data into a displacement characteristic relation matrix. The displacement characteristic relation matrix not only can integrate multidimensional data, but also can provide a convenient mathematical expression form for subsequent calculation and analysis. In this matrix, each three-dimensional column vector sequentially contains corresponding pixel displacement data, target pressure distribution characteristics and target acceleration motion characteristics.
Step S106, inputting the displacement characteristic relation matrix into a preset video anti-shake compensation model to perform video anti-shake compensation parameter analysis to obtain target video anti-shake compensation parameters, and performing anti-shake processing and video splicing on a plurality of initial video frames according to the target video anti-shake compensation parameters to obtain second video data.
Specifically, the displacement feature relation matrix is input into a preset video anti-shake compensation model. The model is a deep learning framework consisting of several layers: a convolutional long short-term memory (ConvLSTM) network, a single-layer long short-term memory (LSTM) network, and a fully connected layer. The ConvLSTM network performs deep feature operations on the displacement feature relation matrix, extracting complex, deep spatio-temporal features from the video data to obtain a finer deep feature relation matrix. The deep feature relation matrix is then analyzed by the single-layer LSTM network, whose memory capability allows it to process and analyze time series data effectively, revealing the temporal features and patterns hidden within. This analysis yields a target feature relation matrix that captures the dynamic associations and change trends between video frames more accurately. The target feature relation matrix is processed through a ReLU (Rectified Linear Unit) activation function in the fully connected layer to calculate the video anti-shake compensation parameters. ReLU functions are widely used in deep learning because their nonlinear, sparse activation characteristics increase model complexity and nonlinear capability. Through this layer's calculation, a set of accurate and practical target video anti-shake compensation parameters is obtained; these parameters are the key to the actual anti-shake processing. Anti-shake processing is then applied to the plurality of initial video frames: according to the target video anti-shake compensation parameters, the position and posture of each frame are adjusted by corresponding algorithms to reduce or eliminate the influence of shake, yielding a plurality of target video frames with higher stability and consistency. Finally, video stitching is performed on the target video frames, combining the stabilized frames into continuous, smooth second video data through dedicated stitching techniques and algorithms. The resulting video is not only more visually stable and fluent, but also preserves the integrity and consistency of the original video content.
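A Keras-style sketch of the described stack; the reshaping of the displacement feature relation matrix into a (timesteps, rows, cols, channels) tensor, all layer sizes, and the number of output parameters are illustrative assumptions, not the patented configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_antishake_model(timesteps: int, rows: int, cols: int, n_params: int) -> tf.keras.Model:
    """ConvLSTM for deep spatio-temporal features, a single LSTM layer for
    hidden temporal structure, and a ReLU-activated dense head emitting the
    anti-shake compensation parameters."""
    return models.Sequential([
        layers.Input(shape=(timesteps, rows, cols, 1)),
        layers.ConvLSTM2D(16, kernel_size=(3, 3), padding="same",
                          return_sequences=True),   # deep feature relation matrix
        layers.TimeDistributed(layers.Flatten()),
        layers.LSTM(64),                            # target feature relation matrix
        layers.Dense(n_params, activation="relu"),  # compensation parameters
    ])

# Example: sequences of 16 time steps over a 3 x 8 relation matrix,
# predicting 2 hypothetical compensation parameters per sequence.
model = build_antishake_model(timesteps=16, rows=3, cols=8, n_params=2)
model.summary()
```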
In the embodiment of the application, camera motion can be analyzed and compensated more accurately by collecting the multi-axis acceleration signal data and pressure distribution data of the handheld device, achieving efficient video anti-shake. This helps to reduce shaking and jitter in the video, improving its stability and viewing experience. Using correlation coefficient calculation and feature fusion, the video stitching process can be adaptively adjusted to different situations, and the stitching can be optimized according to the shooting scene and conditions to obtain a more natural and consistent video stream. By analyzing the temporal correlation between the pixel displacement data and the target features, the relationship between video frames can be better understood, which helps to ensure the continuity and smoothness of the stitching and reduces incoherent transitions. By comprehensively using multiple data sources, including acceleration data, pressure distribution data, and pixel displacement data, video quality can be improved: blurring, jitter, color bias, and the like are reduced, providing a clearer, more stable, and more natural video output. Performing deep feature extraction and analysis with deep neural network techniques such as the convolutional long short-term memory network allows the association between visual and sensed data to be understood more accurately, so that not only can shake be reduced, but more complex scenes and dynamic conditions can also be handled. By using advanced deep learning techniques, video content can be automatically analyzed, edited, and optimized; the user no longer needs to edit and adjust the video manually, but obtains highly automated and intelligent video post-processing, thereby realizing intelligent video anti-shake stitching and improving the anti-shake display effect of the video.
In a specific embodiment, the process of executing step S101 may specifically include the following steps:
(1) Shooting a video through a preset handheld device to obtain original video stream data, and performing color correction on the original video stream data to obtain first video data;
(2) Acquiring pressure distribution of a handheld part of the handheld device to obtain a plurality of initial pressure distribution data, and calculating average pressure of the plurality of initial pressure distribution data to obtain pressure distribution data of the handheld part;
(3) Acquiring multi-axis acceleration signals of the handheld device to obtain first acceleration data of an x axis, second acceleration data of a y axis and third acceleration data of a z axis;
(4) Performing multi-axis acceleration calculation on the first acceleration data of the x axis, the second acceleration data of the y axis and the third acceleration data of the z axis through a preset multi-axis acceleration calculation function to obtain initial multi-axis acceleration data, wherein the multi-axis acceleration calculation function is: $a=\sqrt{a_x^2+a_y^2+a_z^2}$, where $a$ represents the initial multi-axis acceleration data, $a_x$ represents the first acceleration data of the x axis, $a_y$ represents the second acceleration data of the y axis, and $a_z$ represents the third acceleration data of the z axis;
(5) Carrying out Fourier signal transformation on the initial multi-axis acceleration data to obtain multi-axis acceleration signal data.
Specifically, video shooting is performed through the preset handheld device to obtain original video stream data; these data are unprocessed and contain color deviations caused by factors such as lighting conditions and device limitations. Color correction is performed on the original video stream data, optimizing video quality by adjusting parameters such as color balance, contrast, and brightness. Pressure distribution acquisition is then carried out on the hand-held part of the device. Pressure sensors integrated in the hand-held part monitor and record in real time the pressure changes while the user holds the device. Collecting multiple initial pressure distribution data makes it possible to understand holding stability and analyze how the holding style affects video stability; average pressure calculation over these initial data yields more accurate and comprehensive hand-held part pressure distribution data. Multi-axis acceleration signal acquisition is performed on the handheld device, measuring the acceleration of the device along the x, y, and z axes. Measuring the first acceleration data of the x axis, the second acceleration data of the y axis, and the third acceleration data of the z axis comprehensively captures the motion of the device in three-dimensional space. The data are then processed by the preset multi-axis acceleration calculation function, which integrates the acceleration information of the three axes by computing the magnitude of the acceleration vector, yielding initial multi-axis acceleration data that represent the overall motion intensity. Fourier signal transformation is applied to the initial multi-axis acceleration data, converting the time-domain signal into a frequency-domain signal and revealing its frequency components; this transformation identifies the vibration frequencies and intensities affecting the video. For example, if the user's hand shakes during shooting, the vibration generates a specific pattern in the frequency domain of the acceleration signal. By analyzing these patterns, the system can accurately adjust the video to compensate for the shake, achieving a smoother video effect.
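A sketch of the multi-axis synthesis and Fourier analysis, following the magnitude formula given above; the sampling rate fs and the dominant-peak heuristic are illustrative assumptions:

```python
import numpy as np

def multi_axis_acceleration(ax: np.ndarray, ay: np.ndarray, az: np.ndarray) -> np.ndarray:
    """a = sqrt(ax^2 + ay^2 + az^2): the magnitude-of-vector synthesis
    used as the multi-axis acceleration calculation function."""
    return np.sqrt(ax**2 + ay**2 + az**2)

def dominant_jitter_frequency(a: np.ndarray, fs: float) -> float:
    """Fourier-transform the magnitude signal (DC removed) and return the
    frequency of the largest spectral peak, i.e. the dominant shake rate."""
    spectrum = np.abs(np.fft.rfft(a - a.mean()))
    freqs = np.fft.rfftfreq(a.size, d=1.0 / fs)
    return float(freqs[np.argmax(spectrum)])

# Example: a hypothetical 4 Hz tremor sampled at 100 Hz.
t = np.arange(0.0, 2.0, 0.01)
a = multi_axis_acceleration(0.3 * np.sin(2 * np.pi * 4 * t),
                            0.1 * np.cos(2 * np.pi * 4 * t),
                            9.8 + 0.2 * np.sin(2 * np.pi * 4 * t))
print(dominant_jitter_frequency(a, fs=100.0))  # dominant shake frequency in Hz
```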
In a specific embodiment, the process of executing step S102 may specifically include the following steps:
(1) Calculating the image change rate of the first video data to obtain image change rate data;
(2) Performing smooth transition processing on the first video data according to the image change rate data to obtain standard video data;
(3) Carrying out video frame segmentation on the standard video data to obtain a plurality of initial video frames;
(4) Performing picture center recognition on a plurality of initial video frames to obtain picture center pixel points of each initial video frame, and constructing an initial pixel cloud picture of each initial video frame according to the picture center pixel points;
(5) Carrying out adjacent point identification on the picture center pixel point in each initial pixel cloud picture based on a K-nearest neighbor algorithm to obtain K nearest adjacent points of each initial video frame;
(6) Respectively calculating the distances between K nearest neighbors and the picture center pixel point of each initial video frame to obtain a plurality of first point distances, and carrying out mean value operation on the plurality of first point distances to obtain a second point distance of each initial video frame;
(7) Carrying out pixel displacement calculation on two adjacent frames in the plurality of initial video frames according to the second point distance to obtain a plurality of pixel displacement data.
Specifically, the image change rate of the first video data is calculated by analyzing the differences between video frames, using, for example, pixel differencing, optical flow, or other image processing techniques. The image change rate data reflect the motion and change speed in the video and are an important basis for the subsequent smooth transition processing and frame segmentation. For example, if a scene in a video clip suddenly switches from stationary to fast-moving, the image change rate shows a significant peak at that point. Smooth transition processing is applied to the first video data according to the image change rate data, reducing abrupt changes and noise and making the video content smoother and more consistent. The smoothing can be implemented with various filters, such as Gaussian blur, mean filtering, or median filtering, in order to reduce spikes and fluctuations in the image change rate, thereby obtaining standard video data. For example, if a frame of the original video is blurred by fast movement, the smooth transition processing can reduce the blur to some extent, making the scene clearer. Video frame segmentation then breaks the continuous video stream of the standard video data into a series of independent initial video frames; each video frame is a static image representing the visual content of the video at a particular point in time. Video frame segmentation typically involves decoding the video data stream and extracting individual frames at a specific frame rate. Picture center recognition is performed on each initial video frame using image processing techniques such as edge detection, region segmentation, or image moment calculation; the picture center pixel point is the visual center of each video frame and an important reference point for subsequent analysis. An initial pixel cloud image is constructed around the picture center pixel point. A pixel cloud is a two-dimensional set of points representing the distribution of pixels in a video frame, which can be used to analyze structure and motion within the frame. To analyze the pixel clouds further, a K-nearest neighbor algorithm identifies the adjacent points of the picture center pixel point in each initial pixel cloud. The K-nearest neighbor algorithm is a distance-based algorithm that finds the K pixels closest to the picture center pixel point; these nearest neighbors provide important information about the pixel distribution around the center and help in understanding the content and structure of the video frame. The distances between the K nearest neighbors and the picture center pixel point of each initial video frame are computed, giving a plurality of first point distances that reflect the pixel distribution around the center of the picture. To obtain a more stable and reliable result, the mean of the first point distances is taken, yielding the second point distance of each initial video frame. The second point distance is a comprehensive index that averages the distribution of pixel points around the picture center and reflects the stability and motion characteristics of the video frame more accurately.
Finally, pixel displacement calculation is performed on adjacent frames among the plurality of initial video frames according to the second point distances, producing a plurality of pixel displacement data. Pixel displacement data are a key indicator of motion change between video frames, reflecting the speed and direction of motion between adjacent frames. From these data, motion patterns and trends in the video can be accurately identified, which facilitates the subsequent video anti-shake processing and stitching.
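The change rate computation and smoothing can be sketched as follows; the threshold and blending factor are illustrative assumptions, and blending with the previous output frame is just one of the smoothing filters the text mentions:

```python
import numpy as np

def image_change_rate(prev: np.ndarray, curr: np.ndarray) -> float:
    """Mean absolute per-pixel difference between consecutive frames."""
    return float(np.mean(np.abs(curr.astype(np.float32) - prev.astype(np.float32))))

def smooth_transition(frames: list, rates: list, threshold: float = 20.0,
                      alpha: float = 0.6) -> list:
    """Damp abrupt changes: wherever the change rate spikes above the
    threshold, blend the frame with the previous (already smoothed) one."""
    dtype = frames[0].dtype
    out = [frames[0]]
    for frame, rate in zip(frames[1:], rates):
        if rate > threshold:
            frame = (alpha * frame.astype(np.float32)
                     + (1.0 - alpha) * out[-1].astype(np.float32)).astype(dtype)
        out.append(frame)
    return out
```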
In a specific embodiment, the process of executing step S103 may specifically include the following steps:
(1) Pressure integration is carried out on the pressure distribution data of the handheld part through a preset pressure integration function to obtain the accumulated pressure distribution intensity, wherein the pressure integration function is: $F=\int p(t)\,\mathrm{d}t$, where $F$ indicates the accumulated pressure distribution intensity, $p(t)$ represents the pressure distribution data of the hand-held part, and $t$ represents time;
(2) Performing curve fitting on the accumulated pressure distribution intensity to obtain a pressure distribution intensity curve, and performing curve characteristic identification on the pressure distribution intensity curve to obtain a plurality of first pressure distribution characteristics;
(3) Performing feature standardization processing on the first pressure distribution features to obtain second pressure distribution features, and performing set conversion on the second pressure distribution features to obtain a pressure distribution feature set;
(4) Curve fitting is carried out on the multi-axis acceleration signal data to obtain an acceleration change curve, and curve characteristic identification is carried out on the acceleration change curve to obtain a plurality of first acceleration motion characteristics;
(5) Carrying out feature standardization processing on the plurality of first acceleration motion features to obtain a plurality of second acceleration motion features, and carrying out set conversion on the plurality of second acceleration motion features to obtain an acceleration motion feature set.
Specifically, integral calculation is performed on the hand-held part pressure distribution data through the preset pressure integration function, converting the continuous pressure data into an accumulated pressure distribution intensity and providing a comprehensive view of the pressure change. In particular, this integral function time-integrates the pressure distribution data, thereby obtaining a value $F$ that represents the total pressure load. For example, if the user keeps increasing the grip strength while using the device, the integrated value $F$ increases over time, reflecting the cumulative effect of the pressure. Curve fitting is then performed on the accumulated pressure distribution intensity: discrete data points are converted into a continuous pressure distribution intensity curve by a mathematical model such as polynomial fitting, exponential smoothing, or another curve-fitting technique. This curve shows more intuitively how the pressure changes over time. For example, if the user's hand is stable while shooting video, the pressure distribution intensity curve will be relatively flat; conversely, if the user's hand trembles, the curve will show more fluctuation. Curve characteristic identification is carried out on the pressure distribution intensity curve, extracting key information describing the pressure change. These first pressure distribution features include the peak, average, fluctuation range, frequency, and other properties of the curve, which together describe the overall character of the pressure distribution. To make the features comparable and consistent, they are standardized, converting features of different magnitudes or units to the same standard, and then set-converted into a comprehensive pressure distribution feature set. The multi-axis acceleration signal data are analyzed in the same manner: discrete acceleration data points are converted into continuous acceleration change curves that intuitively depict the motion state of the device and show how the acceleration varies over time. Curve characteristic recognition is performed on the acceleration change curves, extracting key information describing the motion state. These first acceleration motion features include the extrema, rate of change, and periodicity of the acceleration profile, which together reflect the motion characteristics and stability of the device. Feature standardization and set conversion are applied to these features, forming a comprehensive acceleration motion feature set.
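A sketch of the curve characteristic identification and feature standardization, using scipy's peak finder; the four features extracted here are an illustrative subset of those the text names:

```python
import numpy as np
from scipy.signal import find_peaks

def curve_features(curve: np.ndarray) -> np.ndarray:
    """First pressure distribution features of a fitted intensity curve:
    peak value, mean, fluctuation range, and linear trend slope."""
    peaks, _ = find_peaks(curve)
    peak_val = float(curve[peaks].max()) if peaks.size else float(curve.max())
    slope = float(np.polyfit(np.arange(curve.size), curve, 1)[0])
    return np.array([peak_val, float(curve.mean()),
                     float(curve.max() - curve.min()), slope])

def standardize(feature_matrix: np.ndarray) -> np.ndarray:
    """Z-score each feature column across observation windows so features
    of different magnitudes or units become directly comparable."""
    mu = feature_matrix.mean(axis=0)
    sd = feature_matrix.std(axis=0) + 1e-9
    return (feature_matrix - mu) / sd
```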
In a specific embodiment, the process of executing step S104 may specifically include the following steps:
(1) Carrying out mean value calculation on the pressure distribution characteristic set to obtain a pressure distribution characteristic mean value, and carrying out standard deviation calculation on the pressure distribution characteristic set to obtain a pressure distribution characteristic standard deviation;
(2) Carrying out mean value calculation on the acceleration motion feature set to obtain an acceleration motion feature mean value, and carrying out standard deviation calculation on the acceleration motion feature set to obtain an acceleration motion feature standard deviation;
(3) According to the pressure distribution characteristic mean value, the pressure distribution characteristic standard deviation, the acceleration motion characteristic mean value and the acceleration motion characteristic standard deviation, carrying out correlation coefficient calculation on the pressure distribution characteristic set and the acceleration motion characteristic set to obtain a target correlation coefficient;
(4) Performing feature conversion on the pressure distribution feature set and the acceleration motion feature set according to the target correlation coefficient to obtain a plurality of target pressure distribution features and a plurality of target acceleration motion features;
(5) Carrying out feature set fusion on the multiple target pressure distribution features and the multiple target acceleration motion features to obtain a target fusion feature set.
Specifically, statistical analysis is performed on the pressure distribution feature set, including calculation of the mean and standard deviation. The mean provides the average level of the pressure distribution features and is an important index of the overall pressure trend; the standard deviation gives the degree of fluctuation of the pressure distribution features, reflecting the dispersion of the data points relative to the mean. A high standard deviation means large differences between the pressure distribution feature points, and a low one the opposite. Similarly, statistical analysis is performed on the acceleration motion feature set, including the mean and standard deviation. The acceleration motion feature mean reflects the average intensity of the device's motion state, and the standard deviation shows the variation range and uncertainty of the motion intensity. For example, for a device shooting statically in a stable environment, the acceleration motion feature mean is close to zero and the standard deviation is small; under severe movement or jitter, the mean and standard deviation increase accordingly. Correlation coefficient calculation is then performed. The correlation coefficient is a statistical index measuring the strength and direction of the linear relationship between two groups of data; by computing the correlation from the pressure distribution feature mean and standard deviation and the acceleration motion feature mean and standard deviation, a target correlation coefficient reflecting the degree of correlation between the two feature sets is obtained. Feature conversion is applied to the pressure distribution feature set and the acceleration motion feature set according to the calculated target correlation coefficient, so that they better reflect the correlation between them. This conversion involves operations such as adjusting feature weights, combining similar features, or excluding noise features, in order to refine and optimize the feature sets for subsequent analysis and application. For example, if a certain pressure distribution feature is strongly correlated with an acceleration motion feature, the two can be combined into one integrated feature to simplify the model and improve analysis efficiency. Finally, feature set fusion is performed on the target pressure distribution features and the target acceleration motion features, forming a comprehensive target fusion feature set. The fusion integrates the information about pressure distribution and acceleration motion, providing an all-round view for understanding and analyzing the stability and motion state of the handheld device. With this comprehensive feature set, jitter in the video can be identified and predicted more accurately, supporting the anti-shake processing and stitching of the video. For example, if the target fusion feature set shows sudden changes in the pressure and acceleration features within a video segment, the system can determine that the video is jittering and adjust the anti-shake algorithm accordingly to eliminate the effects.
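A minimal sketch of correlation-weighted feature fusion; weighting the pressure features by |r| and the acceleration features by 1 - |r| is an illustrative scheme, not taken from the patent:

```python
import numpy as np

def fuse_feature_sets(pressure: np.ndarray, accel: np.ndarray, r: float) -> np.ndarray:
    """Concatenate the two (already standardized) feature sets into the
    target fusion feature set, weighted by the target correlation r."""
    w = abs(r)
    return np.concatenate([w * pressure, (1.0 - w) * accel])
```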
In a specific embodiment, the process of executing step S105 may specifically include the following steps:
(1) Acquiring first time stamp data of a plurality of pixel displacement data, and acquiring second time stamp data of a target fusion feature set;
(2) Performing time sequence association analysis and corresponding matching on the plurality of pixel displacement data and the target fusion feature set according to the first time stamp data and the second time stamp data to obtain a corresponding relation between each pixel displacement data and the target fusion feature set;
(3) According to the corresponding relation, performing matrix conversion on the pixel displacement data and the target fusion feature set to obtain a displacement feature relation matrix, wherein the displacement feature relation matrix comprises a plurality of three-dimensional column vectors, and each three-dimensional column vector sequentially comprises the pixel displacement data, the target pressure distribution feature and the target acceleration motion feature.
Specifically, first timestamp data of the plurality of pixel displacement data and second timestamp data of the target fusion feature set are acquired; the timestamp data ensure that the data can be accurately synchronized and aligned. Time sequence association analysis and corresponding matching are performed on the plurality of pixel displacement data and the target fusion feature set according to the first and second timestamp data. By comparing and aligning the timestamps of the different data sources, the pixel displacement data of the video frames can be accurately matched with the pressure distribution features and acceleration motion features recorded by the sensors, establishing a correspondence between each pixel displacement data item and the target fusion feature set. Matrix conversion is then applied to the plurality of pixel displacement data and the target fusion feature set according to this correspondence, organizing the time-aligned data into a displacement feature relation matrix for further analysis and processing. The displacement feature relation matrix comprises a plurality of three-dimensional column vectors, each containing, in order, the corresponding pixel displacement data, target pressure distribution feature, and target acceleration motion feature.
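Nearest-timestamp matching, the core of the corresponding matching step, can be sketched as follows, assuming both timestamp arrays are sorted ascending:

```python
import numpy as np

def match_by_timestamp(ts_a: np.ndarray, ts_b: np.ndarray) -> np.ndarray:
    """For each timestamp in ts_a (pixel displacement data), return the
    index of the nearest timestamp in ts_b (fusion features)."""
    idx = np.clip(np.searchsorted(ts_b, ts_a), 1, ts_b.size - 1)
    left_is_closer = (ts_a - ts_b[idx - 1]) < (ts_b[idx] - ts_a)
    return np.where(left_is_closer, idx - 1, idx)
```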
In a specific embodiment, the process of executing step S106 may specifically include the following steps:
(1) Inputting a displacement characteristic relation matrix into a preset video anti-shake compensation model, wherein the video anti-shake compensation model comprises a convolution long-short time memory network, a single-layer long-short time memory network and a full-connection layer;
(2) Performing deep feature operation on the displacement feature relation matrix through a convolution long-short-term memory network to obtain a deep feature relation matrix;
(3) Carrying out hidden characteristic analysis on the deep characteristic relation matrix through a single-layer long-short-term memory network to obtain a target characteristic relation matrix;
(4) Calculating video anti-shake compensation parameters of the target feature relation matrix through a ReLU function in the full-connection layer to obtain target video anti-shake compensation parameters;
(5) Performing anti-shake processing on a plurality of initial video frames according to the target video anti-shake compensation parameters to obtain a plurality of target video frames;
(6) Video stitching is carried out on the target video frames to obtain second video data.
Specifically, the displacement feature relation matrix is input into the preset video anti-shake compensation model. The model is a deep learning network composed of a convolutional long short-term memory network (ConvLSTM), a single-layer long short-term memory network (LSTM), and a fully connected layer. These components work together: useful information is extracted from the displacement feature relation matrix by deep learning, and the compensation parameters for video anti-shake are calculated. By introducing the LSTM structure on top of a convolutional neural network (CNN), ConvLSTM can effectively capture the spatial features and time dependencies in time series data. When the displacement feature relation matrix is input into the ConvLSTM, the network performs deep feature operations on it, extracting complex features that describe the movement and change between video frames; these features are organized into a deep feature relation matrix. The deep feature relation matrix is then fed into the single-layer LSTM. LSTM is a special recurrent neural network (RNN) capable of learning long-term dependencies, suitable for processing and predicting important events in time series data. The LSTM performs further hidden feature analysis on the deep feature relation matrix, discovering deeper temporal associations and patterns, and encodes this information into a target feature relation matrix. The video anti-shake compensation parameters are calculated from the target feature relation matrix through the ReLU function in the fully connected layer. ReLU is a nonlinear function that increases the nonlinear capacity of the network without affecting the relative ordering among neurons, helping the network learn complex features. This layer outputs a set of parameters for the subsequent anti-shake processing. Anti-shake processing is then applied to the plurality of initial video frames: the position and orientation of the video frames are adjusted to compensate for undesired motion caused by hand shake or other factors. By applying the calculated compensation parameters, each video frame is adjusted to a more stable state, producing a series of anti-shake-processed target video frames. Finally, video stitching is performed on the target video frames to create continuous, smooth second video data. The stitching process requires precise alignment and fusion of the individual video frames to ensure consistency and visual quality; through it, the shaky, unstable video is converted into a smoother, higher-quality video stream.
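A sketch of applying the compensation parameters and stitching the result into one output stream with OpenCV; treating each parameter as a per-frame (dx, dy) translation is an assumption, since the text does not fix the parameter form:

```python
import cv2
import numpy as np

def compensate_and_stitch(frames, params, out_path: str, fps: float = 30.0) -> None:
    """Apply per-frame (dx, dy) translation compensation and write the
    stabilised frames out as one continuous video file."""
    h, w = frames[0].shape[:2]
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    for frame, (dx, dy) in zip(frames, params):
        M = np.float32([[1, 0, -dx], [0, 1, -dy]])  # shift opposite to the measured jitter
        writer.write(cv2.warpAffine(frame, M, (w, h)))
    writer.release()
```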
The video anti-shake stitching method in the embodiment of the present application is described above, and the video anti-shake stitching device in the embodiment of the present application is described below. Referring to fig. 2, one embodiment of the video anti-shake stitching device in the embodiment of the present application includes:
the acquisition module 201 is configured to perform video shooting through a preset handheld device to obtain first video data, and perform handheld position pressure distribution acquisition and multi-axis acceleration signal acquisition on the handheld device to obtain handheld position pressure distribution data and multi-axis acceleration signal data;
the calculating module 202 is configured to divide the video frame of the first video data to obtain a plurality of initial video frames, and calculate pixel displacement between two adjacent frames in the plurality of initial video frames to obtain a plurality of pixel displacement data;
the feature extraction module 203 is configured to perform distribution intensity and feature extraction on the pressure distribution data of the handheld part to obtain a pressure distribution feature set, and perform motion feature extraction on the multi-axis acceleration signal data to obtain an acceleration motion feature set;
the feature fusion module 204 is configured to perform correlation coefficient calculation on the pressure distribution feature set and the acceleration motion feature set to obtain a target correlation coefficient, and perform feature fusion on the pressure distribution feature set and the acceleration motion feature set according to the target correlation coefficient to obtain a target fusion feature set;
the conversion module 205 is configured to perform time sequence association analysis and matrix conversion on the plurality of pixel displacement data and the target fusion feature set to obtain a displacement feature relation matrix;
the processing module 206 is configured to input the displacement feature relation matrix into a preset video anti-shake compensation model to perform video anti-shake compensation parameter analysis, obtain a target video anti-shake compensation parameter, and perform anti-shake processing and video stitching on the plurality of initial video frames according to the target video anti-shake compensation parameter, so as to obtain second video data.
Through the cooperation of the above components, camera motion can be analyzed and compensated more accurately by collecting the multi-axis acceleration signal data and the pressure distribution data of the handheld device, realizing efficient video anti-shake. This helps to reduce shaking and jitter in the video and improves the stability and viewing experience of the video. By using correlation coefficient calculation and feature fusion, the video stitching process can be adaptively adjusted to different situations, and stitching can be optimized according to the shooting scene and conditions to obtain a more natural and consistent video stream. By analyzing the temporal correlation between the pixel displacement data and the target features, the relationship between video frames can be better understood, which helps to ensure the continuity and smoothness of video stitching and reduces incoherent transitions. By comprehensively utilizing multiple data sources, including the acceleration data, the pressure distribution data, and the pixel displacement data, video quality can be improved: blurring, jitter, color deviation, and similar defects are reduced, providing a clearer, more stable, and more natural video output. Performing deep feature extraction and analysis with deep neural network techniques such as the convolutional long short-term memory network allows the correlations between visual and sensed data to be understood more accurately, so that not only can shaking be reduced, but more complex scenes and dynamic conditions can also be handled. With these deep learning techniques, video content can be automatically analyzed, edited, and optimized; the user no longer needs to edit and adjust the video manually, and instead obtains highly automated, intelligent video post-processing, realizing intelligent video anti-shake stitching and improving the anti-shake display effect of the video.
The application also provides video anti-shake splicing equipment, which comprises a memory and a processor, wherein the memory stores computer readable instructions, and when the computer readable instructions are executed by the processor, the processor executes the steps of the video anti-shake splicing method in the above embodiments.
The application further provides a computer readable storage medium, which may be a non-volatile computer readable storage medium or a volatile computer readable storage medium. The computer readable storage medium stores instructions that, when run on a computer, cause the computer to perform the steps of the video anti-shake splicing method.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, devices, and units may refer to the corresponding processes in the foregoing method embodiments, and are not repeated herein.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions to cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
The above embodiments are merely for illustrating the technical solution of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims (10)

1. A video anti-shake splicing method, characterized by comprising the following steps:
video shooting is carried out through preset handheld equipment to obtain first video data, and handheld position pressure distribution acquisition and multi-axis acceleration signal acquisition are carried out on the handheld equipment to obtain handheld position pressure distribution data and multi-axis acceleration signal data;
performing video frame segmentation on the first video data to obtain a plurality of initial video frames, and calculating pixel displacement between two adjacent frames in the plurality of initial video frames to obtain a plurality of pixel displacement data;
extracting distribution intensity and characteristics of the pressure distribution data of the handheld part to obtain a pressure distribution characteristic set, and extracting motion characteristics of the multi-axis acceleration signal data to obtain an acceleration motion characteristic set;
calculating correlation coefficients of the pressure distribution feature set and the acceleration motion feature set to obtain a target correlation coefficient, and carrying out feature fusion on the pressure distribution feature set and the acceleration motion feature set according to the target correlation coefficient to obtain a target fusion feature set;
performing time sequence association analysis and matrix conversion on the pixel displacement data and the target fusion feature set to obtain a displacement feature relation matrix;
inputting the displacement feature relation matrix into a preset video anti-shake compensation model to perform video anti-shake compensation parameter analysis to obtain target video anti-shake compensation parameters, and performing anti-shake processing and video splicing on the plurality of initial video frames according to the target video anti-shake compensation parameters to obtain second video data.
2. The video anti-shake splicing method according to claim 1, wherein performing video shooting through the preset handheld device to obtain the first video data, and performing handheld position pressure distribution acquisition and multi-axis acceleration signal acquisition on the handheld device to obtain the handheld position pressure distribution data and the multi-axis acceleration signal data, comprises:
shooting a video through the preset handheld device to obtain original video stream data, and performing color correction on the original video stream data to obtain the first video data;
acquiring pressure distribution of a handheld part of the handheld device to obtain a plurality of initial pressure distribution data, and calculating average pressure of the plurality of initial pressure distribution data to obtain pressure distribution data of the handheld part;
acquiring multi-axis acceleration signals of the handheld device to obtain first acceleration data of an x axis, second acceleration data of a y axis and third acceleration data of a z axis;
performing multi-axis acceleration calculation on the first acceleration data of the x axis, the second acceleration data of the y axis and the third acceleration data of the z axis through a preset multi-axis acceleration calculation function to obtain initial multi-axis acceleration data, wherein the multi-axis acceleration calculation function is as follows: $A=\sqrt{a_x^{2}+a_y^{2}+a_z^{2}}$, where $A$ represents the initial multi-axis acceleration data, $a_x$ represents the first acceleration data of the x axis, $a_y$ represents the second acceleration data of the y axis, and $a_z$ represents the third acceleration data of the z axis;
and carrying out Fourier signal transformation on the initial multi-axis acceleration data to obtain multi-axis acceleration signal data.
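As a minimal sketch of this acceleration pipeline, the following Python fragment combines the three axis signals with the Euclidean magnitude given above and applies a real-input Fourier transform; the sample count and the use of NumPy's rfft are illustrative assumptions, not requirements of the claim.

import numpy as np

def multi_axis_acceleration(a_x, a_y, a_z):
    # Initial multi-axis acceleration data: A = sqrt(a_x^2 + a_y^2 + a_z^2).
    a_x, a_y, a_z = (np.asarray(v, dtype=float) for v in (a_x, a_y, a_z))
    return np.sqrt(a_x ** 2 + a_y ** 2 + a_z ** 2)

def acceleration_signal_data(a):
    # Fourier signal transformation of the real-valued magnitude trace.
    return np.fft.rfft(a)

# usage with a hypothetical capture of 1000 samples per axis
a_x, a_y, a_z = np.random.randn(3, 1000)
spectrum = acceleration_signal_data(multi_axis_acceleration(a_x, a_y, a_z))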
3. The video anti-shake splicing method according to claim 1, wherein performing video frame segmentation on the first video data to obtain a plurality of initial video frames, and calculating pixel displacement between two adjacent frames in the plurality of initial video frames to obtain a plurality of pixel displacement data, comprises:
calculating the image change rate of the first video data to obtain image change rate data;
performing smooth transition processing on the first video data according to the image change rate data to obtain standard video data;
dividing the standard video data into video frames to obtain a plurality of initial video frames;
performing picture center identification on the plurality of initial video frames to obtain picture center pixel points of each initial video frame, and constructing an initial pixel cloud picture of each initial video frame according to the picture center pixel points;
carrying out adjacent point identification on the picture center pixel point in each initial pixel cloud picture based on a K-nearest neighbor algorithm to obtain K nearest neighbor points of each initial video frame;
respectively calculating the distances between the K nearest neighbors and the picture center pixel point of each initial video frame to obtain a plurality of first point distances, and carrying out mean value operation on the plurality of first point distances to obtain a second point distance of each initial video frame;
And carrying out pixel displacement calculation on two adjacent frames in the plurality of initial video frames according to the second point distance to obtain a plurality of pixel displacement data.
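A sketch of this distance computation follows. It assumes the pixel cloud of each frame is available as an array of 2-D coordinates and uses a k-d tree for the K-nearest-neighbor query; treating the displacement between adjacent frames as the change in their second point distances is one plausible reading of the claim, not the only one.

import numpy as np
from scipy.spatial import cKDTree

def second_point_distance(pixel_cloud, center, k=8):
    # Mean of the K first point distances from the picture-center pixel
    # to its nearest neighbors in the pixel cloud.
    first_distances, _ = cKDTree(pixel_cloud).query(center, k=k)
    return float(np.mean(first_distances))

def pixel_displacements(clouds, centers, k=8):
    # Pixel displacement between adjacent frames as the change in the
    # second point distance.
    d = [second_point_distance(p, c, k) for p, c in zip(clouds, centers)]
    return [abs(b - a) for a, b in zip(d, d[1:])]

# usage with two hypothetical 640x480 frames of feature-pixel coordinates
clouds = [np.random.rand(200, 2) * [640, 480] for _ in range(2)]
centers = [np.array([320.0, 240.0])] * 2
print(pixel_displacements(clouds, centers))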
4. The video anti-shake splicing method according to claim 1, wherein performing distribution intensity and feature extraction on the pressure distribution data of the handheld part to obtain a pressure distribution feature set, and performing motion feature extraction on the multi-axis acceleration signal data to obtain an acceleration motion feature set, comprises:
performing pressure integration on the pressure distribution data of the handheld part through a preset pressure integration function to obtain accumulated pressure distribution intensity, wherein the pressure integration function is as follows: $P=\int p(t)\,\mathrm{d}t$, where $P$ indicates the accumulated pressure distribution intensity, $p(t)$ represents the pressure distribution data of the handheld part, and $t$ represents time;
performing curve fitting on the accumulated pressure distribution intensity to obtain a pressure distribution intensity curve, and performing curve characteristic identification on the pressure distribution intensity curve to obtain a plurality of first pressure distribution characteristics;
performing feature standardization processing on the plurality of first pressure distribution features to obtain a plurality of second pressure distribution features, and performing set conversion on the plurality of second pressure distribution features to obtain a pressure distribution feature set;
performing curve fitting on the multi-axis acceleration signal data to obtain an acceleration change curve, and performing curve characteristic identification on the acceleration change curve to obtain a plurality of first acceleration motion features;
and carrying out feature standardization processing on the plurality of first acceleration motion features to obtain a plurality of second acceleration motion features, and carrying out set conversion on the plurality of second acceleration motion features to obtain an acceleration motion feature set.
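The pressure branch of this claim can be sketched as follows; the trapezoidal approximation of the integral and the choice of a cubic polynomial fit (whose coefficients stand in for the first pressure distribution features) are illustrative assumptions on top of the claimed steps.

import numpy as np

def cumulative_pressure_intensity(p, t):
    # Running integral of the pressure trace p(t) via the trapezoidal rule,
    # giving the accumulated pressure distribution intensity at each sample.
    increments = 0.5 * (p[1:] + p[:-1]) * np.diff(t)
    return np.concatenate([[0.0], np.cumsum(increments)])

def standardize(features):
    # Feature standardization to zero mean and unit variance.
    features = np.asarray(features, dtype=float)
    return (features - features.mean()) / (features.std() + 1e-12)

# hypothetical pipeline on a synthetic pressure trace
t = np.linspace(0.0, 2.0, 200)
p = 5.0 + 0.5 * np.sin(4.0 * np.pi * t)
intensity = cumulative_pressure_intensity(p, t)     # pressure integration
coefficients = np.polyfit(t, intensity, deg=3)      # curve fitting
feature_set = standardize(coefficients)             # pressure feature set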
5. The video anti-shake splicing method according to claim 1, wherein calculating the correlation coefficient of the pressure distribution feature set and the acceleration motion feature set to obtain a target correlation coefficient, and performing feature fusion of the pressure distribution feature set and the acceleration motion feature set according to the target correlation coefficient to obtain a target fusion feature set, comprises:
performing mean value calculation on the pressure distribution feature set to obtain a pressure distribution feature mean value, and performing standard deviation calculation on the pressure distribution feature set to obtain a pressure distribution feature standard deviation;
calculating the average value of the acceleration motion feature set to obtain an acceleration motion feature average value, and calculating the standard deviation of the acceleration motion feature set to obtain an acceleration motion feature standard deviation;
according to the pressure distribution feature mean value, the pressure distribution feature standard deviation, the acceleration motion feature mean value and the acceleration motion feature standard deviation, carrying out correlation coefficient calculation on the pressure distribution feature set and the acceleration motion feature set to obtain a target correlation coefficient;
performing feature conversion on the pressure distribution feature set and the acceleration motion feature set according to the target correlation coefficient to obtain a plurality of target pressure distribution features and a plurality of target acceleration motion features;
and carrying out feature set fusion on the target pressure distribution features and the target acceleration motion features to obtain a target fusion feature set.
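The correlation step here is the classic Pearson construction from the two means and standard deviations; a sketch follows, in which the correlation-weighted fusion rule is only one plausible choice, since the claim does not pin down the exact fusion formula, and both feature sets are assumed to have equal length.

import numpy as np

def target_correlation(pressure, accel):
    # Pearson correlation coefficient built from the feature means and
    # standard deviations computed in the preceding steps.
    pressure, accel = np.asarray(pressure, float), np.asarray(accel, float)
    mp, ma = pressure.mean(), accel.mean()
    sp, sa = pressure.std(), accel.std()
    return float(((pressure - mp) * (accel - ma)).mean() / (sp * sa + 1e-12))

def fuse_features(pressure, accel):
    # Correlation-weighted blend of the two feature sets.
    r = target_correlation(pressure, accel)
    w = (r + 1.0) / 2.0                  # map r in [-1, 1] onto [0, 1]
    fused = w * np.asarray(pressure, float) + (1.0 - w) * np.asarray(accel, float)
    return fused, r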
6. The video anti-shake splicing method according to claim 5, wherein performing time sequence association analysis and matrix conversion on the plurality of pixel displacement data and the target fusion feature set to obtain a displacement feature relation matrix comprises:
acquiring first time stamp data of the plurality of pixel displacement data and acquiring second time stamp data of the target fusion feature set;
performing time sequence association analysis and corresponding matching on the plurality of pixel displacement data and the target fusion feature set according to the first time stamp data and the second time stamp data to obtain a corresponding relation between each pixel displacement data and the target fusion feature set;
And according to the corresponding relation, performing matrix conversion on the pixel displacement data and the target fusion feature set to obtain a displacement feature relation matrix, wherein the displacement feature relation matrix comprises a plurality of three-dimensional column vectors, and each three-dimensional column vector sequentially comprises pixel displacement data, target pressure distribution features and target acceleration motion features.
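A sketch of this matrix assembly: each pixel-displacement sample is matched to the fused feature whose timestamp is nearest in time (nearest-timestamp matching is an assumption; the claim only requires a time sequence association), and the matched triples are stacked as three-dimensional column vectors.

import numpy as np

def displacement_feature_matrix(displacements, disp_ts, fused_feats, fused_ts):
    # fused_feats[j] is assumed to be a (pressure feature, acceleration
    # feature) pair sharing the timestamp fused_ts[j].
    fused_ts = np.asarray(fused_ts, dtype=float)
    columns = []
    for d, t in zip(displacements, disp_ts):
        j = int(np.argmin(np.abs(fused_ts - t)))      # time sequence association
        pressure_feat, accel_feat = fused_feats[j]
        columns.append([d, pressure_feat, accel_feat])
    return np.asarray(columns, dtype=float).T         # one 3-D column per sample

# usage with hypothetical samples
m = displacement_feature_matrix([1.2, 0.7], [0.00, 0.04],
                                [(0.3, 0.9), (0.2, 1.1)], [0.01, 0.05])
print(m.shape)    # (3, 2)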
7. The video anti-shake splicing method according to claim 6, wherein inputting the displacement feature relation matrix into the preset video anti-shake compensation model to perform video anti-shake compensation parameter analysis to obtain target video anti-shake compensation parameters, and performing anti-shake processing and video stitching on the plurality of initial video frames according to the target video anti-shake compensation parameters to obtain second video data, comprises:
inputting the displacement feature relation matrix into the preset video anti-shake compensation model, wherein the video anti-shake compensation model comprises a convolutional long short-time memory network, a single-layer long short-time memory network and a fully connected layer;
performing deep feature operation on the displacement feature relation matrix through the convolutional long short-time memory network to obtain a deep feature relation matrix;
performing hidden feature analysis on the deep feature relation matrix through the single-layer long short-time memory network to obtain a target feature relation matrix;
calculating the video anti-shake compensation parameters of the target feature relation matrix through a ReLU function in the fully connected layer to obtain target video anti-shake compensation parameters;
performing anti-shake processing on the plurality of initial video frames according to the target video anti-shake compensation parameters to obtain a plurality of target video frames;
and video stitching is carried out on the target video frames to obtain second video data.
8. A video anti-shake splicing device, characterized in that the video anti-shake splicing device comprises:
the acquisition module is used for carrying out video shooting through preset handheld equipment to obtain first video data, and carrying out handheld position pressure distribution acquisition and multi-axis acceleration signal acquisition on the handheld equipment to obtain handheld position pressure distribution data and multi-axis acceleration signal data;
the computing module is used for carrying out video frame segmentation on the first video data to obtain a plurality of initial video frames, and computing pixel displacement between two adjacent frames in the plurality of initial video frames to obtain a plurality of pixel displacement data;
the feature extraction module is used for carrying out distribution intensity and feature extraction on the pressure distribution data of the handheld part to obtain a pressure distribution feature set, and carrying out motion feature extraction on the multi-axis acceleration signal data to obtain an acceleration motion feature set;
the feature fusion module is used for calculating the correlation coefficient of the pressure distribution feature set and the acceleration motion feature set to obtain a target correlation coefficient, and carrying out feature fusion on the pressure distribution feature set and the acceleration motion feature set according to the target correlation coefficient to obtain a target fusion feature set;
the conversion module is used for carrying out time sequence association analysis and matrix conversion on the pixel displacement data and the target fusion feature set to obtain a displacement feature relation matrix;
the processing module is used for inputting the displacement characteristic relation matrix into a preset video anti-shake compensation model to perform video anti-shake compensation parameter analysis to obtain target video anti-shake compensation parameters, and performing anti-shake processing and video splicing on the plurality of initial video frames according to the target video anti-shake compensation parameters to obtain second video data.
9. Video anti-shake splicing equipment, characterized in that the video anti-shake splicing equipment comprises: a memory and at least one processor, the memory having instructions stored therein;
The at least one processor invokes the instructions in the memory to cause the video anti-shake splicing equipment to perform the video anti-shake splicing method of any one of claims 1-7.
10. A computer readable storage medium having instructions stored thereon, which, when executed by a processor, implement the video anti-shake splicing method according to any one of claims 1-7.
CN202410074451.3A 2024-01-18 2024-01-18 Video anti-shake splicing method, device, equipment and storage medium Active CN117596483B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410074451.3A CN117596483B (en) 2024-01-18 2024-01-18 Video anti-shake splicing method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN117596483A CN117596483A (en) 2024-02-23
CN117596483B true CN117596483B (en) 2024-04-02

Family

ID=89916985

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110113490A (en) * 2019-04-30 2019-08-09 维沃移动通信有限公司 A kind of information processing method, terminal and computer readable storage medium
CN110636223A (en) * 2019-10-16 2019-12-31 Oppo广东移动通信有限公司 Anti-shake processing method and apparatus, electronic device, and computer-readable storage medium
CN117278854A (en) * 2023-09-26 2023-12-22 杭州海康威视数字技术股份有限公司 Video anti-shake method and device, electronic equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019178872A1 (en) * 2018-03-23 2019-09-26 华为技术有限公司 Video image anti-shake method and terminal
US11381745B2 (en) * 2019-03-07 2022-07-05 Invensense, Inc. Drift correction with phase and amplitude compensation for optical image stabilization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant