CN116939267B - Frame alignment method, device, computer equipment and storage medium


Info

Publication number
CN116939267B
CN116939267B (application CN202311187470.9A)
Authority
CN
China
Prior art keywords
video
aligned
similarity
curve
reference video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311187470.9A
Other languages
Chinese (zh)
Other versions
CN116939267A (en)
Inventor
Han Xu (韩旭)
Wang Bo (王博)
Xu Hongyue (徐鸿玥)
Xu Shengli (徐胜利)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202311187470.9A
Publication of CN116939267A
Application granted
Publication of CN116939267B
Legal status: Active
Anticipated expiration

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4307Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8455Structuring of content, e.g. decomposing content into time segments involving pointers to the content, e.g. pointers to the I-frames of the video stream

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a frame alignment method, apparatus, computer device, and storage medium, belonging to the technical field of image processing. In the frame alignment method provided by the embodiment of the application, the similarity curve of the reference video and the similarity curve of the video to be aligned each describe how the key frames of a video change relative to one and the same key frame. Because key frames represent video content, the two similarity curves accurately reflect the segments in which the two videos change in the same way; that is, the identical segments of the two videos can be found, and frame alignment is thereby achieved. Since this process does not match every frame of one video against every frame of the other, the amount of computation in the frame alignment process is greatly reduced while the accuracy of frame alignment is guaranteed.

Description

Frame alignment method, device, computer equipment and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a frame alignment method, apparatus, computer device, and storage medium.
Background
With the development of computer technology, video processing methods have multiplied. Among them, frame alignment synchronizes the time axes of multiple videos so that they start playing from the same reference point; it is often used to meet requirements such as comparison, composition, and synchronized playback of multiple videos.
Currently, one frame alignment method calculates the inter-frame difference between each frame of the video to be aligned and every frame of the reference video and, for each frame of the video to be aligned, selects the reference-video frame with the smallest inter-frame difference as its aligned frame.
This method requires computing the inter-frame difference between each frame of the video to be aligned and every frame of the reference video, so the required amount of computation is large.
Disclosure of Invention
The embodiments of the application provide a frame alignment method, apparatus, computer device, and storage medium, which guarantee the accuracy of frame alignment while greatly reducing the amount of computation in the frame alignment process. The technical solution comprises the following aspects.
In one aspect, a frame alignment method is provided, the method comprising: based on a key frame set of a reference video and a key frame set of a video to be aligned, respectively acquiring a similarity curve of the reference video and a similarity curve of the video to be aligned, wherein the similarity curve of the reference video is used for indicating the change condition of the similarity between a plurality of key frames of the reference video and a target key frame of the reference video, and the similarity curve of the video to be aligned is used for indicating the change condition of the similarity between a plurality of key frames of the video to be aligned and the target key frame of the reference video;
Determining a first period from the similarity curve of the video to be aligned, wherein the similarity in the first period changes in the same way as in a second period of the similarity curve of the reference video; and aligning the reference video and the video to be aligned based on the first period and the second period.
In the frame alignment method provided by the embodiment of the application, the similarity curve of the reference video and the similarity curve of the video to be aligned each describe how the key frames of a video change relative to one and the same key frame. Because key frames represent video content, the two similarity curves accurately reflect the segments in which the two videos change in the same way; that is, the identical segments of the two videos can be found, and frame alignment is thereby achieved. Since this process does not match every frame of one video against every frame of the other, the amount of computation in the frame alignment process is greatly reduced while the accuracy of frame alignment is guaranteed.
In some embodiments, the obtaining the similarity curve of the reference video and the similarity curve of the video to be aligned based on the keyframe set of the reference video and the keyframe set of the video to be aligned respectively includes:
Respectively acquiring a key frame set of a reference video and a key frame set of a video to be aligned;
calculating the similarity between each key frame in the key frame set of the reference video and the target key frame by taking any key frame in the key frame set of the reference video as the target key frame to obtain a similarity array of the reference video, and calculating the similarity between each key frame in the key frame set of the video to be aligned and the target key frame to obtain a similarity array of the video to be aligned;
and respectively acquiring a similarity curve of the reference video and a similarity curve of the video to be aligned based on the similarity array of the reference video and the similarity array of the video to be aligned.
The similarity curves obtained by this process capture how the similarity of each video's key frames changes relative to the same target key frame, thereby revealing the segments of the two videos in which the key frames change in the same way and enabling frame alignment.
In some embodiments, the obtaining the similarity curve of the reference video and the similarity curve of the video to be aligned based on the similarity array of the reference video and the similarity array of the video to be aligned respectively includes:
and respectively interpolating the similarity array of the reference video and the similarity array of the video to be aligned, and obtaining a similarity curve of the reference video and a similarity curve of the video to be aligned based on the similarity array of the reference video and the similarity array of the video to be aligned after interpolation. The similarity curve is obtained based on the interpolated similarity array, so that the similarity curve is smoother, and the influence of noise is reduced.
In some embodiments, the separately obtaining the keyframe set of the reference video and the keyframe set of the video to be aligned includes:
based on all video frames of the reference video and all video frames of the video to be aligned, respectively obtaining a frame difference array of the reference video and a frame difference array of the video to be aligned, wherein the frame differences in the frame difference array represent differences between each video frame and a previous video frame;
based on the frame difference array of the reference video and the frame difference array of the video to be aligned, respectively obtaining a frame difference curve of the reference video and a frame difference curve of the video to be aligned;
based on a frame difference curve of the reference video and a frame difference curve of the video to be aligned, respectively acquiring hash codes of video frames in the reference video and hash codes of video frames in the video to be aligned;
and respectively determining hash codes with mutation from hash codes of video frames in the reference video and hash codes of video frames in the video to be aligned, and respectively adding the video frames corresponding to the hash codes with mutation as key frames into a key frame set of the reference video and a key frame set of the video to be aligned.
Performing frame alignment based on the key frame set obtained by the above process, rather than on all video frames of the videos or by training a frame alignment model with a large number of aligned videos, reduces the amount of computation in the frame alignment process.
In some embodiments, the obtaining the frame difference curve of the reference video and the frame difference curve of the video to be aligned based on the frame difference array of the reference video and the frame difference array of the video to be aligned respectively includes:
and respectively interpolating the frame difference array of the reference video and the frame difference array of the video to be aligned, and obtaining a frame difference curve of the reference video and a frame difference curve of the video to be aligned based on the frame difference array of the reference video and the frame difference array of the video to be aligned after interpolation.
In some embodiments, determining the first period from the similarity curve of the videos to be aligned includes:
respectively calculating a peak prominence value for the similarity curve of the reference video and a peak prominence value for the similarity curve of the video to be aligned, based on the maximum and minimum values of the respective similarity curves, wherein the peak prominence value represents how prominent the peaks of a similarity curve are;
determining a plurality of periods of the reference video and a plurality of periods of the video to be aligned based on the peak prominence value of the similarity curve of the reference video and the peak prominence value of the similarity curve of the video to be aligned;
comparing the plurality of periods of the reference video with the plurality of periods of the video to be aligned, and outputting a matched period of the reference video and a matched period of the video to be aligned as the second period and the first period, respectively.
Dividing the curves into a plurality of periods based on the peak prominence value facilitates comparison between the two similarity curves. By comparing the periods of the reference video with the periods of the video to be aligned, periods in which the similarity changes in the same way can be found; such periods reflect the identical segments of the two videos, so the two videos can be aligned. In addition, because frame alignment is achieved by comparing two similarity curves rather than the difference between every pair of frames, the computation spent on finding the identical segments of the two videos is reduced.
In some embodiments, determining the plurality of periods of the reference video and the plurality of periods of the video to be aligned based on the peak prominence value of the similarity curve of the reference video and the peak prominence value of the similarity curve of the video to be aligned comprises:
respectively screening out, from the similarity curve of the reference video, the peaks smaller than the peak prominence value of the reference video, and, from the similarity curve of the video to be aligned, the peaks smaller than the peak prominence value of the video to be aligned, to retain the peaks greater than or equal to the respective peak prominence values;
taking the lowest trough between two adjacent retained peaks in the similarity curve of the reference video as a start frame of the reference video, and taking the similarity curve between two adjacent start frames of the reference video as one period of the reference video;
and taking the lowest trough between two adjacent retained peaks in the similarity curve of the video to be aligned as a start frame of the video to be aligned, and taking the similarity curve between two adjacent start frames of the video to be aligned as one period of the video to be aligned.
This process screens out some peaks by means of the peak prominence value and determines periods only from the peaks greater than or equal to it, so that the number of periods is kept within a certain range and the computation spent on comparing the periods of the two videos is reduced.
In another aspect, a computer device is provided that includes a processor and a memory for storing at least one segment of a computer program that is loaded and executed by the processor to perform operations performed by a frame alignment method in an embodiment of the application.
In another aspect, a computer readable storage medium having stored therein at least one segment of a computer program that is loaded and executed by a processor to perform operations as performed by a frame alignment method in an embodiment of the present application is provided.
In another aspect, a computer program product or computer program is provided, the computer program product or computer program comprising computer program code stored in a computer readable storage medium, the computer program code being read from the computer readable storage medium by a processor of a computer device, the computer program code being executed by the processor such that the computer device performs the frame alignment method provided in the above-mentioned first aspect or various alternative implementations of the first aspect.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a computer device according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a computer device cluster according to an embodiment of the present application;
FIG. 3 is a method flow diagram of a frame alignment method provided in accordance with an embodiment of the present application;
FIG. 4 is a specific flowchart of a frame alignment method according to an embodiment of the present application;
FIG. 5 is a method flow diagram of another frame alignment method provided in accordance with an embodiment of the present application;
FIG. 6 is a flow chart of a method for obtaining a keyframe set according to an embodiment of the present application;
fig. 7 is a block diagram of a frame alignment apparatus according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.
The terms "first," "second," and the like in this disclosure are used for distinguishing between similar elements or items having substantially the same function and function, and it should be understood that there is no logical or chronological dependency between the terms "first," "second," and "n," and that there is no limitation on the amount and order of execution.
The term "at least one" in the present application means one or more, and the meaning of "a plurality of" means two or more.
It should be noted that, the information (including but not limited to user equipment information, user personal information, etc.), data (including but not limited to data for analysis, stored data, presented data, etc.), and signals related to the present application are all authorized by the user or are fully authorized by the parties, and the collection, use, and processing of the related data is required to comply with the relevant laws and regulations and standards of the relevant countries and regions. For example, the video referred to in the present application is acquired with sufficient authorization.
In order to facilitate understanding, terms related to the present application are explained below.
Key frame: among the frames constituting a segment of animation, a frame that carries important image information;
OpenCV: a computer vision library for image processing and pattern recognition;
NumPy: an extension library supporting large multi-dimensional arrays and matrix operations, together with a large collection of mathematical functions operating on such arrays;
SciPy: a scientific computing library built on NumPy, covering common computations such as optimization, linear algebra, integration, interpolation, and the fast Fourier transform.
The frame alignment method provided by the embodiment of the application can be executed by computer equipment. Fig. 1 is a schematic structural diagram of a computer device according to an embodiment of the present application. Referring to fig. 1, a computer device 100 includes: a processor 101 and a memory 102.
Processor 101 may include one or more processing cores, for example a 4-core or 8-core processor. The processor 101 may be implemented in at least one of the hardware forms DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). The processor 101 may also include a main processor and a coprocessor: the main processor, also called the CPU (Central Processing Unit), processes data in the awake state, while the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 101 may be integrated with a GPU (Graphics Processing Unit) responsible for rendering and drawing the content that the display screen needs to display. In some embodiments, the processor 101 may also include an AI (Artificial Intelligence) processor for computing operations related to machine learning.
Memory 102 may include one or more computer-readable storage media, which may be non-transitory. Memory 102 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 102 is used to store at least one program for execution by processor 101 to implement the frame alignment method provided by the method embodiments of the present application.
In some embodiments, the computer device 100 may further optionally include: a peripheral interface 103 and at least one peripheral. The processor 101, memory 102, and peripheral interface 103 may be connected via buses or signal lines. The individual peripheral devices may be connected to the peripheral device interface 103 via buses, signal lines, or a circuit board.
In some embodiments, the computer device may be an independent physical server; a computer device cluster as shown in fig. 2, that is, a server cluster or distributed file system formed by multiple physical servers; or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), big data, and artificial intelligence platforms. Taking a cloud server as an example, the computer device may also be called a cloud platform (short for cloud computing platform), a service that provides computing, networking, and storage capabilities on top of hardware and software resources. Massive data computation is processed and analyzed remotely in the network "cloud" and the results are returned to the user, offering large scale, distribution, virtualization, high availability, scalability, on-demand service, security, and similar characteristics. A cloud platform enables rapid provisioning and release of configurable computing resources with low management cost and low interaction complexity between users and service providers.
Fig. 3 is a flowchart of a frame alignment method provided according to an embodiment of the present application, and fig. 4 is a specific flowchart of that method. The frame alignment method provided by the embodiment of the application is described below with reference to fig. 3 and fig. 4. The method is performed by a computer device and, as shown in fig. 3, includes the following steps.
301. And respectively acquiring a similarity curve of the reference video and a similarity curve of the video to be aligned based on the key frame set of the reference video and the key frame set of the video to be aligned, wherein the similarity curve of the reference video is used for indicating the change condition of the similarity between a plurality of key frames of the reference video and the target key frames of the reference video, and the similarity curve of the video to be aligned is used for indicating the change condition of the similarity between a plurality of key frames of the video to be aligned and the target key frames of the reference video.
The step 301 corresponds to the process of obtaining the keyframe set from the video in the (a) diagram of fig. 4, and performing template matching on the keyframe set to obtain a similarity curve, where the similarity curve of the reference video and the similarity curve of the video to be aligned are shown in the (B) diagram of fig. 4.
The key frames in a key frame set are the representative, informative frames of a video. Of the two videos to be frame-aligned, the reference video is the one that serves as the alignment benchmark, and the video to be aligned is the one aligned against it. For example, in pirated-video analysis, the legitimate video is the reference video and the pirated video to be analyzed is the video to be aligned; the identical segments of the two videos can be found through frame alignment. Alternatively, in video editing, when two videos need to be spliced into one coherent video, one video is the reference video and the other is the video to be aligned; frame alignment makes the identical segments of the two videos overlap exactly, so that the spliced video is coherent.
302. And determining a first period from the similarity curve of the videos to be aligned, wherein the first period is the same as the similarity change condition in the second period in the similarity curve of the reference video.
Step 302 corresponds to the process in diagram (A) of fig. 4 of obtaining a plurality of periods of each video and obtaining the first period and the second period by comparing the periods of the reference video with the periods of the video to be aligned. Each period is determined from a similarity curve and corresponds to a segment of the video; as shown in diagram (B) of fig. 4, each of the two similarity curves contains two dashed lines, and the curve between the two dashed lines is one period.
303. The reference video and the video to be aligned are aligned based on the first period and the second period.
The above step corresponds to the process in diagram (A) of fig. 4 of aligning the two videos based on the first period and the second period. There are various ways to align the reference video and the video to be aligned based on the first period and the second period; the alignment process is described below taking alignment via key frames as an example. For example, the key frames in the first period and the second period are aligned one by one in order, so that aligned key frames of the first and second periods are played at the same moment, thereby aligning the reference video and the video to be aligned. For another example, the first key frame of the first period and the first key frame of the second period are respectively taken as the starting time points of the two videos, thereby aligning the reference video and the video to be aligned.

In the frame alignment method provided by the embodiment of the application, the similarity curve of the reference video and the similarity curve of the video to be aligned describe how the key frames of each video change relative to the same key frame. Because key frames represent video content, the two similarity curves accurately reflect the segments of the two videos that change in the same way; that is, the identical segments of the two videos can be found, and frame alignment is achieved. Since this process does not match every frame of one video against every frame of the other, the amount of computation in the frame alignment process is greatly reduced while the accuracy of frame alignment is guaranteed.
Fig. 5 is a flowchart of a frame alignment method according to an embodiment of the present application, and taking a computer device as a server, as shown in fig. 5, the method includes the following steps.
501. And the server respectively acquires a key frame set of the reference video and a key frame set of the video to be aligned.
The step 501 includes steps 5011 to 5015, referring to fig. 6, fig. 6 is a flowchart of a method for obtaining a keyframe set according to an embodiment of the present application, as shown in fig. 6, where the method includes the following steps.
5011. The server respectively acquires a frame difference array of the reference video and a frame difference array of the video to be aligned based on all video frames of the reference video and all video frames of the video to be aligned, wherein the frame differences in the frame difference array represent differences between each video frame and the previous video frame.
In the embodiment of the present application, the process of acquiring a frame difference array is described taking the frame difference array of the reference video as an example. For each video frame in the reference video, the previous video frame is subtracted from it to obtain the frame difference between the two, which represents the difference between two adjacent frames. The obtained frame differences are added to the frame difference array of the reference video in the order of the video frames.
Optionally, before the frame differences are obtained, all video frames of the reference video are obtained by decoding the reference video, for example using OpenCV's VideoCapture interface.
The process of obtaining the frame difference array of the video to be aligned based on all the video frames of the video to be aligned is the same as the process of obtaining the frame difference array of the reference video, and will not be described herein.
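As an illustration of step 5011, the following is a minimal sketch of computing a frame difference array with OpenCV and NumPy. The patent only specifies subtracting adjacent frames; taking the mean absolute pixel difference as the frame difference, and the function and file names, are assumptions for the example.

```python
# Sketch of step 5011 (assumed details: mean absolute pixel difference as the
# frame difference; function and file names are illustrative).
import cv2
import numpy as np

def frame_diff_array(path: str) -> np.ndarray:
    """One frame difference per adjacent frame pair."""
    cap = cv2.VideoCapture(path)
    diffs = []
    ok, prev = cap.read()
    while ok:
        ok, cur = cap.read()
        if not ok:
            break
        # cv2.absdiff avoids the wrap-around of plain uint8 subtraction
        diffs.append(float(np.mean(cv2.absdiff(cur, prev))))
        prev = cur
    cap.release()
    return np.asarray(diffs)

ref_diffs = frame_diff_array("reference.mp4")   # frame difference array of the reference video
ali_diffs = frame_diff_array("to_align.mp4")    # frame difference array of the video to be aligned
```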
5012. The server respectively acquires a frame difference curve of the reference video and a frame difference curve of the video to be aligned based on the frame difference array of the reference video and the frame difference array of the video to be aligned.
The frame differences in a frame difference array represent the difference between each video frame and the previous one, and a frame difference curve represents how the frame difference changes. The abscissa of the frame difference curve may be a sequence of integers ordered by the time of the video frames, and the ordinate the corresponding frame difference value.
In the embodiment of the present application, the process of obtaining a frame difference curve is described taking the frame difference curve of the reference video as an example. Each frame difference in the frame difference array of the reference video is represented as a point in two-dimensional coordinates, whose abscissa is its position in the time-ordered integer sequence and whose ordinate is the frame difference value. The frame difference array of the reference video is then interpolated: an interpolation function over the array is computed, and the points of the array are connected by the interpolation function into a smooth, continuous curve, yielding the frame difference curve of the reference video.
In some embodiments, the process of obtaining the frame difference curve of the reference video may use a cubic spline interpolation method, whose end-point extension is implemented by the following code: s = np.r_[2*x[0] - x[window_len:1:-1], x, 2*x[-1] - x[-1:-window_len:-1]].
Here s represents the extended frame difference array from which the curve function is fitted; np.r_[] is the NumPy helper that concatenates arrays along the first axis; x represents the frame difference array; and window_len is the length of the interpolation window, i.e., how many consecutive points are taken to compute the function between the first and last point taken. The terms 2*x[0] - x[window_len:1:-1] and 2*x[-1] - x[-1:-window_len:-1] mirror the beginning and end of the frame difference array about its first and last values; through this translation and reversal, additional points are obtained and inserted at both ends of the frame difference array.
The process of obtaining the frame difference curve of the video to be aligned based on the frame difference array of the video to be aligned is the same as the process of obtaining the frame difference curve of the reference video, and will not be described herein.
5013. The server respectively acquires hash codes of video frames in the reference video and hash codes of video frames in the video to be aligned based on the frame difference curve of the reference video and the frame difference curve of the video to be aligned.
In the embodiment of the present application, the process of obtaining hash codes of video frames is described taking the reference video and its frame difference curve as an example. A hash function maps the frame difference curve of the reference video into a plurality of hash codes, each corresponding to one video frame of the reference video. Mapping the frame difference curve into hash codes expresses the differences between frames directly as numbers and simplifies the frame-to-frame comparison process.
In some embodiments, the above step 5013 is implemented by the following code: w = getarr(s, hash, L).
Here w represents the hash codes; getarr() represents a hash function that maps the frame difference curve into a plurality of fixed-length hash codes, each corresponding to one video frame of the video and serving as the unique identifier of that video frame; s is the function of the frame difference curve; hash indicates the type of hash algorithm used by the hash function; and L represents the length of the hash codes.
In the embodiment of the present application, the hash algorithm adopted in the above-mentioned process is any hash algorithm, which is not limited in the embodiment of the present application.
In some embodiments, hash='hash', meaning that the frame difference curve is mapped to hash codes using the hashing scheme denoted by "hash".
The process of obtaining the hash code of the video frame in the video to be aligned based on the frame difference curve of the video to be aligned is the same as the process of obtaining the hash code of the video frame in the reference video, and will not be described herein.
5014. The server normalizes the hash codes of the video frames in the reference video and the hash codes of the video frames in the video to be aligned to obtain the hash codes of the video frames in the normalized reference video and the hash codes of the video frames in the video to be aligned.
In the embodiment of the application, the normalization process is described taking the hash codes of the video frames in the reference video as an example. The hash codes are normalized so that the hash codes of different video frames within the same reference video become comparable, which makes it possible to determine the hash codes with mutation; after normalization, all hash codes share the same value range and scale, facilitating comparison.
In some embodiments, the step of normalizing the hash codes of the video frames in the reference video includes: dividing the hash code of each video frame in the reference video by the sum of the hash codes of all video frames in the video to obtain a normalized intermediate value of each video frame, and carrying out convolution processing on the normalized intermediate value of each video frame to obtain the normalized hash code of each video frame.
In some embodiments, the normalization method described above is implemented by the following code: y = np.convolve(w / w.sum(), s, mode='same').
Here y represents the normalized hash codes; np.convolve() is the function in the NumPy library that convolves the hash codes; w represents the unnormalized hash codes; w.sum() is the sum of the hash codes of all video frames in the video; s represents the convolution kernel; and mode='same' indicates that np.convolve() outputs a sequence of the same length as the input, i.e., the first and last values of the full convolution are not output and only its central part, which contains all the normalized hash codes, is kept.
The process of normalizing the hash codes of the video frames in the video to be aligned to obtain the hash codes of the video frames in the normalized video to be aligned is the same as the process of normalizing the hash codes of the video frames in the reference video, and will not be described in detail herein.
5015. The server respectively determines hash codes with mutation from hash codes of video frames in the normalized reference video and hash codes of video frames in the video to be aligned, and respectively adds video frames corresponding to the hash codes with mutation as key frames into a key frame set of the reference video and a key frame set of the video to be aligned.
In the embodiment of the present application, the process of obtaining a key frame set is described taking the reference video as an example, where the hash codes with mutation are determined from the normalized hash codes of its video frames. The window size is set to 2N, where N is an integer greater than zero, meaning the window frames the hash codes of 2N video frames of the reference video. Starting from the hash code of the last video frame of the reference video, each time the window moves by a step of N, the hash codes inside the window are taken out to form an array; through repeated window movements, a plurality of arrays containing normalized hash codes are obtained, and these are the arrays in which the hash codes of the key frames of the reference video lie. For each such array, the extreme points, i.e., the hash codes with mutation, are found, and the video frames of the reference video corresponding to these extreme points are added as key frames to the key frame set of the reference video. Performing frame alignment based on the key frame set obtained in this way, rather than on all video frames or by training a frame alignment model with a large number of aligned videos, reduces the amount of computation in the frame alignment process.
The extreme points are obtained by comparing the 2N normalized hash codes in the array, finding their maximum and minimum values, and taking these as the extreme points, i.e., the hash codes with mutation.
In some embodiments, the above process of obtaining a plurality of arrays containing normalized hash codes is implemented by the following code: window_frame = y[window - 1:window + 1].
Here window_frame represents the normalized hash codes inside the window; window represents the window size, which determines the number of key frames to be extracted, at least one key frame being extracted from the video frames framed by a window of that size.
In some embodiments, the above process of finding the extreme points in the array is implemented by the following code: extre_frame = np.asarray(argrelextrema(window_frame, np.greater)).
Here extre_frame represents the array formed by the extreme points found; np.asarray() is the function in the NumPy library that converts input data into an array; argrelextrema() is the function (in the scipy.signal library) for finding extreme points; window_frame represents the array in which the normalized hash codes of the key frames lie; and np.greater is the NumPy comparison function that tests whether the elements of one array are greater than the corresponding elements of another.
The process of determining the hash codes with mutation from the normalized hash codes of the video frames of the video to be aligned, and thus obtaining its key frame set, is the same as the process of obtaining the key frame set of the reference video and will not be described herein.
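Putting steps 5014 and 5015 together, the following is a hedged sketch of key frame selection from per-frame hash values. The moving-average kernel standing in for the convolution kernel s, the function name, and the use of argrelextrema with order=N are assumptions; the patent specifies only normalization by the sum, a convolution, and extreme points within windows of size 2N.

```python
# Consolidated sketch of steps 5014-5015 (assumed details: a moving-average
# kernel stands in for the convolution kernel s, and argrelextrema with
# order=N stands in for the 2N sliding-window extremum search).
import numpy as np
from scipy.signal import argrelextrema

def key_frame_indices(hash_vals: np.ndarray, N: int) -> np.ndarray:
    # Step 5014: divide by the sum of all hash codes, then convolve.
    y = np.convolve(hash_vals / hash_vals.sum(),
                    np.ones(2 * N) / (2 * N), mode='same')
    # Step 5015: local maxima and minima are the hash codes with mutation.
    maxima = argrelextrema(y, np.greater, order=N)[0]
    minima = argrelextrema(y, np.less, order=N)[0]
    return np.sort(np.concatenate([maxima, minima]))
```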
Steps 5011 to 5015 above are described taking frame differences of the reference video and the video to be aligned as an example. In some embodiments, the key frames in the key frame set may instead be determined based on the type of the video frames, for example by taking the I-frames of a video as its key frames, which the embodiment of the present application does not limit. In some embodiments, if the number of video frames in a video is less than or equal to a frame count threshold, that is, there are only a few video frames, all video frames of the video may be used as key frames, and the following steps 502 to 506 are then carried out.
502. The server respectively acquires a similarity array of the reference video and a similarity array of the video to be aligned based on a key frame set of the reference video and a key frame set of the video to be aligned, wherein the similarity array comprises the similarity of a plurality of key frames relative to the same key frame.
In the embodiment of the present application, the process of acquiring a similarity array is described taking the similarity array of the reference video as an example. Any key frame in the key frame set of the reference video is taken as the target key frame; template matching is performed between each key frame in the key frame set of the reference video and the target key frame to obtain their similarity, and the similarities are added to the similarity array of the reference video.
The process of obtaining the similarity is implemented with a template matching algorithm, which may use the SQDIFF_NORMED (normalized squared difference) matching mode, the CCORR_NORMED (normalized cross-correlation) matching mode, or the CCOEFF_NORMED (normalized correlation coefficient) matching mode.
In the embodiment of the application, the similarity between each key frame in the key frame set of the reference video and the target key frame is obtained using the CCOEFF_NORMED normalized correlation coefficient matching mode. This mode computes the similarity using variances and correlation coefficients; the computation is more complex than in the other modes, but the resulting similarity is more accurate. The following formula (1) subtracts its mean from the template and from the image before correlating them, yielding the similarity between a key frame of the reference video and the target key frame:
R(x, y) = \frac{\sum_{x', y'} \bigl(T'(x', y') \cdot I'(x + x', y + y')\bigr)}{\sqrt{\sum_{x', y'} T'(x', y')^2 \cdot \sum_{x', y'} I'(x + x', y + y')^2}}    (1)

where T'(x', y') = T(x', y') - \frac{1}{w \cdot h} \sum_{x'', y''} T(x'', y'') and I'(x + x', y + y') = I(x + x', y + y') - \frac{1}{w \cdot h} \sum_{x'', y''} I(x + x'', y + y'')
Here R represents the similarity between the key frame of the reference video and the target key frame, with a value between -1 and 1: 1 indicates a perfect match, 0 indicates no correlation, and -1 indicates an inverse match, i.e., the channels of the two images behave oppositely; the similarity between video frames of the same video lies between 0 and 1. T represents the target key frame; I represents a key frame in the key frame set of the reference video; w and h represent the width and height of the target key frame; (x, y) represents the coordinates of the top-left vertex of the target key frame after it is overlaid on the key frame; and (x', y') represents the coordinates of a point inside the target key frame relative to that vertex, over which the sums run.
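OpenCV exposes the CCOEFF_NORMED mode of formula (1) through cv2.matchTemplate. The sketch below is illustrative, assuming frames at least as large as the target key frame; the function name similarity_array is not from the patent.

```python
# Sketch of step 502 via OpenCV (assumed details: frames at least as large as
# the target key frame; the function name is illustrative).
import cv2
import numpy as np

def similarity_array(key_frames: list, target: np.ndarray) -> np.ndarray:
    sims = []
    for frame in key_frames:
        # TM_CCOEFF_NORMED implements formula (1); with equally sized
        # inputs the result is a single 1x1 score.
        r = cv2.matchTemplate(frame, target, cv2.TM_CCOEFF_NORMED)
        # max() also covers the unequal-size case described below.
        sims.append(float(r.max()))
    return np.asarray(sims)
```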
The process of obtaining the similarity array of the video to be aligned based on its key frame set, i.e., based on the same target key frame, is the same as the process of obtaining the similarity array of the reference video and will not be described herein.
The above template matching process is described taking key frames of the same size as the target key frame as an example. In some embodiments, if a key frame and the target key frame differ in size, the smaller of the two is taken as the target key frame (the template); with each pixel of the larger frame as a center, the similarity between the corresponding part of the larger frame and the target key frame is calculated, and the maximum of these similarities is taken as the similarity between the key frame and the target key frame.
Step 502 above is described taking similarity obtained through a template matching algorithm as an example; in some embodiments, the similarity may also be obtained with a structural matching method or SIFT feature matching.
503. The server obtains a similarity curve of the reference video and a similarity curve of the video to be aligned respectively based on the similarity array of the reference video and the similarity array of the video to be aligned.
In the embodiment of the present application, a process of acquiring a similarity curve of a reference video based on a similarity array of the reference video is taken as an example, and the process of acquiring the similarity curve is described.
The similarity array of the reference video is represented as points in two-dimensional coordinates, the array is interpolated to obtain the parameters of an interpolation function, and the similarity curve of the reference video is obtained from these parameters. The similarity curves obtained in this way capture how the similarity of each video's key frames changes relative to the same target key frame, thereby revealing the segments of the two videos in which the key frames change in the same way and enabling frame alignment.
In some embodiments, the above process of representing the similarity array of the reference video as points in two-dimensional coordinates is implemented by the following code: y = sim_arr; x = np.arange(0, len(y)). Here y is the ordinate and represents the similarity; x is the abscissa and represents the position, among the plurality of key frames, of the key frame corresponding to each similarity.
In some embodiments, the above interpolation of the similarity array of the reference video, which yields the parameters of the interpolation function, is implemented by the following code: tck = scipy.interpolate.splrep(x, y, s=0). Here scipy.interpolate.splrep() is the splrep function in the scipy.interpolate library for obtaining the parameters of the interpolation function; x and y represent the abscissa and ordinate of the similarity array of the reference video; and s=0 indicates that the interpolation uses least squares without smoothing the obtained parameters.
In some embodiments, the above process of obtaining the similarity curve of the reference video from the parameters of the interpolation function is implemented by the following code: xnew = np.arange(0, len(y)); ynew = scipy.interpolate.splev(xnew, tck, der=0). The code xnew = np.arange(0, len(y)) produces the same integer sequence as before, representing the positions of the key frames corresponding to the similarities; the code ynew = scipy.interpolate.splev(xnew, tck, der=0) produces the similarity curve of the reference video. Here ynew is the function of the similarity curve; scipy.interpolate.splev() is the splev function in the scipy.interpolate library for obtaining a smooth curve; xnew is the abscissa of the key frame similarities; tck represents the parameters of the interpolation function; and der=0 indicates that the spline itself, rather than one of its derivatives, is evaluated.
The process of obtaining the similarity curve of the video to be aligned based on its similarity array is the same as the process of obtaining the similarity curve of the reference video and will not be described herein.
Step 503 above is described taking a similarity curve obtained by spline interpolation as an example; in some embodiments, the similarity curve may also be obtained using a cubic spline interpolation method, as sketched below.
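As a sketch of that cubic spline alternative, scipy.interpolate.CubicSpline can produce the similarity curve directly; the toy similarity array and the density of the evaluation grid are illustrative assumptions.

```python
# Sketch of the cubic spline alternative for step 503 (the toy similarity
# array and the 10x denser evaluation grid are illustrative assumptions).
import numpy as np
from scipy.interpolate import CubicSpline

sim_arr = np.array([1.0, 0.4, 0.7, 0.2, 0.9, 0.5])   # toy similarity array
x = np.arange(len(sim_arr))
curve = CubicSpline(x, sim_arr)                       # smooth, continuous curve
xdense = np.linspace(0, len(sim_arr) - 1, 10 * len(sim_arr))
ynew = curve(xdense)                                  # the similarity curve
```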
504. The server acquires a plurality of periods of the reference video and a plurality of periods of the video to be aligned respectively based on the similarity curve of the reference video and the similarity curve of the video to be aligned.
The step 504 includes steps 5041 to 5042.
5041. The server calculates a peak prominence value for the similarity curve of the reference video and a peak prominence value for the similarity curve of the video to be aligned, based on the maximum and minimum values of the respective similarity curves, where the peak prominence value represents how prominent the peaks of a similarity curve are.
In the embodiment of the present application, the calculation is described taking the peak prominence value of the similarity curve of the reference video as an example. The minimum value of the similarity curve of the reference video is subtracted from its maximum value, and the resulting difference is multiplied by a ratio to obtain the peak prominence value of the reference video.
In some embodiments, the above process of obtaining the peak prominence value is implemented by the following code: prominence = (max(ynew) - min(ynew)) * r, where prominence represents the peak prominence value of the similarity curve; max(ynew) and min(ynew) represent the maximum and minimum values of the similarity curve, respectively; and r represents the ratio.
The process of calculating the peak prominence value of the similarity curve of the video to be aligned from the maximum and minimum values of that curve is the same as the process of calculating the peak prominence value of the reference video and will not be described herein.
5042. The server determines a plurality of periods of the reference video and a plurality of periods of the video to be aligned based on the peak prominence of the similarity curve of the reference video and the peak prominence of the similarity curve of the video to be aligned.
In the embodiment of the present application, a plurality of periods of a reference video are determined based on peak prominence values of a similarity curve of the reference video. And screening out peaks smaller than the peak value salient value of the reference video in the similarity curve of the reference video, obtaining peaks larger than or equal to the peak value salient value of the reference video in the similarity curve of the reference video, taking the lowest trough between two adjacent peaks in the peaks larger than or equal to the peak value salient value of the reference video in the similarity curve of the reference video as a starting frame of the reference video, and taking the similarity curve between the starting frames of two adjacent reference videos as a period of the reference video.
For example, there are three peaks in the similarity curve of the reference video, which are A, B, C respectively, wherein the lowest trough between a and B is D, the lowest trough between B and C is E, D and E are taken as start frames, one period from the start of the reference video to D, one period from D to E, and one period from E to the end of the reference video.
The process screens out partial wave peaks through the peak value salient values, and determines a wave peak similarity curve larger than or equal to the peak value salient values as a plurality of periods, so that the number of the periods is controlled within a certain range, and the calculated amount consumed by comparing the periods of the two videos is reduced.
In some embodiments, the above-mentioned process of acquiring the multiple periods of the reference video may be implemented by the following codes peaks, properties=find_peaks (ynew, property=property), where peaks are used to represent peaks in the similarity curve that are greater than or equal to the peak of the peak salience value; properties are used to represent the above start frame; find_peaks () is a function in the scipy.signal library to find peaks in the similarity curve that are greater than or equal to the peak saliency value; ynew is a function of the similarity curve; prominine = prominine is used to represent the peak prominence of the similarity curve.
The process of determining the plurality of periods of the video to be aligned based on the peak prominence of its similarity curve is the same as the process of determining the plurality of periods of the reference video, and is not repeated here.
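For illustration only, the following is a minimal Python sketch of steps 5041 and 5042 combined, using the scipy.signal.find_peaks call described above; the function name split_into_periods and the default value of the ratio r are assumptions introduced here, not part of the embodiment:

    import numpy as np
    from scipy.signal import find_peaks

    def split_into_periods(ynew, r=0.5):
        # Step 5041: peak prominence = (max - min) * ratio.
        prominence = (np.max(ynew) - np.min(ynew)) * r
        # Step 5042: keep only peaks whose prominence reaches that value.
        peaks, _ = find_peaks(ynew, prominence=prominence)
        # The lowest trough between two adjacent retained peaks is a start frame.
        starts = [int(np.argmin(ynew[a:b])) + a for a, b in zip(peaks[:-1], peaks[1:])]
        # Periods are the curve segments delimited by consecutive start frames,
        # including the leading and trailing segments of the curve.
        bounds = [0] + starts + [len(ynew)]
        return [ynew[s:e] for s, e in zip(bounds[:-1], bounds[1:])], starts

In the three-peak example above, split_into_periods would return the segments from the start of the curve to D, from D to E, and from E to the end, with D and E as the start frames.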
The above step 504 is described taking the acquisition of the plurality of periods of a video based on peak prominence as an example. In some embodiments, for a video composed of cyclic segments, the function of the similarity curve of the video may instead be expanded as a Fourier series, that is, a sum of a series of sine functions and cosine functions; the period of this Fourier representation is acquired from the Fourier form of the similarity curve, the period of the similarity curve is acquired from the period of the Fourier representation, and the periods of the video are acquired from the period of the similarity curve.
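As a rough sketch of this Fourier-based alternative (an illustrative assumption, since the embodiment does not fix a concrete implementation), the dominant period of the similarity curve can be estimated from the strongest non-zero component of its discrete Fourier transform:

    import numpy as np

    def fourier_period(ynew):
        # Remove the mean so the zero-frequency component does not dominate.
        y = np.asarray(ynew, dtype=float)
        y = y - y.mean()
        spectrum = np.abs(np.fft.rfft(y))
        freqs = np.fft.rfftfreq(len(y))  # in cycles per sample
        # The strongest non-zero frequency corresponds to the dominant cycle.
        k = int(np.argmax(spectrum[1:])) + 1
        return 1.0 / freqs[k]  # period, in samples of the similarity curve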
505. The server compares the plurality of periods of the reference video with the plurality of periods of the video to be aligned, and outputs, from among them, the periods with the same similarity change as the second period and the first period, respectively.
The plurality of periods of the reference video are compared with the plurality of periods of the video to be aligned. If the peak values of all peaks in two periods are the same in size and in order of appearance, the similarity change of the two periods is the same; of such a pair of periods, the period belonging to the reference video is output as the second period, and the period belonging to the video to be aligned is output as the first period.
Taking the comparison of one period in the reference video as an example: the peak values and the order of appearance of all peaks in this period are compared with those of one period in the video to be aligned. If the peak values and the order of appearance of all peaks in the two compared periods are identical, the two periods are output as the second period and the first period, respectively; otherwise, the period in the reference video is compared with the next period in the video to be aligned.
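A minimal sketch of this comparison loop is given below; the exact-match test on peak values is relaxed to a small tolerance tol, which is an assumption added here (the embodiment compares the peak values and their order of appearance directly):

    import numpy as np
    from scipy.signal import find_peaks

    def same_similarity_change(period_a, period_b, tol=1e-3):
        # Two periods match when their peaks agree in value and in order.
        pa, _ = find_peaks(period_a)
        pb, _ = find_peaks(period_b)
        if len(pa) != len(pb):
            return False
        return bool(np.allclose(period_a[pa], period_b[pb], atol=tol))

    def match_periods(reference_periods, aligned_periods):
        # Return the first (second period, first period) pair that matches.
        for second in reference_periods:
            for first in aligned_periods:
                if same_similarity_change(second, first):
                    return second, first
        return None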
By comparing the periods of the reference video with the periods of the video to be aligned, the above process finds the periods with the same similarity change in the two similarity curves; such periods reflect the same segment in the two videos, and the two videos can thus be aligned. In addition, frame alignment is achieved by comparing two similarity curves rather than by comparing the difference between every pair of frames, so the computation consumed in searching for the same segment in the two videos is reduced.
506. The server aligns the reference video and the video to be aligned based on the first period and the second period.
In some embodiments, the above alignment process may be implemented by aligning the start frames, that is, the start frames of the first period and the second period, so that the two videos start playing at the same time from these start frames, thereby aligning the two videos. The alignment process may also align all video frames in the two periods one by one in order, so that the aligned video frames in the first period and the second period are played at the same time, likewise aligning the two videos. In addition, the two videos may be aligned based on the first period and the second period in various other manners, which are not described here.
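As an illustration of the start-frame variant, the shift to apply reduces to a one-line computation; the frame rate parameter fps is an assumption introduced here to convert the shift into a playback delay:

    def alignment_offset(second_start_frame, first_start_frame, fps):
        # Number of seconds by which the video to be aligned should be
        # delayed (negative: advanced) so that the start frame of the first
        # period coincides in time with that of the second period.
        return (second_start_frame - first_start_frame) / fps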
The frame alignment method provided by the embodiments of the present application can be applied to various scenarios. For example, when testing game animation, the modification of animation resources in each version may affect existing resources; by recording animation videos of different versions and performing frame-aligned comparison on them, bugs introduced during version iteration can be found, and automated daily smoke tests can be performed. With the rapid development of short videos, the demand for video synthesis is also growing: two shots of the same scene can be spliced through frame alignment, ensuring that the start times of the two segments at the splice are consistent, so that the contents of the two videos are more consistent in time and space and display consistent content. When watching a long video online, the video may be divided into different segments stored on different servers, and a second playback needs to jump to the same viewing position; the key frame scene can be recorded when the user closes the video, and frame alignment can be performed when the user pulls the stream again, achieving a smooth playback effect.
According to the frame alignment method provided by the embodiments of the present application, in the frame alignment process the similarity curve of the reference video and the similarity curve of the video to be aligned represent how the key frames of each video change relative to the same key frame. Because key frames represent the content of a video, the two similarity curves can accurately reflect the same segments of the two videos, whether the videos are long or short; finding those segments and their start frames achieves frame alignment. In this process, not all frames in the videos are matched, which reduces the computation time and performance consumption of the overall algorithm; therefore, the amount of computation in frame alignment is greatly reduced while the accuracy of frame alignment is ensured.
In addition, the method can extract a key frame set from each video and draw the similarity curve of the video based on that set, thereby achieving frame alignment of the two videos; it therefore has strong universality and is applicable to various kinds of videos.
In addition, the method for acquiring the key frame set can be used on its own to present the main content and characteristics of a video, to improve the accuracy and efficiency of video retrieval, and to remove redundant and repeated frames of a video, thereby reducing its storage space and transmission bandwidth. The method for obtaining the similarity curve can likewise be used on its own to reflect how a video changes along the time axis and to score the degree of change of the video.
Fig. 7 is a block diagram of a frame alignment apparatus according to an embodiment of the present application. The apparatus is configured to perform the steps when the frame alignment method is performed, and referring to fig. 7, the frame alignment apparatus includes:
the similarity curve obtaining module 701 is configured to obtain a similarity curve of the reference video and a similarity curve of the video to be aligned based on the set of key frames of the reference video and the set of key frames of the video to be aligned, respectively, where the similarity curve of the reference video is used to indicate a change condition of similarity between a plurality of key frames of the reference video and a target key frame of the reference video, and the similarity curve of the video to be aligned is used to indicate a change condition of similarity between a plurality of key frames of the video to be aligned and a target key frame of the reference video;
the period obtaining module 702 is configured to determine a first period from the similarity curve of the video to be aligned, the first period being a period whose similarity change is the same as that of a second period in the similarity curve of the reference video;
an alignment module 703, configured to align the reference video and the video to be aligned based on the first period and the second period.
In some embodiments, the similarity curve acquisition module 701 includes:
The key frame set acquisition unit is used for respectively acquiring a key frame set of the reference video and a key frame set of the video to be aligned;
the similarity array acquisition unit is used for taking any key frame in the key frame set of the reference video as a target key frame, calculating the similarity between each key frame in the key frame set of the reference video and the target key frame to obtain a similarity array of the reference video, and calculating the similarity between each key frame in the key frame set of the video to be aligned and the target key frame to obtain a similarity array of the video to be aligned;
the similarity curve acquisition unit is used for respectively acquiring the similarity curve of the reference video and the similarity curve of the video to be aligned based on the similarity array of the reference video and the similarity array of the video to be aligned.
In some embodiments, the similarity curve acquisition unit is configured to:
interpolating the similarity array of the reference video and the similarity array of the video to be aligned respectively, and obtaining the similarity curve of the reference video and the similarity curve of the video to be aligned based on the interpolated similarity arrays.
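A possible implementation of this unit is sketched below, under the assumption that cubic interpolation via scipy.interpolate.interp1d is used (the embodiment does not fix the interpolation method); the same sketch applies to either similarity array:

    import numpy as np
    from scipy.interpolate import interp1d

    def similarity_curve(similarity_array, n_points=1000):
        # Interpolate the discrete similarity array into a dense curve ynew.
        # Cubic interpolation needs at least four points in the array.
        x = np.arange(len(similarity_array))
        f = interp1d(x, similarity_array, kind="cubic")
        xnew = np.linspace(0, len(similarity_array) - 1, n_points)
        return xnew, f(xnew)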
In some embodiments, the key frame set acquisition unit includes:
The frame difference array acquisition subunit is used for respectively acquiring a frame difference array of the reference video and a frame difference array of the video to be aligned based on all video frames of the reference video and all video frames of the video to be aligned, wherein the frame difference in the frame difference array represents the difference between each video frame and the previous video frame;
the frame difference curve acquisition subunit is used for respectively acquiring a frame difference curve of the reference video and a frame difference curve of the video to be aligned based on the frame difference array of the reference video and the frame difference array of the video to be aligned;
the hash code acquisition subunit is used for respectively acquiring hash codes of video frames in the reference video and hash codes of video frames in the video to be aligned based on the frame difference curve of the reference video and the frame difference curve of the video to be aligned;
the key frame acquisition subunit is used for respectively determining hash codes with mutation from hash codes of video frames in the reference video and hash codes of video frames in the video to be aligned, and respectively adding the video frames corresponding to the hash codes with mutation as key frames into a key frame set of the reference video and a key frame set of the video to be aligned.
In some embodiments, the frame difference curve acquisition subunit is configured to:
interpolating the frame difference array of the reference video and the frame difference array of the video to be aligned respectively, and obtaining the frame difference curve of the reference video and the frame difference curve of the video to be aligned based on the interpolated frame difference arrays.
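To illustrate the hash-based key frame selection described above, the following sketch hashes each frame directly with a plain average hash and treats a large Hamming-distance jump as an abrupt change; both the hash choice and the threshold are stand-in assumptions, since the embodiment derives the hash codes via the frame difference curve:

    import cv2
    import numpy as np

    def average_hash(frame, size=8):
        # Stand-in per-frame hash: average hash of a downscaled grayscale frame.
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        small = cv2.resize(gray, (size, size))
        return (small > small.mean()).flatten()

    def key_frames_by_hash_jump(frames, threshold=12):
        # A frame whose hash differs sharply (in Hamming distance) from the
        # previous frame's hash is treated as an abrupt change, i.e. a key frame.
        key_indices, prev = [], None
        for i, frame in enumerate(frames):
            h = average_hash(frame)
            if prev is not None and int(np.count_nonzero(h != prev)) >= threshold:
                key_indices.append(i)
            prev = h
        return key_indices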
In some embodiments, the period acquisition module 702 includes:
a peak prominence obtaining unit, configured to calculate the peak prominence of the similarity curve of the reference video and the peak prominence of the similarity curve of the video to be aligned, respectively, based on the maximum value and the minimum value of the similarity curve of the reference video and the maximum value and the minimum value of the similarity curve of the video to be aligned, where the peak prominence represents the degree of prominence of the peaks of a similarity curve;
a period acquisition unit, configured to determine a plurality of periods of the reference video and a plurality of periods of the video to be aligned based on the peak prominence of the similarity curve of the reference video and the peak prominence of the similarity curve of the video to be aligned;
and a comparison unit, configured to compare the plurality of periods of the reference video with the plurality of periods of the video to be aligned, and output the periods with the same similarity change as the second period and the first period, respectively.
In some embodiments, the period acquisition unit is configured to:
screening out, from the similarity curve of the reference video, the peaks whose prominence is smaller than the peak prominence of the reference video and, from the similarity curve of the video to be aligned, the peaks whose prominence is smaller than the peak prominence of the video to be aligned, to obtain the peaks in the similarity curve of the reference video whose prominence is greater than or equal to the peak prominence of the reference video and the peaks in the similarity curve of the video to be aligned whose prominence is greater than or equal to the peak prominence of the video to be aligned;
taking the lowest trough between two adjacent retained peaks in the similarity curve of the reference video as a start frame of the reference video, and taking the segment of the similarity curve between two adjacent start frames of the reference video as a period of the reference video;
and taking the lowest trough between two adjacent retained peaks in the similarity curve of the video to be aligned as a start frame of the video to be aligned, and taking the segment of the similarity curve between two adjacent start frames of the video to be aligned as a period of the video to be aligned.
It should be noted that: in the frame alignment device provided in the above embodiment, only the division of the above functional modules is used for illustration, and in practical application, the above functional allocation may be performed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus and the method embodiments provided in the foregoing embodiments belong to the same concept, and specific implementation processes of the apparatus and the method embodiments are detailed in the method embodiments and are not repeated herein.
Embodiments of the present application also provide a computer-readable storage medium having stored therein at least one computer program that is loaded and executed by a processor of a computer device to implement the operations performed by the computer device in the methods of the embodiments described above. For example, the computer-readable storage medium may be a ROM (Read-Only Memory), a RAM (Random Access Memory), a CD-ROM (Compact Disc Read-Only Memory), a magnetic tape, a floppy disk, an optical data storage device, or the like.
In some embodiments, a computer program according to an embodiment of the present application may be deployed to be executed on one computer device or on multiple computer devices located at one site, or on multiple computer devices distributed across multiple sites and interconnected by a communication network, where the multiple computer devices distributed across multiple sites and interconnected by a communication network may constitute a blockchain system.
Embodiments of the present application also provide a computer program product or computer program comprising computer program code stored in a computer readable storage medium. The computer program code is read from a computer readable storage medium by a processor of a server, which executes the computer program code, causing the server to perform the frame alignment methods provided in the various alternative implementations described above.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The foregoing description of the preferred embodiments of the present application is not intended to limit the application; the scope of the application is defined by the appended claims.

Claims (14)

1. A method of frame alignment, the method comprising:
based on a key frame set of a reference video and a key frame set of a video to be aligned, respectively obtaining a similarity curve of the reference video and a similarity curve of the video to be aligned, wherein the similarity curve of the reference video is used for indicating the change condition of the similarity between a plurality of key frames of the reference video and a target key frame of the reference video, and the similarity curve of the video to be aligned is used for indicating the change condition of the similarity between the plurality of key frames of the video to be aligned and the target key frame of the reference video;
determining a first period from the similarity curve of the video to be aligned, the first period being a period whose similarity change is the same as that of a second period in the similarity curve of the reference video;
aligning the reference video and the video to be aligned based on the first period and the second period;
wherein the determining a first period from the similarity curve of the video to be aligned, the first period being a period whose similarity change is the same as that of a second period in the similarity curve of the reference video, comprises: calculating a peak prominence of the similarity curve of the reference video and a peak prominence of the similarity curve of the video to be aligned, respectively, based on the maximum value and the minimum value of the similarity curve of the reference video and the maximum value and the minimum value of the similarity curve of the video to be aligned, wherein the peak prominence is used to represent the degree of prominence of the peaks of a similarity curve; determining a plurality of periods of the reference video and a plurality of periods of the video to be aligned based on the peak prominence of the similarity curve of the reference video and the peak prominence of the similarity curve of the video to be aligned; and comparing the plurality of periods of the reference video with the plurality of periods of the video to be aligned, and outputting, from among them, the periods with the same similarity change as the second period and the first period, respectively.
2. The method of claim 1, wherein the obtaining the similarity curve of the reference video and the similarity curve of the video to be aligned based on the set of key frames of the reference video and the set of key frames of the video to be aligned, respectively, comprises:
respectively acquiring a key frame set of the reference video and a key frame set of the video to be aligned;
taking any key frame in the key frame set of the reference video as a target key frame, calculating the similarity between each key frame in the key frame set of the reference video and the target key frame to obtain a similarity array of the reference video, and calculating the similarity between each key frame in the key frame set of the video to be aligned and the target key frame to obtain a similarity array of the video to be aligned;
and respectively acquiring a similarity curve of the reference video and a similarity curve of the video to be aligned based on the similarity array of the reference video and the similarity array of the video to be aligned.
3. The method of claim 2, wherein the obtaining the similarity curve of the reference video and the similarity curve of the video to be aligned based on the similarity array of the reference video and the similarity array of the video to be aligned, respectively, comprises:
interpolating the similarity array of the reference video and the similarity array of the video to be aligned respectively, and obtaining the similarity curve of the reference video and the similarity curve of the video to be aligned based on the interpolated similarity arrays.
4. The method of claim 2, wherein the separately obtaining the set of keyframes of the reference video and the set of keyframes of the video to be aligned comprises:
based on all video frames of the reference video and all video frames of the video to be aligned, respectively obtaining a frame difference array of the reference video and a frame difference array of the video to be aligned, wherein the frame differences in the frame difference array represent differences between each video frame and a previous video frame;
based on the frame difference array of the reference video and the frame difference array of the video to be aligned, respectively obtaining a frame difference curve of the reference video and a frame difference curve of the video to be aligned;
based on the frame difference curve of the reference video and the frame difference curve of the video to be aligned, respectively obtaining the hash code of the video frame in the reference video and the hash code of the video frame in the video to be aligned;
And respectively determining hash codes with mutation from the hash codes of the video frames in the reference video and the hash codes of the video frames in the video to be aligned, and respectively adding the video frames corresponding to the hash codes with mutation as key frames into a key frame set of the reference video and a key frame set of the video to be aligned.
5. The method of claim 4, wherein the obtaining the frame difference curve of the reference video and the frame difference curve of the video to be aligned based on the frame difference array of the reference video and the frame difference array of the video to be aligned, respectively, comprises:
interpolating the frame difference array of the reference video and the frame difference array of the video to be aligned respectively, and obtaining the frame difference curve of the reference video and the frame difference curve of the video to be aligned based on the interpolated frame difference arrays.
6. The method of claim 1, wherein the determining the plurality of periods of the reference video and the plurality of periods of the video to be aligned based on the peak prominence of the similarity curve of the reference video and the peak prominence of the similarity curve of the video to be aligned comprises:
screening out, from the similarity curve of the reference video, the peaks whose prominence is smaller than the peak prominence of the reference video and, from the similarity curve of the video to be aligned, the peaks whose prominence is smaller than the peak prominence of the video to be aligned, to obtain the peaks in the similarity curve of the reference video whose prominence is greater than or equal to the peak prominence of the reference video and the peaks in the similarity curve of the video to be aligned whose prominence is greater than or equal to the peak prominence of the video to be aligned;
taking the lowest trough between two adjacent retained peaks in the similarity curve of the reference video as a start frame of the reference video, and taking the segment of the similarity curve between two adjacent start frames of the reference video as a period of the reference video;
and taking the lowest trough between two adjacent retained peaks in the similarity curve of the video to be aligned as a start frame of the video to be aligned, and taking the segment of the similarity curve between two adjacent start frames of the video to be aligned as a period of the video to be aligned.
7. A frame alignment apparatus, the apparatus comprising:
The similarity curve acquisition module is used for respectively acquiring a similarity curve of the reference video and a similarity curve of the video to be aligned based on a key frame set of the reference video and a key frame set of the video to be aligned, wherein the similarity curve of the reference video is used for indicating the change condition of the similarity between a plurality of key frames of the reference video and a target key frame of the reference video, and the similarity curve of the video to be aligned is used for indicating the change condition of the similarity between the plurality of key frames of the video to be aligned and the target key frame of the reference video;
the period acquisition module is used for determining a first period from the similarity curve of the video to be aligned, the first period being a period whose similarity change is the same as that of a second period in the similarity curve of the reference video;
an alignment module for aligning the reference video and the video to be aligned based on the first period and the second period;
wherein the period acquisition module includes: a peak prominence obtaining unit, a period acquisition unit and a comparison unit; the peak prominence obtaining unit is used for calculating a peak prominence of the similarity curve of the reference video and a peak prominence of the similarity curve of the video to be aligned, respectively, based on the maximum value and the minimum value of the similarity curve of the reference video and the maximum value and the minimum value of the similarity curve of the video to be aligned, wherein the peak prominence is used to represent the degree of prominence of the peaks of a similarity curve; the period acquisition unit is used for determining a plurality of periods of the reference video and a plurality of periods of the video to be aligned based on the peak prominence of the similarity curve of the reference video and the peak prominence of the similarity curve of the video to be aligned; and the comparison unit is used for comparing the plurality of periods of the reference video with the plurality of periods of the video to be aligned, and outputting, from among them, the periods with the same similarity change as the second period and the first period, respectively.
8. The apparatus of claim 7, wherein the similarity curve acquisition module comprises:
a key frame set acquisition unit, configured to acquire a key frame set of the reference video and a key frame set of the video to be aligned respectively;
a similarity array obtaining unit, configured to calculate a similarity between each key frame in the key frame set of the reference video and the target key frame by using any key frame in the key frame set of the reference video as a target key frame, to obtain a similarity array of the reference video, and calculate a similarity between each key frame in the key frame set of the video to be aligned and the target key frame, to obtain a similarity array of the video to be aligned;
the similarity curve acquisition unit is used for respectively acquiring the similarity curve of the reference video and the similarity curve of the video to be aligned based on the similarity array of the reference video and the similarity array of the video to be aligned.
9. The apparatus according to claim 8, wherein the similarity curve acquisition unit is configured to:
interpolating the similarity array of the reference video and the similarity array of the video to be aligned respectively, and obtaining the similarity curve of the reference video and the similarity curve of the video to be aligned based on the interpolated similarity arrays.
10. The apparatus according to claim 8, wherein the key frame set acquisition unit includes:
a frame difference array obtaining subunit, configured to obtain, based on all video frames of the reference video and all video frames of the video to be aligned, a frame difference array of the reference video and a frame difference array of the video to be aligned, respectively, where a frame difference in the frame difference array represents a difference between each video frame and a previous video frame;
a frame difference curve obtaining subunit, configured to obtain a frame difference curve of the reference video and a frame difference curve of the video to be aligned respectively based on the frame difference array of the reference video and the frame difference array of the video to be aligned;
the hash code acquisition subunit is used for respectively acquiring the hash codes of the video frames in the reference video and the hash codes of the video frames in the video to be aligned based on the frame difference curve of the reference video and the frame difference curve of the video to be aligned;
the key frame acquisition subunit is configured to determine, from the hash codes of the video frames in the reference video and the hash codes of the video frames in the video to be aligned, hash codes with abrupt changes, and add the video frames corresponding to the hash codes with abrupt changes as key frames to the key frame set of the reference video and the key frame set of the video to be aligned, respectively.
11. The apparatus of claim 10, wherein the frame difference curve acquisition subunit is configured to:
interpolating the frame difference array of the reference video and the frame difference array of the video to be aligned respectively, and obtaining the frame difference curve of the reference video and the frame difference curve of the video to be aligned based on the interpolated frame difference arrays.
12. The apparatus of claim 7, wherein the period acquisition unit is configured to:
screening out, from the similarity curve of the reference video, the peaks whose prominence is smaller than the peak prominence of the reference video and, from the similarity curve of the video to be aligned, the peaks whose prominence is smaller than the peak prominence of the video to be aligned, to obtain the peaks in the similarity curve of the reference video whose prominence is greater than or equal to the peak prominence of the reference video and the peaks in the similarity curve of the video to be aligned whose prominence is greater than or equal to the peak prominence of the video to be aligned;
taking the lowest trough between two adjacent retained peaks in the similarity curve of the reference video as a start frame of the reference video, and taking the segment of the similarity curve between two adjacent start frames of the reference video as a period of the reference video;
and taking the lowest trough between two adjacent retained peaks in the similarity curve of the video to be aligned as a start frame of the video to be aligned, and taking the segment of the similarity curve between two adjacent start frames of the video to be aligned as a period of the video to be aligned.
13. A computer device, characterized in that it comprises a processor and a memory for storing at least one computer program, which is loaded by the processor and which carries out the method according to any of claims 1 to 6.
14. A computer readable storage medium, characterized in that the computer readable storage medium is adapted to store at least one computer program adapted to perform the method of any of claims 1 to 6.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311187470.9A CN116939267B (en) 2023-09-14 2023-09-14 Frame alignment method, device, computer equipment and storage medium


Publications (2)

Publication Number Publication Date
CN116939267A CN116939267A (en) 2023-10-24
CN116939267B (en) 2023-12-05

Family

ID=88384698

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311187470.9A Active CN116939267B (en) 2023-09-14 2023-09-14 Frame alignment method, device, computer equipment and storage medium



Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100557858B1 (en) * 2003-09-27 2006-03-10 학교법인 인하학원 Apparatus and method for extracting the representative still images from MPEG video

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012093339A2 (en) * 2011-01-07 2012-07-12 Alcatel Lucent Method and apparatus for comparing videos
CN111507260A (en) * 2020-04-17 2020-08-07 重庆邮电大学 Video similarity rapid detection method and detection device
CN114640881A (en) * 2020-12-15 2022-06-17 武汉Tcl集团工业研究院有限公司 Video frame alignment method and device, terminal equipment and computer readable storage medium
CN115243073A (en) * 2022-07-22 2022-10-25 腾讯科技(深圳)有限公司 Video processing method, device, equipment and storage medium
CN115527142A (en) * 2022-09-16 2022-12-27 广州品唯软件有限公司 Key frame searching method and device, storage medium and computer equipment
CN115941939A (en) * 2022-11-03 2023-04-07 咪咕视讯科技有限公司 Video frame alignment method, device, equipment and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant