CN112118494A - Video data processing method and device and storage medium


Info

Publication number
CN112118494A
CN112118494A (application CN201910537078.XA)
Authority
CN
China
Prior art keywords
video
video frame
cluster
frame
key
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910537078.XA
Other languages
Chinese (zh)
Other versions
CN112118494B (en)
Inventor
张全鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201910537078.XA
Publication of CN112118494A
Application granted
Publication of CN112118494B
Legal status: Active

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85Assembly of content; Generation of multimedia applications
    • H04N21/854Content authoring
    • H04N21/8547Content authoring involving timestamps for synchronizing content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/42Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2343Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/234381Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by altering the temporal resolution, e.g. decreasing the frame rate by frame skipping
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Security & Cryptography (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the present application disclose a video data processing method, a video data processing apparatus, and a storage medium. The method includes the following steps: clustering video frames in a first video sequence to obtain clusters associated with the first video sequence, and acquiring key video frames from the clusters, where the number of key video frames is the same as the number of clusters; determining a second video sequence based on the key video frames; determining the time interval between two adjacent key video frames in the second video sequence according to the playback timestamps of the key video frames in the first video sequence; and playing the second video sequence based on the time intervals between adjacent key video frames. With the embodiments of the present application, the system memory occupied by video data can be reduced and the display quality of the video data can be improved.

Description

Video data processing method and device and storage medium
Technical Field
The present application relates to the field of internet technologies, and in particular, to a method and an apparatus for processing video data, and a storage medium.
Background
Some video data consists of a sequence of frames. To relieve the memory pressure caused by loading such video data on a terminal, frame reduction may be performed according to the parity of the frame numbers. Taking frame-animation data as an example, only the video frames with even frame numbers (even frames), or only those with odd frame numbers (odd frames), may be selected from the frame-animation data to form new frame-animation data. However, frame reduction based on frame-number parity may discard video frames that carry large visual changes, so that when the reduced frame-animation data is played in a client or a web page, abrupt visual jumps appear and the display quality of the video data is degraded.
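For reference only, a minimal Python sketch of the parity-based frame reduction described above; the function name and structure are illustrative and not part of the application:

    # Parity-based frame reduction (the prior-art approach criticized above).
    # `frames` is assumed to be a list of decoded video frames in playback order.
    def reduce_by_parity(frames, keep_even=True):
        # Frame numbers are 1-based in the description above, so "even frames"
        # are frames[1::2] and "odd frames" are frames[0::2].
        return frames[1::2] if keep_even else frames[0::2]

As the background notes, this keeps every other frame regardless of how much the content changes between frames.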
Summary
The embodiments of the present application provide a video data processing method, a video data processing apparatus, and a storage medium, which can reduce the system memory occupied by video data and improve the display quality of the video data.
An aspect of an embodiment of the present application provides a method for processing video data, where the method includes:
clustering video frames in a first video sequence to obtain clusters associated with the first video sequence, and acquiring key video frames from the clusters, where the number of key video frames is the same as the number of clusters;
determining a second video sequence based on the key video frames;
determining a time interval between two adjacent key video frames in the second video sequence according to the playing time stamps of the key video frames in the first video sequence;
and playing the second video sequence based on the time interval between the two adjacent key video frames.
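Read together, these steps suggest the following pipeline. The sketch below is a non-authoritative Python outline under stated assumptions: `cluster_frames` and `pick_key_frame` are caller-supplied stand-ins for the clustering and key-frame selection detailed later, and all names are illustrative:

    from typing import Callable, List, Sequence, Tuple

    def build_second_sequence(
        frames: Sequence,                 # video frames of the first video sequence
        timestamps: Sequence[float],      # playback timestamp of each frame (seconds)
        cluster_frames: Callable[[Sequence], List[List[int]]],  # -> clusters of frame indices
        pick_key_frame: Callable[[List[int]], int],             # cluster -> key frame index
    ) -> Tuple[List, List[float]]:
        """Return the reduced (second) sequence and the time intervals between
        adjacent key frames, derived from their original playback timestamps."""
        clusters = cluster_frames(frames)                       # one cluster per group of similar frames
        key_idx = sorted(pick_key_frame(c) for c in clusters)   # one key frame per cluster
        second_sequence = [frames[i] for i in key_idx]
        # Interval between adjacent key frames = difference of their original timestamps.
        intervals = [timestamps[b] - timestamps[a] for a, b in zip(key_idx, key_idx[1:])]
        return second_sequence, intervals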
The clustering of the video frames in the first video sequence to obtain the clusters associated with the first video sequence, and the acquisition of the key video frames from the clusters (the number of key video frames being the same as the number of clusters), includes:
acquiring a first video sequence, and converting an initial color space associated with video frames in the first video sequence into a target color space;
in the target color space, performing clustering processing on video frames in the first video sequence to obtain a cluster associated with the first video sequence;
and taking the video frame matched with the key frame acquisition condition as a key video frame in the clustering cluster.
Wherein, in the target color space, performing clustering processing on the video frames in the first video sequence to obtain a cluster associated with the first video sequence includes:
acquiring a first video frame serving as a clustering centroid from the first video sequence;
determining video frames except the first video frame in the first video sequence as second video frames, and sequentially acquiring the second video frames based on a polling mechanism;
in the target color space, dividing cluster clusters to which the video frames in the first video sequence belong according to the color similarity between the first video frame and the second video frame.
Wherein, in the target color space, dividing the cluster to which the video frames in the first video sequence belong according to the color similarity between the first video frame and the second video frame includes:
creating a cluster to which the first video frame belongs;
performing color similarity matching on the first video frame and the second video frame in the target color space;
if the matched color similarity between the first video frame and the second video frame is larger than or equal to a clustering threshold, dividing the second video frame of which the color similarity is larger than or equal to the clustering threshold into a clustering cluster to which the first video frame belongs;
if the matched color similarity between the first video frame and the second video frame is smaller than the clustering threshold, updating the first video frame based on the second video frame with the color similarity smaller than the clustering threshold, creating a clustering cluster to which the updated first video frame belongs, and sequentially performing color similarity matching on the updated first video frame and unmatched second video frames until all the video frames in the first video sequence are matched in color similarity, and outputting the clustering cluster to which the video frame in the first video sequence belongs.
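One way to read this frame-by-frame procedure in code is the minimal sketch below, assuming `color_similarity` returns a value in [0, 1] (as with the histogram comparison described next); the 0.8 threshold is an illustrative value, not taken from the application:

    def cluster_sequence(frames, color_similarity, cluster_threshold=0.8):
        """Sequentially cluster frames: the current centroid absorbs following
        frames while their color similarity stays >= the threshold; the first
        frame that falls below it becomes the centroid of a new cluster."""
        if not frames:
            return []
        centroid_idx = 0                  # first video frame = first cluster centroid
        clusters = [[centroid_idx]]       # create the cluster the centroid belongs to
        for i in range(1, len(frames)):   # poll the remaining (second) video frames in order
            if color_similarity(frames[centroid_idx], frames[i]) >= cluster_threshold:
                clusters[-1].append(i)    # similar enough: join the centroid's cluster
            else:
                centroid_idx = i          # update the first video frame (new centroid)
                clusters.append([centroid_idx])  # create a cluster for the new centroid
        return clusters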
Wherein said color similarity matching the first video frame with the second video frame in the target color space comprises:
determining a color histogram of the first video frame in the target color space as a first histogram and determining a color histogram of the second video frame in the target color space as a second histogram; the target color space contains a plurality of color components;
determining a similarity between the first histogram and the second histogram based on the statistical probability value associated with each color component in the first histogram and the statistical probability value associated with each color component in the second histogram;
determining a similarity between the first histogram and the second histogram as a color similarity between the first video frame and the second video frame.
Wherein the determining a similarity between the first histogram and the second histogram based on the statistical probability value associated with each color component in the first histogram and the statistical probability value associated with each color component in the second histogram comprises:
acquiring a target color component from the color components in the first histogram; the target color component is represented by a plurality of index parameters in the target color space;
respectively determining the statistical probability value of the target color component on each index parameter as a first statistical probability value associated with each index parameter in the first histogram, and respectively determining the statistical probability value of the target color component on each index parameter as a second statistical probability value associated with each index parameter in the second histogram;
carrying out numerical comparison on the first statistical probability value associated with each index parameter and the second statistical probability value associated with the same index parameter, and determining the minimum probability statistical value corresponding to each index parameter according to the numerical comparison result;
and determining a minimum cumulative probability value corresponding to the target color component based on the minimum probability statistic value corresponding to each index parameter, and determining the similarity between the first histogram and the second histogram based on the minimum cumulative probability value corresponding to the target color component.
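The comparison described here is essentially a histogram intersection: for each index parameter (bin) take the smaller of the two statistical probabilities, accumulate them per component, and combine. A sketch with NumPy follows; the dict layout (keys 'h', 's', 'v'), the per-component weights, and the equal-weight default are assumptions for illustration:

    import numpy as np

    def histogram_similarity(hist_a, hist_b, weights=None):
        """Histogram-intersection similarity between two color histograms.
        hist_a / hist_b: {'h': np.ndarray, 's': ..., 'v': ...}, each array
        normalized so its bins sum to 1."""
        weights = weights or {c: 1.0 / len(hist_a) for c in hist_a}
        similarity = 0.0
        for comp, p_a in hist_a.items():
            p_b = hist_b[comp]
            # Minimum statistical probability per index parameter (bin),
            # then the minimum cumulative probability for this component.
            min_cumulative = np.minimum(p_a, p_b).sum()
            similarity += weights[comp] * min_cumulative
        return similarity   # 1.0 for identical histograms, smaller as they diverge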
Wherein, the taking the video frame matched with the key frame acquisition condition as the key video frame in the cluster comprises:
in the target color space, determining information entropy corresponding to the video frames in the cluster based on the accumulated probability value corresponding to each color component carried by the video frames in the cluster;
searching the video frame with the maximum information entropy in the information entropy corresponding to the video frames in the cluster;
and taking the searched video frame with the maximum information entropy as a key video frame obtained from the clustering cluster.
Wherein, in the target color space, determining the information entropy of the video frames in the cluster based on the respective corresponding cumulative probability values of each color component carried by the video frames in the cluster, includes:
acquiring an index parameter of each color component in the target color space;
acquiring a statistical probability value of the video frame in the cluster on the index parameter of each color component, and accumulating the statistical probability value on the index parameter of each color component to obtain an accumulated probability value corresponding to each color component;
and determining the information entropy of the video frames in the cluster based on the accumulated probability value corresponding to each color component and the weight value corresponding to the corresponding color component.
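A sketch of this weighted information-entropy score is shown below. The text describes the weighting only abstractly, so equal component weights are assumed, and the function names are hypothetical:

    import numpy as np

    def frame_entropy(hist, weights=None):
        """Weighted Shannon entropy of a frame's color histogram. `hist` maps
        each color component to its normalized bin probabilities."""
        weights = weights or {c: 1.0 / len(hist) for c in hist}
        entropy = 0.0
        for comp, p in hist.items():
            p = np.asarray(p, dtype=float)
            nz = p[p > 0]                  # ignore empty bins (0 * log 0 := 0)
            entropy += weights[comp] * float(-(nz * np.log2(nz)).sum())
        return entropy

    def pick_key_frame(cluster_indices, hists):
        # Key frame = the frame with the maximum information entropy in the cluster.
        return max(cluster_indices, key=lambda i: frame_entropy(hists[i]))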
Wherein the two adjacent key video frames comprise a first key video frame and a second key video frame;
the playing the second video sequence based on the time interval between the two adjacent key video frames comprises:
controlling the playing duration of the first key video frame based on the time interval between the first key video frame and the second key video frame;
and playing the first key video frame based on the playing duration of the first key video frame until the playing progress of the second video sequence reaches the playing time stamp of the second key video frame, and playing the second key video frame.
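In code, this playback control amounts to holding each key frame for the interval up to the next one. A minimal sketch, with rendering stubbed by a caller-supplied callback; the `render` callback and the `default_last` duration for the final frame (which has no next timestamp) are assumptions:

    import time

    def play_second_sequence(key_frames, key_timestamps, render, default_last=0.04):
        """Display each key frame until the playback progress reaches the next
        key frame's original timestamp."""
        for i, frame in enumerate(key_frames):
            render(frame)
            if i + 1 < len(key_timestamps):
                duration = key_timestamps[i + 1] - key_timestamps[i]
            else:
                duration = default_last
            time.sleep(duration)   # hold this key frame until the next one's timestamp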
An aspect of an embodiment of the present application provides a video data processing apparatus, where the apparatus includes:
the clustering module is used for clustering video frames in the first video sequence to obtain a cluster associated with the first video sequence, and acquiring key video frames from the cluster; the number of the key video frames is the same as that of the clustering clusters;
a sequence determination module to determine a second video sequence based on the key video frames;
an interval determining module, configured to determine a time interval between two adjacent key video frames in the second video sequence according to the playing time stamps of the key video frames in the first video sequence;
and the playing module is used for playing the second video sequence based on the time interval between the two adjacent key video frames.
Wherein the clustering module comprises:
the video processing device comprises a space conversion unit, a color space conversion unit and a color space conversion unit, wherein the space conversion unit is used for acquiring a first video sequence and converting an initial color space associated with video frames in the first video sequence into a target color space;
a clustering unit, configured to perform clustering processing on video frames in the first video sequence in the target color space to obtain a cluster associated with the first video sequence;
and the key frame acquisition unit is used for taking the video frames matched with the key frame acquisition conditions as key video frames in the clustering cluster.
Wherein the clustering unit includes:
a centroid determining subunit, configured to obtain a first video frame serving as a clustering centroid from the first video sequence;
the polling subunit is used for determining video frames except the first video frame in the first video sequence as second video frames and sequentially acquiring the second video frames based on a polling mechanism;
and the dividing subunit is used for dividing the cluster to which the video frame in the first video sequence belongs according to the color similarity between the first video frame and the second video frame in the target color space.
Wherein the dividing subunit comprises:
a cluster creating subunit, configured to create a cluster to which the first video frame belongs;
a matching subunit, configured to perform color similarity matching on the first video frame and the second video frame in the target color space;
a first dividing unit, configured to divide the second video frame of which the color similarity is greater than or equal to a clustering threshold into a clustering cluster to which the first video frame belongs if the color similarity between the first video frame and the second video frame is greater than or equal to the clustering threshold;
and the second dividing subunit is used for updating the first video frame based on the second video frame of which the color similarity is smaller than the clustering threshold value if the color similarity between the first video frame and the second video frame is smaller than the clustering threshold value, creating a clustering cluster to which the updated first video frame belongs, and performing color similarity matching on the updated first video frame and the unmatched second video frame in sequence until all the video frames in the first video sequence are matched in color similarity, and outputting the clustering cluster to which the video frame in the first video sequence belongs.
Wherein the matching subunit comprises:
a histogram determining subunit, configured to determine a color histogram of the first video frame in the target color space as a first histogram, and determine a color histogram of the second video frame in the target color space as a second histogram; the target color space contains a plurality of color components;
a probability statistics subunit, configured to determine a similarity between the first histogram and the second histogram based on the statistical probability value associated with each color component in the first histogram and the statistical probability value associated with each color component in the second histogram;
and the similarity determining subunit is used for determining the similarity between the first histogram and the second histogram as the color similarity between the first video frame and the second video frame.
Wherein the probability statistics subunit includes:
a component obtaining subunit, configured to acquire a target color component from the color components in the first histogram, the target color component being represented by a plurality of index parameters in the target color space;
a probability value determining subunit, configured to determine, in the first histogram, the statistical probability value of the target color component on each index parameter as a first statistical probability value associated with each index parameter, and determine, in the second histogram, the statistical probability value of the target color component on each index parameter as a second statistical probability value associated with each index parameter;
the probability value comparison subunit is used for carrying out numerical comparison on the first statistical probability value associated with each index parameter and the second statistical probability value associated with the same index parameter, and determining the minimum probability statistical value corresponding to each index parameter according to the numerical comparison result;
and the probability value accumulation subunit is used for determining the minimum accumulated probability value corresponding to the target color component based on the minimum probability statistic value respectively corresponding to each index parameter, and determining the similarity between the first histogram and the second histogram based on the minimum accumulated probability value corresponding to the target color component.
Wherein the key frame acquiring unit includes:
an information entropy determining subunit, configured to determine, in the target color space, an information entropy corresponding to the video frame in the cluster based on respective cumulative probability values corresponding to each color component carried by the video frame in the cluster;
the information entropy searching subunit is used for searching the video frame with the maximum information entropy in the information entropy corresponding to the video frames in the clustering cluster;
and the key frame determining subunit is used for taking the searched video frame with the maximum information entropy as the key video frame acquired from the clustering cluster.
Wherein the information entropy determining subunit includes:
an index quantity obtaining subunit, configured to obtain an index parameter of each color component in the target color space;
the accumulation subunit is configured to acquire a statistical probability value of the video frame in the cluster on the index parameter of each color component, and accumulate the statistical probability value on the index parameter of each color component to obtain an accumulated probability value corresponding to each color component;
and the weighting subunit is used for determining the information entropy of the video frames in the cluster based on the accumulated probability value corresponding to each color component and the weight value corresponding to the corresponding color component.
Wherein the two adjacent key video frames comprise a first key video frame and a second key video frame;
the playing module comprises:
the time length control unit is used for controlling the playing time length of the first key video frame based on the time interval between the first key video frame and the second key video frame;
and the playing subunit is configured to play the first key video frame based on the playing duration of the first key video frame, and play the second key video frame until the playing progress of the second video sequence reaches the playing timestamp of the second key video frame.
An aspect of an embodiment of the present application provides a computer device, where the computer device includes: a processor, a memory, and a network interface;
the processor is connected with a memory and a network interface, wherein the network interface is used for providing a data communication function, the memory is used for storing program codes, and the processor is used for calling the program codes to execute the method in one aspect of the embodiment of the application.
An aspect of the embodiments of the present application provides a computer storage medium storing a computer program, where the computer program includes program instructions that, when executed by a processor, perform the method according to the foregoing aspect of the embodiments of the present application.
In the embodiments of the present application, clustering the video frames in the first video sequence yields the clusters associated with the first video sequence, and the second video sequence can be generated from the key video frames extracted from those clusters. The second video sequence is thus the video sequence obtained by performing frame reduction on the first video sequence. Further, the time interval between two adjacent key video frames in the second video sequence can be determined from the playback timestamps of the key video frames in the first video sequence, so that the second video sequence can be played based on those intervals. Performing frame reduction on the first video sequence by key-frame clustering effectively ensures that the second video sequence contains fewer video frames than the first, so playing the second video sequence on a terminal device reduces the system memory occupied by the video data. In addition, key-frame clustering extracts one representative video frame from each cluster as a key video frame, which preserves, as far as possible, a smooth visual transition between any two adjacent key video frames of the second video sequence. Finally, the time interval between adjacent key video frames effectively controls the playback duration of each video frame, further improving the display quality of the video data.
Drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely some embodiments of the present application; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of a network architecture according to an embodiment of the present application;
fig. 2 is a schematic view of a scene for acquiring a key video frame according to an embodiment of the present application;
fig. 3 is a schematic flowchart of a video data processing method according to an embodiment of the present application;
fig. 4a and fig. 4b are schematic diagrams of dividing cluster clusters associated with a first video sequence according to an embodiment of the present application;
fig. 5 is a schematic diagram of determining a second video sequence according to an embodiment of the present application;
fig. 6 is a schematic view of a scene for playing a second video sequence according to an embodiment of the present application;
fig. 7 is a schematic diagram of another video data processing method provided in the embodiment of the present application;
fig. 8 is a schematic structural diagram of a video data processing apparatus according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. The described embodiments are merely some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without creative effort shall fall within the protection scope of the present application.
Please refer to fig. 1, which is a schematic structural diagram of a network architecture according to an embodiment of the present application. As shown in fig. 1, the network architecture may include a server 2000 and a user terminal cluster. The user terminal cluster may include a plurality of user terminals, specifically a user terminal 3000a, a user terminal 3000b, a user terminal 3000c, …, and a user terminal 3000n. As shown in fig. 1, the user terminals 3000a, 3000b, 3000c, …, and 3000n may each establish a network connection with the server 2000, so that each user terminal can exchange data with the server 2000 through its network connection.
For ease of understanding, in the embodiments of the present application one user terminal may be selected as the target user terminal from the plurality of user terminals shown in fig. 1. The target user terminal may be a smart terminal with a video data processing function (e.g., a video data playback function), such as a smartphone, a tablet computer, or a desktop computer. For example, the user terminal 3000a shown in fig. 1 may serve as the target user terminal, and a target application with the video data processing function may be integrated in it. The target applications integrated in the target user terminal are collectively referred to as application clients. An application client may be a social application, a multimedia application (e.g., a video playback application), an entertainment application (e.g., a game application), or another application with frame-sequence (e.g., frame-animation) loading and playing functionality. The frame-animation sequences loaded and played in the target user terminal may include a first video sequence and a second video sequence, where the second video sequence may be a video sequence obtained by the target user terminal performing frame reduction on the first video sequence through a key-frame clustering algorithm.
It can be understood that the solution for implementing frame reduction by clustering key frame algorithm described in the embodiment of the present application can be applied to all application scenarios of animation data implemented using frame sequences in web pages or application clients (i.e. the aforementioned target applications). When a target application with a video data processing function runs in the target user terminal, the first video sequence acquired by the target user terminal may contain animation data embedded in the target application in advance, and may also contain animation data currently downloaded from the server 2000 via the network.
It should be understood that the animation data built in advance in the target application and the currently downloaded animation data may be collectively referred to as video data (i.e., the first video sequence) that needs to be subjected to the frame reduction processing. Therefore, in the embodiment of the application, the frame reduction processing can be performed on the first video sequence during the running period of the webpage or the target application to obtain the second video sequence, so that the display effect of the video data can be improved and the occupation of the video data on the system memory can be reduced when the second video sequence is played in the webpage or the application client.
Optionally, before the target user terminal runs the target application, the first video sequence acquired from the server 2000 shown in fig. 1 may be frame-reduced in advance in the target user terminal to obtain the second video sequence, so that when the target application runs, the second video sequence can be loaded directly and the system performance loss is reduced (for example, the system memory occupied by the video data in the target user terminal is reduced). Optionally, the first video sequence may instead be frame-reduced in advance in the server 2000 to obtain the second video sequence. In that case, when the target user terminal runs the target application, it may send a data download instruction (i.e., a data loading instruction) to the server 2000 over the network, and the server may determine, based on the terminal identifier carried in the instruction, whether the target user terminal meets the frame reduction condition. If the server 2000 determines that the target user terminal meets the frame reduction condition, that is, the type of its terminal identifier belongs to the identifier type of a low-end device, the server 2000 may return the pre-reduced second video sequence to the target user terminal for playback, thereby reducing system performance loss and improving video data loading efficiency while the target application runs. In short, before the target application runs, the first video sequence can be frame-reduced either in the target user terminal or in the server 2000 to obtain the second video sequence.
The animation data described in the embodiments of the present application may include dynamic avatars, dynamic wallpapers, moving game characters, dynamic pendants in a camera, and dynamic objects in an application client. In other words, the animation data may contain one or more objects having a motion state. For example, taking the target application as a social application, the target user terminal may load and play animation data composed of frame sequences through the social application; the played animation data may be collectively referred to as the video data in the social application, and each frame in that video data may be referred to as a video frame or an image frame.
For ease of understanding, please refer to fig. 2, which is a schematic view of a scene for acquiring key video frames according to an embodiment of the present application. The first video sequence shown in fig. 2 may include n video frames, where n is a positive integer greater than 1: video frame 10a, video frame 10b, video frame 10c, video frame 10d, video frame 10e, …, video frame 10n. When the read-write (i.e., loading) performance of the target user terminal is modest, directly loading the first video sequence may produce a poor display effect. To avoid this, the first video sequence may be frame-reduced in a web page or application client in the target user terminal through the key-frame clustering algorithm to obtain k key frames associated with the first video sequence, where k is a positive integer greater than 1 and less than n.
It should be understood that the key-frame clustering algorithm computes the similarity between the video frames in the first video sequence, so that the clusters to which the video frames belong can be divided based on the computed similarities. For example, the k clusters shown in fig. 2 can be obtained, and the video frame with the largest information entropy can then be extracted from each of the k clusters as a key video frame. The k clusters may specifically include cluster 20a, cluster 20b, cluster 20c, cluster 20d, …, and cluster 20k shown in fig. 2, and each of the k clusters contains at least one video frame. As shown in fig. 2, cluster 20a may contain video frames 10a, 10b, and 10c; cluster 20b may contain video frames 10d and 10e; cluster 20c may contain video frame 10f; cluster 20d may contain video frames 10g, …; and cluster 20k may contain video frames 10(n-1) and 10n. Clustering the video frames of the first video sequence shown in fig. 2 therefore yields a plurality of clusters associated with the first video sequence, and the video frame with the largest information entropy in each cluster is called a key video frame. Frame reduction can be performed on the first video sequence based on these key video frames, and the resulting video sequence composed of the key video frames is called the second video sequence. As shown in fig. 2, the second video sequence may include the k key video frames obtained by frame-reducing the first video sequence, each of which is extracted from its corresponding cluster. For example, the video frame 10a with the largest information entropy in cluster 20a is the key video frame corresponding to cluster 20a; similarly, video frame 10e is the key video frame of cluster 20b, video frame 10f that of cluster 20c, video frame 10g that of cluster 20d, and video frame 10n that of cluster 20k. Further, when the target user terminal loads the second video sequence in the web page or target application, the playback logic of the video data needs to be adjusted: the second video sequence may be played according to the playback time point (i.e., the playback timestamp) of each of the k key video frames in the first video sequence, so that the playback duration of each key video frame in the second video sequence can be effectively controlled.
When the first video sequence is clustered by the key-frame clustering algorithm, k categories associated with the first video sequence can be obtained, where k is a positive integer greater than 1; each category is called a cluster, and a cluster contains at least one video frame. Within any one of the k categories that contains several video frames, any two video frames of that category (i.e., cluster) are similar to each other. In application scenes that need to reduce system memory (e.g., a game character of a game application running on a low-end computer), in order to prevent different but similar video frames from occupying the storage resources (i.e., memory) of the system during video playback, the system memory occupied by the game character can be optimized: the first video sequence containing the game character is clustered, the video frame with the largest information entropy is extracted from each resulting cluster as a key video frame, and frame reduction of the first video sequence is then performed based on these key video frames, reducing the performance cost of moving objects (e.g., a moving game character) in a web page or target application. It should also be understood that any two of the k categories are dissimilar, so extracting the key video frame of each category ensures that every key video frame in the second video sequence obtained after frame reduction is strongly representative; when the second video sequence is played in a web page or target application, the display quality of the game character in the video data is therefore preserved.
In other words, by performing frame reduction on the first video sequence, the system memory occupied by the video data can be effectively reduced when the second video sequence is loaded in the web page or target application. In addition, by recording the playback timestamp of each key video frame in the first video sequence, the time interval between any two adjacent video frames of the second video sequence can be determined quickly. Because each video frame of the second video sequence (i.e., each key video frame) results from clustering the first video sequence, the intervals between adjacent video frames need not be identical: two such intervals T1 and T2 may be the same or may differ. The system memory occupied by the video data can thus be optimized while the second video sequence is played, improving the display quality of the video data.
The specific implementation manner of the target user terminal acquiring the cluster associated with the first video sequence, acquiring the key video frame from the cluster, and playing the second video sequence may be as shown in the following embodiments corresponding to fig. 3 to fig. 7.
Further, please refer to fig. 3, which is a flowchart illustrating a video data processing method according to an embodiment of the present application. As shown in fig. 3, the method at least comprises:
step S101, clustering video frames in a first video sequence to obtain a cluster associated with the first video sequence, and acquiring key video frames in the cluster;
specifically, when a first video sequence is acquired, the video data processing apparatus may convert an initial color space associated with video frames in the first video sequence into a target color space, so that the video frames in the first video sequence may be further clustered in the target color space to obtain a cluster associated with the first video sequence; further, the video data processing apparatus may regard, as the key video frame, a video frame that matches the key frame acquisition condition in the cluster. In other words, the video data processing apparatus may divide the first video sequence into a plurality of cluster clusters through a clustering key frame algorithm, so that a video frame meeting a key frame acquisition condition may be screened out from each cluster as a key video frame. The key frame obtaining condition is the video frame with the maximum information entropy screened from the information entropy corresponding to each video frame of a cluster by the video data processing device.
It is understood that the embodiment of the present application may integrate a video data processing apparatus having an image data processing function in a target user terminal, so that the target user terminal has the image data processing function. For example, in some web pages or application scenes of animation data implemented by using a frame sequence in an application client, in order to ensure the display effect of the animation data in the application scenes, a frame reduction process may be performed on video frames in a first video sequence in the application scenes by using a clustering key frame algorithm, so as to reduce a system memory occupied by the video data in the application scenes.
The application scene may include a dynamic head portrait, a dynamic wallpaper, and the like in a web page, and the application scene may further include a dynamic object in an application client, for example, a moving game character, and the like, which are not listed here. In addition, in these application scenarios, the video data to which an object having a motion state (e.g., a moving object such as a person or an object) belongs may be collectively referred to as a first video sequence.
The first video sequence may contain video frames that include the moving object (e.g., object a) as well as video frames that do not. For the video frames of a first video sequence containing the moving object (i.e., object a), the similarity between every pair of video frames can be computed, so the first video sequence can be classified based on these similarities to obtain the clusters associated with it. Video frames of the first video sequence with high mutual similarity are placed in the same cluster; that is, a high similarity exists between any video frames of the same cluster. For example, when the x1-th video frame of the first video sequence is determined to be highly similar to the x2-th and x3-th video frames, these three video frames are classified into the same cluster. To avoid performance strain caused by loading several highly similar video frames together, the target user terminal can perform frame reduction on the first video sequence when it is acquired in a web page or application client, reducing the system memory occupied by highly similar video frames. In other words, the target user terminal may frame-reduce the first video sequence acquired from the server, or the first video sequence embedded in the application client (i.e., the target application) in advance; for example, the aforementioned three highly similar video frames can be reduced to optimize the system memory occupied by the moving object in the web page or application client.
In the process of computing the similarity between two video frames of the first video sequence, in order to better match the sensitivity of the human eye to color, the color space of the video frames may first be converted from the initial color space (e.g., the RGB color space) to the target color space (e.g., the HSV color space). The color histograms of the two video frames participating in the computation are then obtained in the target color space in sequence through polling (i.e., frame-by-frame clustering), and the color similarity between the two color histograms is referred to as the similarity between the two video frames.
It can be understood that the color represented by any pixel of any video frame in the first video sequence can be expressed by color components in different color spaces. In other words, different color spaces measure the color of the same pixel from different angles: for example, the color of pixel A can be expressed jointly by the R (red), G (green), and B (blue) components of the RGB color space, or jointly by the H (hue), S (saturation), and V (brightness) components of the HSV color space. It should be understood that points in the RGB color space (a three-dimensional cube) can be mapped into the HSV color space and represented equivalently by points in an inverted cone. That is, different color spaces can be converted into one another following the corresponding color conversion relationship; for example, RGB values distributed in the range 0 to 255 can be mapped directly to HSV values through the conversion relationship. Further, to reduce the amount of computation, the HSV color can be quantized once more in the embodiments of the present application: the H component can be divided equally into 12 blocks, the S component into 5 blocks, and the V component into 5 blocks, reducing the computation dimension. Through this spatial conversion, RGB colors in the original 255 × 255 × 255 range are mapped onto a 12 × 5 × 5 statistical range, so that the proportion of the values of each HSV component can be counted within that range.
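A sketch of this 12/5/5 quantization follows, assuming the frame has already been converted to an HSV array with H in [0, 360) and S and V in [0, 1] (the conversion itself can be done with any standard RGB-to-HSV routine); the bin counts come from the description above, the rest is illustrative:

    import numpy as np

    H_BINS, S_BINS, V_BINS = 12, 5, 5   # block counts from the description above

    def quantized_hsv_histogram(hsv):
        """Per-component normalized histograms of an HSV frame.
        `hsv` is a (height, width, 3) float array: H in [0, 360), S and V in [0, 1]."""
        h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]
        hist = {}
        hist['h'], _ = np.histogram(h, bins=H_BINS, range=(0.0, 360.0))
        hist['s'], _ = np.histogram(s, bins=S_BINS, range=(0.0, 1.0))
        hist['v'], _ = np.histogram(v, bins=V_BINS, range=(0.0, 1.0))
        n = h.size
        return {c: counts / n for c, counts in hist.items()}   # statistical probabilities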
In the RGB color space, the color represented by any pixel (e.g., F) can be expressed by mixing the numerical values of the three color components (i.e., the three primary color components) R, G, and B, for example F = (R, G, B). In other words, the RGB color space can be described as a three-dimensional cube model. When the values of the three primary color components are all 0 (weakest), the mixed color is black; when the values of the three primary color components are all at the maximum (e.g., 255), the mixed color is white.
The HSV color space can be described by a conical space model. The H component (Hue) describes the basic attribute of the color, i.e., what is ordinarily called the color name, such as red or yellow. The S component (Saturation) describes the purity of the color: the higher the saturation, the purer the color, and the lower the saturation, the grayer it becomes; its value ranges from 0 to 100%. The V component (Value) is determined by the maximum of the three components in the RGB color space, and the maximum of the V component determines the height of the cone. At the apex of the cone, V = 0 while H and S are undefined; these values describe black. At the center of the top surface of the cone, V = max (i.e., the maximum value) and S = 0 while H is undefined; these values describe white.
An RGB color is represented by mixing the values of the R, G, and B components of the RGB color space, each ranging from 0 to 255. An HSV color is represented by mixing the H, S, and V components of the HSV color space: the H component ranges from 0 to 360 degrees, the S component from 0 to 100%, and the V component takes the maximum value among the R, G, and B components describing the pixel. That is, during the spatial mapping, the value of the component with the maximum value among the three components of the RGB color space (i.e., the R, G, and B components) is taken as the value of the V component in the converted HSV color space. For example, if in the color components describing pixel A the value of the R component is 210, that of the G component is 155, and that of the B component is 120, then the R component's value 210 is taken as the value of the V component in the converted HSV color space.
Considering that each category (i.e., cluster) maintains a cluster centroid (centroid for short) while the clusters of the first video sequence are being divided by the key-frame clustering algorithm, in the embodiments of the present application one of the two video frames participating in the computation is called the first video frame and the other the second video frame. The first video frame is the video frame of the first video sequence that serves as the cluster centroid. After the first video frame is selected, second video frames are acquired in sequence from the first video sequence through a polling mechanism, the color similarity between the color histograms of the two video frames is computed, and that color similarity is used to describe the similarity between the two video frames.
It can be seen that, before performing similarity classification on the video frames in the first video sequence through the clustering key frame algorithm, the embodiment of the present application may first convert the color space to which the video frames in the first video sequence belong from an initial color space (e.g., RGB color space) to a target color space (e.g., HSV color space), so as to calculate a similarity (i.e., color similarity) between each video frame in the first video sequence in the target color space (i.e., HSV color space), so as to classify the cluster of the first video sequence according to the color similarity between the video frames.
For ease of understanding, please refer to fig. 4a and fig. 4b, which are schematic diagrams of dividing the clusters associated with a first video sequence according to an embodiment of the present application. The first video sequence shown in fig. 4a may include a plurality of video frames: video frame 10a, video frame 20a, video frame 30a, video frame 40a, and video frame 50a. The video frames 10a, 20a, 30a, and 40a shown in fig. 4a are the same frames shown in fig. 4b. It can be understood that the color space of the video frames in the first video sequence shown in fig. 4a is the target color space; that is, in the target color space (the aforementioned HSV color space), the target user terminal integrating the data processing apparatus may cluster the video frames of the first video sequence shown in fig. 4a to obtain the clusters associated with it. Specifically, as shown in fig. 4a, the target user terminal may take the first frame of the first video sequence as the first video frame serving as the cluster centroid, determine the remaining video frames as second video frames, and acquire the second video frames in sequence based on a polling mechanism. For example, having determined the first video frame, the target user terminal may, following the frame numbers in the first video sequence, treat the video frames 20a, 30a, 40a, and 50a in turn as the second video frame, so as to compute the color similarity between the first video frame and each second video frame in sequence.
As shown in fig. 4a, when the video frame 10a is taken as the first video frame (i.e., as shown in fig. 4b, the embodiment of the present application takes the video frame 10a as the cluster centroid 1), the cluster to which the video frame 10a belongs (i.e., the cluster 1 shown in fig. 4b) may be created first. The target user terminal may then start the first round of similarity calculation with the video frame 10a as the clustering centroid 1, that is, calculate the color similarity between the video frame 10a in fig. 4a (i.e., the first video frame) and the video frame 20a in fig. 4a (i.e., the second video frame), thereby obtaining the similarity 1 shown in fig. 4b. It may then be determined whether this color similarity (i.e., the similarity 1) is smaller than the clustering threshold shown in fig. 4b. When the similarity 1 shown in fig. 4b is greater than (or equal to) the clustering threshold, the video frame 20a may be divided into the cluster to which the first video frame (i.e., the video frame 10a) belongs (i.e., the cluster 1 shown in fig. 4b). In other words, the present embodiment may divide a second video frame whose color similarity is greater than or equal to the clustering threshold into the cluster to which the first video frame belongs.
Further, as shown in fig. 4a, since the video frame 30a is the next video frame after the video frame 20a, the target user terminal may continue the second round of similarity calculation with the video frame 10a still acting as the clustering centroid 1. That is, as shown in fig. 4b, the target user terminal may continue to calculate the color similarity between the video frame 10a in fig. 4a (i.e., the first video frame) and the video frame 30a in fig. 4a (i.e., the new second video frame), thereby obtaining the similarity 2 shown in fig. 4b, and may determine whether this color similarity (i.e., the similarity 2) is smaller than the clustering threshold shown in fig. 4b. When the similarity 2 shown in fig. 4b is smaller than the clustering threshold, the first video frame may be updated according to the video frame 30a shown in fig. 4b. That is, in the embodiment of the present application, the second video frame whose color similarity is smaller than the clustering threshold (i.e., the video frame 30a in the first video sequence shown in fig. 4a) may be used as a new clustering centroid, namely the clustering centroid 2 shown in fig. 4b, and a new cluster may be created for the video frame 30a, namely the cluster 2 shown in fig. 4b. At this time, since the color similarity between the video frame 10a and the video frame 30a is smaller than the clustering threshold, the embodiment of the present application no longer performs color similarity matching between the video frame 10a and the remaining unmatched second video frames (i.e., the video frame 40a and the video frame 50a shown in fig. 4a).
It should be understood that, in the embodiment of the present application, when the new cluster centroid (i.e., the cluster centroid 2) is determined, a new first video frame (i.e., the video frame 30a shown in fig. 4a) is obtained, and the cluster to which the video frame 30a belongs is the cluster 2 shown in fig. 4b. The target user terminal may then still acquire second video frames in sequence based on the polling mechanism, that is, continue to take second video frames from the unmatched ones (i.e., the video frame 40a and the video frame 50a shown in fig. 4a). Further, the target user terminal may restart the first round of similarity calculation with the video frame 30a as the clustering centroid 2, that is, calculate the color similarity between the video frame 30a in fig. 4a (i.e., the new first video frame) and the video frame 40a in fig. 4a (i.e., the second video frame), thereby obtaining the similarity 3 shown in fig. 4b, and determine whether this color similarity (i.e., the similarity 3) is smaller than the clustering threshold shown in fig. 4b. When the similarity 3 shown in fig. 4b is smaller than the clustering threshold, the new first video frame may be updated according to the video frame 40a shown in fig. 4b. That is, the present embodiment may use the second video frame whose color similarity is smaller than the clustering threshold (i.e., the video frame 40a in the first video sequence shown in fig. 4a) as yet another clustering centroid, namely the cluster centroid 3 shown in fig. 4b, and may create a new cluster for the video frame 40a, namely the cluster 3 shown in fig. 4b. At this time, since the color similarity between the video frame 30a and the video frame 40a is smaller than the clustering threshold, the embodiment of the present application no longer performs color similarity matching between the video frame 30a and the remaining unmatched second video frame (i.e., the video frame 50a shown in fig. 4a).
It should be understood that, when the further new cluster centroid (i.e., the cluster centroid 3) is determined, the further new first video frame (i.e., the video frame 40a shown in fig. 4a) is obtained in the embodiment of the present application, and the cluster to which the video frame 40a belongs is the cluster 3 shown in fig. 4b. The target user terminal may then still acquire the second video frame based on the polling mechanism, that is, continue to take the second video frame from the unmatched ones (i.e., the video frame 50a shown in fig. 4a). Further, the target user terminal may restart the first round of similarity calculation with the video frame 40a as the cluster centroid 3, that is, calculate the color similarity between the video frame 40a in fig. 4a (i.e., the further new first video frame) and the video frame 50a in fig. 4a (i.e., the second video frame), so that the cluster to which the video frame 50a belongs may be determined based on the color similarity (e.g., the similarity 4) between the two video frames participating in the calculation. For example, if the similarity 4 is greater than or equal to the clustering threshold shown in fig. 4b, the video frame 50a (i.e., a second video frame whose color similarity is greater than or equal to the clustering threshold) may be divided into the cluster to which the video frame 40a belongs (i.e., the cluster 3). At this time, the clusters associated with the first video sequence, as determined by the color similarity between the video frames, may include the cluster 1, the cluster 2, and the cluster 3 shown in fig. 4b.
Alternatively, if the similarity 4 is smaller than the clustering threshold, the video frame 50a (i.e., a second video frame whose color similarity is smaller than the clustering threshold) may be placed in a new cluster, that is, the target user terminal may create a new cluster (e.g., a cluster 4) for the video frame 50a. At this time, the clusters associated with the first video sequence, as determined by the color similarity between the video frames, may include the cluster 1, the cluster 2, and the cluster 3 shown in fig. 4b, and may further include the cluster 4.
Therefore, the embodiment of the application may perform color similarity matching between the first video frame and a second video frame in the target color space once the cluster to which the first video frame belongs has been created. If the color similarity between the first video frame and the second video frame is greater than or equal to the clustering threshold, the second video frame is divided into the cluster to which the first video frame belongs. Optionally, if the color similarity between the first video frame and the second video frame is smaller than the clustering threshold, the first video frame is updated based on that second video frame, a cluster to which the updated first video frame belongs is created, and the updated first video frame is sequentially color-similarity-matched with the remaining unmatched second video frames. When every video frame in the first video sequence has completed color similarity matching, the clusters to which the video frames in the first video sequence belong may be output.
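The sequential division described above can be sketched as follows; this is an illustrative Python sketch in which the function name, the `color_similarity` callable, and the threshold value 0.8 are assumptions for illustration, not values fixed by the embodiment:

```python
# Sequential clustering: the first frame seeds cluster centroid 1; each
# subsequent frame either joins the current centroid's cluster (similarity
# >= threshold) or becomes a new centroid with a new cluster.
def cluster_frames(frames, color_similarity, cluster_threshold=0.8):
    centroid = frames[0]          # first video frame as cluster centroid 1
    clusters, current = [], [centroid]
    for frame in frames[1:]:      # polling mechanism: frames in order
        if color_similarity(centroid, frame) >= cluster_threshold:
            current.append(frame)         # join the centroid's cluster
        else:
            clusters.append(current)      # close the finished cluster
            centroid = frame              # update the first video frame
            current = [centroid]          # create the new cluster
    clusters.append(current)
    return clusters
```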
The specific process of performing color similarity matching on the first video frame and the second video frame in the target color space in the embodiment of the present application may also be described as follows: the target user terminal may determine a color histogram of the first video frame in the target color space as a first histogram and determine a color histogram of the second video frame in the target color space as a second histogram; the target color space contains a plurality of color components; further, the target user terminal may determine a similarity between the first histogram and the second histogram based on the statistical probability value associated with each color component in the first histogram and the statistical probability value associated with each color component in the second histogram; further, the target user terminal may determine a similarity between the first histogram and the second histogram as a color similarity between the first video frame and the second video frame.
It should be understood that, after the plurality of clusters associated with the first video sequence are determined, a representative video frame can be obtained from each cluster as a key video frame. That is, in the embodiment of the present application, the video frame meeting the key frame acquisition condition may be screened from each cluster, and such video frames may be collectively referred to as key video frames, so as to further execute step S102.
Step S102, determining a second video sequence based on the key video frames.
For ease of understanding, please refer to fig. 5, which is a schematic diagram of determining a second video sequence according to an embodiment of the present application. The first video sequence shown in fig. 5 is the first video sequence shown in fig. 4a. As shown in fig. 5, when the video frame 10a in the first video sequence of fig. 4a is taken as the cluster centroid (i.e., the cluster centroid 1), the video frames in the cluster 1 shown in fig. 5 may include the video frame 10a and the video frame 20a shown in fig. 5; when the video frame 30a in the first video sequence of fig. 4a is taken as the new cluster centroid (i.e., the cluster centroid 2), the video frames in the cluster 2 shown in fig. 5 may contain the video frame 30a shown in fig. 5; and when the video frame 40a in the first video sequence of fig. 4a is taken as the further new cluster centroid (i.e., the cluster centroid 3), the video frames in the cluster 3 shown in fig. 5 may include the video frame 40a and the video frame 50a shown in fig. 5. As shown in fig. 5, in the embodiment of the present application, the cluster 1, the cluster 2, and the cluster 3 shown in fig. 5 may be collectively referred to as the clusters. The target user terminal may then determine, in the target color space, the information entropy corresponding to each video frame in a cluster based on the cumulative probability value corresponding to each color component carried by the video frames in the cluster, and may search for the video frame with the maximum information entropy among the video frames in that cluster. As shown in fig. 5, the target user terminal may take the video frame 10a, which has the maximum information entropy in the cluster 1 shown in fig. 5, as the key video frame 1 meeting the key frame acquisition condition; similarly, the target user terminal may take the video frame 30a, which has the maximum information entropy in the cluster 2 shown in fig. 5, as the key video frame 2 meeting the key frame acquisition condition; and likewise, the target user terminal may take the video frame 40a, which has the maximum information entropy in the cluster 3 shown in fig. 5, as the key video frame 3 meeting the key frame acquisition condition. In other words, the found video frames with the maximum information entropy can be collectively referred to as the key video frames obtained from the clusters. Further, the target user terminal may construct a new video sequence based on the 3 key video frames shown in fig. 5, and may refer to the new video sequence as the second video sequence shown in fig. 5, so that the frame reduction processing of the first video sequence is implemented. It is understood that the video frames in the second video sequence may include the key video frame 1 shown in fig. 5 (i.e., the video frame 10a having the maximum information entropy in the cluster 1), the key video frame 2 shown in fig. 5 (i.e., the video frame 30a having the maximum information entropy in the cluster 2), and the key video frame 3 shown in fig. 5 (i.e., the video frame 40a having the maximum information entropy in the cluster 3).
In the target color space, the specific process of determining the information entropy corresponding to the video frames in a cluster, based on the cumulative probability value corresponding to each color component carried by those video frames, may be described as follows. First, the index parameters of each color component in the target color space are obtained. For the three color components in the HSV color space, the H component is equally divided into 12 blocks, so one block is determined every 30 degrees: for example, every time the H component of a pixel falls within the value interval [0°, 30°), the count for the index parameter i = 1 is incremented by one; every time it falls within [30°, 60°), the count for i = 2 is incremented by one; and so on, every time it falls within [330°, 360°), the count for i = 12 is incremented by one. In view of this, in the HSV color space, the value range of the index parameter i of the H component is 1 to 12, so that, for each video frame of a cluster, the statistical probability value of the video frame at each index parameter i of the H component can be counted, and the probability statistical values at the 12 index parameters can be accumulated. In addition, the S component is equally divided into 5 blocks, so one block is determined every 20% of saturation: for example, every time the S component falls within the value interval [0, 20%), the count for the index parameter j = 1 is incremented by one; every time it falls within [20%, 40%), the count for j = 2 is incremented by one; and so on, every time it falls within [80%, 100%), the count for j = 5 is incremented by one. In view of this, in the HSV color space, the value range of the index parameter j of the S component is 1 to 5, so that, for each video frame of a cluster, the statistical probability value of the video frame at each index parameter j of the S component can be counted, and the probability statistical values at the 5 index parameters can be accumulated. Likewise, the V component is equally divided into 5 blocks, so in the HSV color space one block is determined for every value interval of width 51: for example, every time the V component falls within the value interval [0, 51), the count for the index parameter k = 1 is incremented by one; and so on, every time it falls within [205, 256), the count for k = 5 is incremented by one. In view of this, in the HSV color space, the value range of the index parameter k of the V component is 1 to 5, so that, for each video frame of a cluster, the statistical probability value of the video frame at each index parameter k of the V component can be counted, and the probability statistical values at the 5 index parameters can be accumulated.
For convenience of understanding, in the embodiment of the present application, the video frame 10a in the cluster 1 is taken as an example to illustrate a specific calculation process for determining the cumulative probability value corresponding to each color component in the HSV color space. It can be understood that, since each pixel in the video frame 10a can be represented by mixing the H component, the S component, and the V component in the target color space, in the HSV color space, the statistical probability value of each index parameter of the H component can be calculated by the following formula (1):
\[
H(i) = \frac{H\_f(i)}{M \times N}, \qquad i = 1, 2, \ldots, 12 \tag{1}
\]
in the formula (1), H(i) is the statistical probability value of the H component at the index parameter i in the target color space, where the value range of the index parameter i of the H component is 1 to 12; H_f(i) in the formula (1) describes the number of pixels for which the value of the index parameter of the H component is i. M × N is the size of the video frame 10a (e.g., 80 × 80); in other words, based on the size of the video frame 10a, M × N pixels can be determined in the video frame 10a. For example, if the number of pixels in the video frame 10a whose index parameter i of the H component equals 1 is counted as 20, then H(1) = 20/1600 = 1.25%; if the number of pixels whose index parameter i equals 2 is counted as 200, then H(2) = 200/1600 = 12.5%; the remaining values of i are not listed here one by one.
In the HSV color space, the statistical probability value at each index parameter of the S component may be calculated by the following formula (2):
\[
S(j) = \frac{S\_f(j)}{M \times N}, \qquad j = 1, 2, \ldots, 5 \tag{2}
\]
in the formula (2), S(j) is the statistical probability value of the S component at the index parameter j in the target color space, where the value range of the index parameter j of the S component is 1 to 5; S_f(j) in the formula (2) describes the number of pixels for which the value of the index parameter of the S component is j. M × N is the size of the video frame 10a (e.g., 80 × 80); in other words, based on the size of the video frame 10a, M × N pixels can be determined in the video frame 10a. For example, if the number of pixels in the video frame 10a whose index parameter j of the S component equals 1 is counted as 40, then S(1) = 40/1600 = 2.5%; if the number of pixels whose index parameter j equals 2 is counted as 400, then S(2) = 400/1600 = 25%; the remaining values of j are not listed here one by one.
In the HSV color space, the statistical probability value at each index parameter of the V component may be calculated by the following formula (3):
\[
V(k) = \frac{V\_f(k)}{M \times N}, \qquad k = 1, 2, \ldots, 5 \tag{3}
\]
in the formula (3), V(k) is the statistical probability value of the V component at the index parameter k in the target color space, where the value range of the index parameter k of the V component is 1 to 5; V_f(k) in the formula (3) describes the number of pixels for which the value of the index parameter of the V component is k. M × N is the size of the video frame 10a (e.g., 80 × 80); in other words, based on the size of the video frame 10a, M × N pixels can be determined in the video frame 10a. For example, if the number of pixels in the video frame 10a whose index parameter k of the V component equals 1 is counted as 40, then V(1) = 40/1600 = 2.5%; if the number of pixels whose index parameter k equals 2 is counted as 40, then V(2) = 40/1600 = 2.5%; the remaining values of k are not listed here one by one.
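The statistics of formulas (1) to (3) can be sketched in Python as follows; this is illustrative only, and it assumes each pixel's H value is given in degrees [0°, 360°), S in [0, 1), and V in [0, 256):

```python
import numpy as np


def hsv_statistical_probabilities(h, s, v):
    """Return (H, S, V) statistical probability vectors for one frame:
    per-block pixel counts divided by the total pixel count M * N."""
    n_pixels = h.size  # M * N
    h_idx = np.clip((h // 30).astype(int).ravel(), 0, 11)   # i - 1
    s_idx = np.clip((s // 0.2).astype(int).ravel(), 0, 4)   # j - 1
    v_idx = np.clip((v // 51).astype(int).ravel(), 0, 4)    # k - 1
    H = np.bincount(h_idx, minlength=12) / n_pixels  # formula (1)
    S = np.bincount(s_idx, minlength=5) / n_pixels   # formula (2)
    V = np.bincount(v_idx, minlength=5) / n_pixels   # formula (3)
    return H, S, V
```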
It can be understood that after the statistical probability value of each color component in the video frame 10a at the corresponding index parameters is obtained, the cumulative probability value corresponding to each color component may be obtained. In consideration of the sensitivity of human eyes to colors, the embodiment of the present application may set the weight W1 of the H component to 0.5, the weight W2 of the S component to 0.3, and the weight W3 of the V component to 0.2, and the information entropy of the video frame 10a may then be calculated by an information entropy formula of the following form:
\[
E = -\left( W_1 \sum_{i=1}^{12} H(i)\,\log_2 H(i) + W_2 \sum_{j=1}^{5} S(j)\,\log_2 S(j) + W_3 \sum_{k=1}^{5} V(k)\,\log_2 V(k) \right)
\]
It can be understood that, in the formula for calculating the information entropy, W1 is the weight of the H component, and H(i), the statistical probability value at each index parameter i of the H component, can be calculated by the above formula (1); the values accumulated over the 12 index parameters are referred to as the cumulative probability value corresponding to the H component. Similarly, W2 is the weight of the S component, and S(j), the statistical probability value at each index parameter j of the S component, can be obtained through the formula (2); the values accumulated over the 5 index parameters are referred to as the cumulative probability value corresponding to the S component. Likewise, W3 is the weight of the V component, and V(k), the statistical probability value at each index parameter k of the V component, can be obtained through the formula (3); the values accumulated over the 5 index parameters are referred to as the cumulative probability value corresponding to the V component. Further, when the cumulative probability value corresponding to each color component is obtained, the target user terminal may multiply the cumulative probability value corresponding to each color component by the weight corresponding to that color component, and then sum the products, so as to obtain the information entropy of the video frame 10a in the cluster 1. It can be understood that the calculation of the information entropy of the other video frames in the cluster 1 may refer to the calculation of the information entropy of the video frame 10a, and details will not be further described here. Similarly, the information entropy calculation of the video frames in the other clusters may also refer to that of the video frame 10a in the cluster 1, and will not be described again here.
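Under the weighted information-entropy formula of the form given above (with the weights W1 = 0.5, W2 = 0.3, and W3 = 0.2 from the text), the computation can be sketched as follows; the convention 0 · log 0 = 0 and the function names are assumptions for illustration:

```python
import numpy as np


def frame_entropy(H, S, V, w1=0.5, w2=0.3, w3=0.2):
    """Weighted information entropy of one video frame, where H, S, V are
    the statistical probability vectors from formulas (1)-(3)."""
    def entropy(p):
        p = p[p > 0]                  # treat empty bins as 0 * log 0 = 0
        return -np.sum(p * np.log2(p))
    return w1 * entropy(H) + w2 * entropy(S) + w3 * entropy(V)
```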
Step S103, determining a time interval between two adjacent key video frames in the second video sequence according to the playing time stamps of the key video frames in the first video sequence.
The two adjacent key video frames in the second video sequence may include a first key video frame and a second key video frame, and the playing time stamps of the two adjacent key video frames in the first video sequence are recorded to obtain the time interval between the two key video frames, so that the first key video frame may be played in the time interval to further perform step S104.
And step S104, playing the second video sequence based on the time interval between the two adjacent key video frames.
Specifically, the target user terminal may control a play duration (i.e., a display duration) of the first key video frame based on a time interval between the first key video frame and the second key video frame; therefore, the first key video frame can be played within the playing time (namely, the display time) of the first key video frame, and the second key video frame is played until the playing progress of the second video sequence reaches the playing time stamp of the second key video frame.
For easy understanding, please refer to fig. 6, which is a schematic view of a scene for playing the second video sequence according to an embodiment of the present application. As shown in fig. 6, when determining the time interval between two adjacent key video frames (i.e., the time interval between the first key video frame and the second key video frame), the target user terminal may control the playing duration of the first key video frame based on the time interval between the two key video frames. As shown in fig. 5, the key video frame 1 in the second video sequence shown in fig. 6 may be the video frame 10a in the first video sequence shown in fig. 6, the key video frame 2 in the second video sequence shown in fig. 6 may be the video frame 30a in the first video sequence shown in fig. 6, and the key video frame 3 in the second video sequence shown in fig. 6 may be the video frame 40a in the first video sequence shown in fig. 6. As shown in fig. 6, the playing time stamp of the key video frame 1 in the first video sequence may be the time stamp T1, that is, in the first video sequence, when the playing progress reaches the time stamp T1, the video frame 10a in the first video sequence may be played; similarly, as shown in fig. 6, the playing time stamp of the key video frame 2 in the first video sequence may be the time stamp T3, that is, in the first video sequence, when the playing progress reaches the time stamp T3, the video frame 30a in the first video sequence may be played; similarly, as shown in fig. 6, the playing time stamp of the key video frame 3 in the first video sequence may be the time stamp T4, that is, in the first video sequence, when the playing progress reaches the time stamp T4, the video frame 40a in the first video sequence may be played.
The playing logic of each key video frame in the second video sequence may further be adjusted to ensure the playing effect of each key video frame after the frame reduction processing. For example, when performing the frame reduction processing on the video frames in the clusters, the target user terminal may synchronously mark the time points of the key video frames of each cluster in the first video sequence (i.e., the aforementioned timestamp T1, timestamp T3, and timestamp T4), so that the second video sequence can be played with each key video frame displayed at its corresponding time point. In other words, when the second video sequence is played, each key video frame may be played based on its corresponding time point.
Optionally, when determining the playing timestamp (i.e., the time point) of each key video frame, the embodiment of the present application may further determine the time interval between two adjacent key video frames (i.e., a first key video frame and a second key video frame), so that the display duration of the first of the two key video frames can be controlled by that time interval, so as to further adjust the playing logic of each key video frame in the second video sequence. For example, among the three key video frames in the second video sequence shown in fig. 6, the key video frame 1 and the key video frame 2 may be referred to as two adjacent key video frames, and similarly, the key video frame 2 and the key video frame 3 shown in fig. 6 may also be referred to as two adjacent key video frames. The time interval between the key video frame 1 and the key video frame 2 is the time interval between the timestamp T1 corresponding to the video frame 10a and the timestamp T3 corresponding to the video frame 30a; the display duration 1 shown in fig. 6 may be obtained from this time interval, so that the key video frame 1 may be played for the display duration 1 shown in fig. 6, thereby controlling the playing duration of the key video frame 1 (i.e., the first key video frame), and the key video frame 2 (i.e., the second key video frame) may be played when the playing progress of the second video sequence reaches the playing timestamp of the key video frame 2 (i.e., the timestamp T3).
Similarly, the time interval between the key video frame 2 and the key video frame 3 is the time interval between the timestamp T3 corresponding to the video frame 30a and the timestamp T4 corresponding to the video frame 40a; the display duration 2 shown in fig. 6 may be obtained from this time interval, so that the key video frame 2 may be played for the display duration 2 shown in fig. 6, thereby controlling the playing duration of the key video frame 2 (i.e., the new first key video frame), and the key video frame 3 (i.e., the new second key video frame) may be played when the playing progress of the second video sequence reaches the playing timestamp of the key video frame 3 (i.e., the aforementioned timestamp T4), until the playing duration of the second video sequence reaches the playing duration of the first video sequence, at which point the playing of the second video sequence is stopped.
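The playing logic just described can be sketched as follows; this is an illustrative sketch in which `show` stands in for whatever rendering call the target user terminal actually uses, and all names are assumptions:

```python
import time


def play_key_frames(key_frames, timestamps, end_time, show):
    """Display key_frames[i] from timestamps[i] until the next key frame's
    playing timestamp; the last key frame is held until end_time, the
    playing duration of the first video sequence."""
    ends = list(timestamps[1:]) + [end_time]
    for frame, start, end in zip(key_frames, timestamps, ends):
        show(frame)              # play the current (first) key video frame
        time.sleep(end - start)  # hold it for its display duration
```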
In the embodiment of the application, the first video sequence is subjected to frame reduction processing in a key frame clustering mode, so that the number of video frames in the second video sequence obtained after frame reduction processing can be effectively ensured to be less than that of the video frames in the first video sequence, and the system memory occupied by video data can be reduced when the second video sequence is played in terminal equipment; in addition, a representative video frame can be extracted from each clustering cluster in a key frame clustering mode to serve as a key video frame, so that the visual transition effect between any two adjacent key video frames in the second video sequence can be ensured as much as possible in the process of performing frame reduction processing according to the key video frames; in addition, through the time interval between two adjacent key video frames, the playing time of each video frame can be effectively controlled, and the display effect of the video data can be further improved.
Further, please refer to fig. 7, which is a schematic diagram of another video data processing method according to an embodiment of the present application. As shown in fig. 7, the method may comprise the steps of:
step S201, acquiring a first video sequence, and converting an initial color space associated with a video frame in the first video sequence into a target color space;
for a specific implementation manner of the target user terminal converting the initial color space associated with the video frame in the first video sequence into the target color space, reference may be made to the description of the target color space in the embodiment corresponding to fig. 3, which will not be further described here.
Step S202, in the target color space, clustering the video frames in the first video sequence to obtain a cluster associated with the first video sequence;
specifically, the target user terminal may obtain a first video frame from the first video sequence for being a clustering centroid; further, the target user terminal may determine, in the first video sequence, video frames other than the first video frame as second video frames, and sequentially acquire the second video frames based on a polling mechanism; further, the target user terminal may divide the cluster to which the video frame in the first video sequence belongs according to the color similarity between the first video frame and the second video frame in the target color space.
It can be understood that, in the target color space (i.e. the aforementioned HSV color space), the target user terminal may perform frame-by-frame clustering on two video frames in the first video sequence, and during the frame-by-frame clustering, the two video frames are mainly classified according to the color similarity between the two video frames. Wherein the two video frames in the first video sequence may comprise a first video frame and a second video frame. In this embodiment, one of the two video frames (i.e., the first video frame) may be referred to as a cluster centroid, and the other of the two video frames may be referred to as a second video frame to be color-similarity-matched with the cluster centroid.
It is to be understood that, for the first video sequence shown in fig. 4a, the first video frame in the first video sequence (i.e., the video frame 10a in fig. 4a) may be referred to as the clustering centroid 1. In this case, the embodiment of the present application may refer to the clustering centroid 1 as the first video frame of the two video frames, and may refer to the video frames other than the clustering centroid 1 in the first video sequence as second video frames, all of which are second video frames to be color-similarity-matched with the clustering centroid 1 (for example, the video frames 20a, 30a, 40a, and 50a shown in fig. 4a above). Further, according to the foregoing frame-by-frame clustering rule, the second video frames to be color-similarity-matched with the first video frame are sequentially obtained in the first video sequence through the polling mechanism. Therefore, according to the aforementioned polling mechanism, the video frame 20a adjacent to the video frame 10a can be preferentially determined as the second video frame to be matched in the first video sequence shown in fig. 4a, so as to calculate the color similarity between the video frame 10a (i.e., the first video frame) and the video frame 20a.
It can be understood that, when calculating the similarity (i.e., the color similarity) between two images in the first video sequence (i.e., the video frame 10a and the video frame 20a), the present embodiment may first calculate the similarity between the color histograms of the two images. That is, the present embodiment may refer to the color histogram of the first video frame (i.e., the aforementioned video frame 10a) in the target color space as a first histogram, and refer to the color histogram of the second video frame (i.e., the video frame 20a) in the target color space as a second histogram.
It can be understood that, in the target color space (e.g., the HSV color space), the H component is equally divided into 12 blocks, the S component is equally divided into 5 blocks, and the V component is equally divided into 5 blocks. In this embodiment, each block corresponds to an index parameter, and in order to distinguish the index parameters corresponding to the 3 color components, the index parameter corresponding to the H component may be referred to as the index parameter i, the index parameter corresponding to the S component as the index parameter j, and the index parameter corresponding to the V component as the index parameter k. In view of this, having determined a target color component (e.g., the H component), the embodiment of the present application may refer to the statistical probability value of the H component at the index parameter i (e.g., i = 1) in the first histogram as a first statistical probability value, and may refer to the statistical probability value of the H component at the same index parameter i (e.g., i = 1) in the second histogram as a second statistical probability value, so that the first statistical probability value and the second statistical probability value for i = 1 can be compared to determine the minimum probability statistical value of the two for the index parameter i = 1 of the H component. It can be understood that, for each value of the index parameter i of the H component (i.e., any value in the range 1 to 12), the minimum probability statistical value corresponding to that index parameter can be obtained, so that the minimum cumulative probability value corresponding to the H component can be obtained.
It can be understood that, when the target user terminal obtains the minimum cumulative probability values corresponding to the three components in the HSV color space, the similarity between the first histogram and the second histogram of the two images can be determined according to the minimum cumulative probability values corresponding to the three color components, so as to indirectly determine the color similarity between the video frame 10a and the video frame 20 a.
The target user terminal may calculate the minimum cumulative probability value corresponding to the H component by the following formula (4):
\[
S_H(P1, Q1) = \sum_{i=1}^{12} \min\bigl(H(i),\ Q1\_H(i)\bigr) \tag{4}
\]
in the formula (4), S_H(P1, Q1) describes the color similarity between the H components of the aforementioned two images (i.e., the first video frame and the second video frame). P1 represents the second video frame to be color-similarity-matched with the cluster centroid (i.e., the cluster centroid 1), and Q1 represents the cluster centroid (i.e., the cluster centroid 1). H(i) represents the statistical probability value of the index parameter i of the H component in the second video frame (i.e., the aforementioned second statistical probability value), and Q1_H(i) represents the statistical probability value of the index parameter i of the H component in the first video frame (i.e., the aforementioned first statistical probability value); the minimum function min(H(i), Q1_H(i)) describes the minimum probability statistical value corresponding to the same index parameter in the two images. Since the value of the index parameter i of the H component may be any one of 1 to 12, in the embodiment of the present application, once the minimum probability statistical values corresponding to the 12 index parameters of the H component are obtained, they are further accumulated to determine the minimum cumulative probability value corresponding to the H component; that is, the minimum cumulative probability value corresponding to the H component may be collectively referred to as the color similarity between the H components of the two images (e.g., the video frame 10a and the video frame 20a).
By analogy, the target user terminal may calculate the minimum cumulative probability value corresponding to the S component by the following formula (5):
\[
S_S(P1, Q1) = \sum_{j=1}^{5} \min\bigl(S(j),\ Q1\_S(j)\bigr) \tag{5}
\]
in the formula (5), S_S(P1, Q1) describes the color similarity between the S components of the aforementioned two images (i.e., the first video frame and the second video frame). P1 still represents the second video frame to be color-similarity-matched with the cluster centroid (i.e., the cluster centroid 1), and Q1 still represents the cluster centroid (i.e., the cluster centroid 1). S(j) represents the statistical probability value of the index parameter j of the S component in the second video frame (i.e., the second statistical probability value), and Q1_S(j) represents the statistical probability value of the index parameter j of the S component in the first video frame (i.e., the first statistical probability value); the minimum function min(S(j), Q1_S(j)) describes the minimum probability statistical value corresponding to the same index parameter in the two images. Since the value of the index parameter j of the S component may be any one of 1 to 5, once the minimum probability statistical values corresponding to the 5 index parameters of the S component are obtained, they are further accumulated to determine the minimum cumulative probability value corresponding to the S component; that is, the minimum cumulative probability value corresponding to the S component may be collectively referred to as the color similarity between the S components of the two images (e.g., the video frame 10a and the video frame 20a).
By analogy, the target user terminal may calculate the minimum cumulative probability value corresponding to the V component by the following formula (6):
\[
S_V(P1, Q1) = \sum_{k=1}^{5} \min\bigl(V(k),\ Q1\_V(k)\bigr) \tag{6}
\]
in the formula (6), S_V(P1, Q1) describes the color similarity between the V components of the aforementioned two images (i.e., the first video frame and the second video frame). P1 still represents the second video frame to be color-similarity-matched with the cluster centroid (i.e., the cluster centroid 1), and Q1 still represents the cluster centroid (i.e., the cluster centroid 1). V(k) represents the statistical probability value of the index parameter k of the V component in the second video frame (i.e., the second statistical probability value), and Q1_V(k) represents the statistical probability value of the index parameter k of the V component in the first video frame (i.e., the first statistical probability value); the minimum function min(V(k), Q1_V(k)) describes the minimum probability statistical value corresponding to the same index parameter in the two images. Since the value of the index parameter k of the V component may be any one of 1 to 5, once the minimum probability statistical values corresponding to the 5 index parameters of the V component are obtained, they are further accumulated to determine the minimum cumulative probability value corresponding to the V component; that is, the minimum cumulative probability value corresponding to the V component may be collectively referred to as the color similarity between the V components of the two images (e.g., the video frame 10a and the video frame 20a).
Therefore, in the process of comparing the similarity between the two images, the embodiment of the present application compares the similarity between the color histograms of the two images, that is, the similarity between the three color components of those color histograms. In other words, after obtaining the minimum cumulative probability value corresponding to each of the 3 color components, the embodiment of the present application may obtain the weight corresponding to each color component, multiply the minimum cumulative probability value corresponding to each color component by the corresponding weight, and sum the products, so as to calculate the similarity between the first histogram and the second histogram of the two images. This similarity between the first histogram and the second histogram may further be taken as the color similarity between the first video frame and the second video frame, so that the cluster to which the video frame 20a in the first video sequence shown in fig. 4a belongs can subsequently be determined according to the color similarity (i.e., the similarity) of the two images.
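A sketch of this weighted combination of formulas (4) to (6) follows; the weight values (0.5, 0.3, 0.2) are an assumption carried over from the information-entropy weights, since the text does not state the weight values used for the similarity:

```python
import numpy as np


def histogram_color_similarity(p1_hists, q1_hists, weights=(0.5, 0.3, 0.2)):
    """p1_hists and q1_hists are (H, S, V) statistical-probability triples
    for the second video frame P1 and the cluster centroid Q1."""
    # formulas (4)-(6): per-component minimum cumulative probability value
    sims = [np.minimum(p, q).sum() for p, q in zip(p1_hists, q1_hists)]
    # weighted sum over the three color components
    return sum(w * s for w, s in zip(weights, sims))
```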
For example, if the color similarity between the first video frame and the second video frame is greater than or equal to the clustering threshold, the second video frame (e.g., the video frame 20a) whose color similarity is greater than or equal to the clustering threshold may be divided into the cluster to which the first video frame belongs (e.g., the cluster 1 shown in fig. 4b).
Optionally, if the color similarity between the first video frame and the second video frame is smaller than the clustering threshold, the first video frame (e.g., the video frame 10a) may be updated based on the second video frame whose color similarity is smaller than the clustering threshold (e.g., the video frame 30a shown in fig. 4b), the cluster to which the updated first video frame belongs (i.e., the cluster 2 to which the video frame 30a belongs) may be created, and the updated first video frame (i.e., the video frame 30a) may be sequentially color-similarity-matched with the unmatched second video frames. When all the video frames in the first video sequence have completed color similarity matching, the clusters to which the video frames in the first video sequence belong may be output. For details, reference may be made to the description of each cluster in the embodiments corresponding to fig. 4a and fig. 4b, which will not be repeated here.
Step S203, taking the video frame matched with the key frame acquisition condition as a key video frame in the cluster;
specifically, the target user terminal may determine, in the target color space, an information entropy corresponding to the video frame in the cluster based on the respective cumulative probability values corresponding to each color component carried by the video frame in the cluster; further, the target user terminal may search the video frame with the maximum information entropy from the information entropy corresponding to the video frames in the cluster; further, the target user terminal may use the found video frame with the maximum information entropy as the key video frame acquired from the cluster.
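Steps S202 and S203 together can be sketched as follows; this is illustrative, and `entropy_of` is assumed to compute the information entropy of a frame, for example via the sketches above:

```python
def select_key_frames(clusters, entropy_of):
    # key frame acquisition condition: the video frame with the maximum
    # information entropy is selected from each cluster
    return [max(cluster, key=entropy_of) for cluster in clusters]
```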
Step S204, determining a second video sequence based on the key video frame;
step S205, determining a time interval between two adjacent key video frames in the second video sequence according to the playing time stamps of the key video frames in the first video sequence;
it can be understood that the playing timestamp of each video frame in the first video sequence described in the embodiment of the present application may be a timestamp for forward playing, that is, when the first video sequence is played forward, the playing starts from the smallest playing timestamp until the playing progress of the first video sequence reaches the timestamp corresponding to the end point of the first video sequence. In this case, the first key video frame in the second video sequence also has the smallest playing timestamp. Optionally, the first video sequence may be played in the reverse direction, that is, when the first video sequence is played in reverse, the playing starts from the largest playing timestamp until the playing progress of the first video sequence reaches the timestamp corresponding to the starting point of the first video sequence. In this case, the first key video frame in the second video sequence has the largest playing timestamp. For ease of understanding, the embodiment of the present application only takes forward playing of the first video sequence as an example, so as to further determine the time interval between two adjacent key video frames in the second video sequence according to the playing timestamp of each key video frame in the first video sequence.
Wherein the two adjacent key video frames comprise a first key video frame and a second key video frame;
step S206, controlling the playing duration of the first key video frame based on the time interval between the first key video frame and the second key video frame;
step S207, playing the first key video frame based on the playing duration of the first key video frame, and playing the second key video frame until the playing progress of the second video sequence reaches the playing timestamp of the second key video frame.
For a specific implementation manner of the step S204 to the step S207, reference may be made to the description of the step S102 to the step S104 in the embodiment corresponding to fig. 3, and details will not be further described here.
Optionally, in some scenes with a high requirement on system performance, for example when the frame rate of the video frames cannot be changed, the display duration of the first key video frame may be controlled in another manner (for example, by frame supplement processing). That is, the embodiment of the present application may also create, based on the time interval between the first key video frame and the second key video frame, at least one supplementary video frame identical to the first key video frame. For example, for the time interval between the timestamp T1 and the timestamp T2 shown in fig. 6, the key video frame 1 may be copied within that time interval so as to be displayed repeatedly, so that the display duration of the key video frame 1 can be controlled indirectly by supplementing the second video sequence with these supplementary video frames.
The second video sequence after frame supplementing can have the same number of video frames as the first video sequence; at this time, the display duration of the first key video frame is prolonged by the playing duration of each supplementary video frame. Further, the target user terminal may also play the second key video frame when the playing progress of the second video sequence reaches the playing timestamp of the second key video frame.
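The frame supplement processing can be sketched as follows; all names are illustrative, the timestamps are assumed to be sorted, and each frame slot of the first video sequence is assumed to receive a copy of the most recent key video frame:

```python
def supplement_frames(key_frames, key_timestamps, all_timestamps):
    out, i = [], 0
    for t in all_timestamps:
        # advance to the latest key frame whose timestamp is <= t
        while i + 1 < len(key_timestamps) and key_timestamps[i + 1] <= t:
            i += 1
        out.append(key_frames[i])  # supplementary copy of that key frame
    return out
```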
It can be understood that, by using the clustering key frame algorithm of the embodiment of the present application, the memory occupied by the first video sequence can be optimized, so that a sudden drop in the display effect of the video data is avoided when the optimized video data is played in a webpage or an application client. That is, after the frame reduction processing, the rendering effect of the second video sequence obtained by the frame reduction processing can be further adjusted to ensure the playing effect of the video data.
In the embodiment of the application, the first video sequence is subjected to frame reduction processing in a key frame clustering mode, so that the number of video frames in the second video sequence obtained after frame reduction processing can be effectively ensured to be less than that of the video frames in the first video sequence, and the system memory occupied by video data can be reduced when the second video sequence is played in terminal equipment; in addition, a representative video frame can be extracted from each clustering cluster in a key frame clustering mode to serve as a key video frame, so that the visual transition effect between any two adjacent key video frames in the second video sequence can be ensured as much as possible in the process of performing frame reduction processing according to the key video frames; in addition, through the time interval between two adjacent key video frames, the playing time of each video frame can be effectively controlled, and the display effect of the video data can be further improved.
Further, please refer to fig. 8, which is a schematic structural diagram of a video data processing apparatus according to an embodiment of the present application. The video data processing apparatus 1 may be applied to the target user terminal, which may be the user terminal 3000a in the embodiment corresponding to fig. 1. Further, the video data processing apparatus 1 may include: a clustering module 10, a sequence determining module 20, an interval determining module 30 and a playing module 40;
the clustering module 10 is configured to perform clustering processing on video frames in a first video sequence to obtain a cluster associated with the first video sequence, and obtain a key video frame from the cluster; the number of the key video frames is the same as that of the clustering clusters;
wherein the clustering module 10 comprises: a spatial conversion unit 101, a clustering unit 102 and a key frame acquisition unit 103;
a spatial conversion unit 101, configured to obtain a first video sequence, and convert an initial color space associated with video frames in the first video sequence into a target color space;
a clustering unit 102, configured to perform clustering processing on video frames in the first video sequence in the target color space to obtain a cluster associated with the first video sequence;
wherein the clustering unit 102 comprises: a centroid determining subunit 1021, a polling subunit 1022, a dividing subunit 1023;
a centroid determining subunit 1021, configured to obtain a first video frame from the first video sequence as a clustering centroid;
a polling subunit 1022, configured to determine, in the first video sequence, video frames other than the first video frame as second video frames, and sequentially acquire the second video frames based on a polling mechanism;
a dividing subunit 1023, configured to divide, in the target color space, the clusters to which the video frames in the first video sequence belong according to the color similarity between the first video frame and the second video frame.
Wherein the dividing subunit 1023 includes: a cluster creating subunit 1024, a matching subunit 1025, a first dividing subunit 1026, and a second dividing subunit 1027;
a cluster creating subunit 1024, configured to create a cluster to which the first video frame belongs;
a matching subunit 1025 for performing color similarity matching on the first video frame and the second video frame in the target color space;
wherein the matching subunit 1025 comprises: a histogram determination subunit 1041, a probability statistics subunit 1042 and a similarity determination subunit 1043;
a histogram determining subunit 1041, configured to determine a color histogram of the first video frame in the target color space as a first histogram, and determine a color histogram of the second video frame in the target color space as a second histogram; the target color space contains a plurality of color components;
a probability statistics subunit 1042, configured to determine a similarity between the first histogram and the second histogram based on the statistical probability value associated with each color component in the first histogram and the statistical probability value associated with each color component in the second histogram;
the probability statistics subunit 1042 includes: a component acquisition subunit 1051, a probability value determination subunit 1052, a probability value comparison subunit 1053, and a probability value accumulation subunit 1054;
a component acquiring subunit 1051, configured to acquire a target color component from each color component in the first histogram; the target color component is collectively represented by a plurality of index parameters in the target color space;
a probability value determining subunit 1052, configured to determine, in the first histogram, the statistical probability value of the target color component at each index parameter as the first statistical probability value associated with that index parameter, and determine, in the second histogram, the statistical probability value of the target color component at each index parameter as the second statistical probability value associated with that index parameter;
a probability value comparing subunit 1053, configured to perform a numerical comparison between the first statistical probability value associated with each index parameter and the second statistical probability value associated with the same index parameter, and determine the minimum probability statistical value corresponding to each index parameter according to the numerical comparison result;
a probability value accumulating subunit 1054, configured to determine a minimum accumulated probability value corresponding to the target color component based on the minimum probability statistical value corresponding to each index parameter, and to determine the similarity between the first histogram and the second histogram based on the minimum accumulated probability value corresponding to the target color component.
For specific implementation manners of the component acquiring subunit 1051, the probability value determining subunit 1052, the probability value comparing subunit 1053, and the probability value accumulating subunit 1054, reference may be made to the description of the minimum accumulated probability value in the embodiment corresponding to fig. 7, which will not be described again here.
A similarity determining subunit 1043, configured to determine the similarity between the first histogram and the second histogram as the color similarity between the first video frame and the second video frame.
The specific implementation manners of the histogram determining subunit 1041, the probability statistics subunit 1042, and the similarity determining subunit 1043 may be referred to in the description of the similarity between the first histogram and the second histogram in the embodiment corresponding to fig. 3, and details will not be further described here.
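Concretely, the matching described above is a histogram-intersection style comparison: for every index parameter the smaller of the two statistical probability values is kept, and the kept minima are accumulated. The following Python sketch shows one way this could look, assuming normalized per-component histograms stored as NumPy arrays; the function name, the component labels, and the averaging across color components are illustrative assumptions rather than details fixed by this embodiment.

```python
# A minimal sketch of minimum-accumulated-probability histogram matching;
# names and the cross-component averaging are assumptions.
import numpy as np

def histogram_similarity(hist_a, hist_b):
    """hist_a / hist_b map a color component (e.g. "H", "S", "V") to its
    statistical probability values, one per index parameter (bin)."""
    accumulated = []
    for component, probs_a in hist_a.items():
        probs_b = hist_b[component]
        minima = np.minimum(probs_a, probs_b)  # per-bin numerical comparison
        accumulated.append(minima.sum())       # minimum accumulated value
    return float(np.mean(accumulated))         # assumed aggregation
```

For two identical normalized histograms every minimum equals the original probability and the similarity is 1; for histograms with no overlapping bins it is 0, which is why the value can be compared directly against a clustering threshold.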
A first dividing subunit 1026, configured to, if the color similarity between the first video frame and the second video frame is greater than or equal to a clustering threshold, divide the second video frame whose color similarity is greater than or equal to the clustering threshold into the cluster to which the first video frame belongs;
a second dividing subunit 1027, configured to, if the color similarity between the first video frame and the second video frame is smaller than the clustering threshold, update the first video frame based on the second video frame whose color similarity is smaller than the clustering threshold, create a cluster to which the updated first video frame belongs, and sequentially perform color similarity matching between the updated first video frame and the second video frames that have not yet been matched, until all video frames in the first video sequence have completed color similarity matching, and then output the clusters to which the video frames in the first video sequence belong.
For specific implementation manners of the cluster creating subunit 1024, the matching subunit 1025, the first dividing subunit 1026, and the second dividing subunit 1027, reference may be made to the description of the color similarity between the first video frame and the second video frame in the embodiment corresponding to fig. 3, which will not be described again here.
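Read procedurally, the four subunits above amount to a single pass over the sequence: the current centroid's cluster absorbs similar frames, and the first dissimilar frame becomes the new centroid. A minimal Python sketch under assumed names (frame_similarity stands in for the histogram matching sketched earlier, and the threshold value is an assumption, since no concrete value is fixed here):

```python
# A hedged sketch of the single-pass cluster division described above.
CLUSTERING_THRESHOLD = 0.8  # assumed value; not specified in this embodiment

def divide_clusters(frames, frame_similarity):
    centroid = frames[0]            # first video frame as clustering centroid
    clusters = [[centroid]]         # cluster to which the first frame belongs
    for frame in frames[1:]:        # second video frames, polled in order
        if frame_similarity(centroid, frame) >= CLUSTERING_THRESHOLD:
            clusters[-1].append(frame)    # join the current centroid's cluster
        else:
            centroid = frame              # update the first video frame
            clusters.append([centroid])   # create the cluster it belongs to
    return clusters
```

Because each polled frame is compared only against the current centroid, the division is linear in the number of frames, which matches the sequential polling mechanism of the polling subunit 1022.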
For specific implementation manners of the centroid determining subunit 1021, the polling subunit 1022, and the dividing subunit 1023, reference may be made to the description of dividing the cluster in the embodiment corresponding to fig. 3, and details will not be further described here.
A key frame acquisition unit 103, configured to take a video frame matching the key frame acquisition condition as a key video frame in the cluster.
Wherein the key frame acquiring unit 103 includes: an information entropy determining subunit 1031, an information entropy searching subunit 1032, and a key frame determining subunit 1033;
an information entropy determining subunit 1031, configured to determine, in the target color space, the information entropy corresponding to the video frames in the cluster based on the accumulated probability value corresponding to each color component carried by the video frames in the cluster;
wherein the information entropy determining subunit 1031 includes: an index quantity obtaining subunit 1034, an accumulation subunit 1035, and a weighting subunit 1036;
an index quantity obtaining subunit 1034, configured to obtain an index parameter of each color component in the target color space;
an accumulation subunit 1035, configured to obtain a statistical probability value of the video frame in the cluster on the index parameter of each color component, and accumulate the statistical probability value on the index parameter of each color component to obtain an accumulated probability value corresponding to each color component;
a weighting subunit 1036, configured to determine an information entropy of the video frames in the cluster based on the accumulated probability value corresponding to each color component and the weight value corresponding to the corresponding color component.
For specific implementation manners of the index quantity obtaining subunit 1034, the accumulation subunit 1035, and the weighting subunit 1036, reference may be made to the description of the specific process for calculating the information entropy of the video frame 10a in the embodiment corresponding to fig. 3, and details will not be further described here.
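One plausible reading of this weighted entropy, sketched in Python: the per-component accumulation is taken to be the usual Shannon sum over the index parameters, and the component weights are assumed values, since neither the exact formula nor the weights are fixed in this passage.

```python
# A hedged sketch of the weighted information entropy of one video frame;
# the Shannon form and the per-component weights are assumptions.
import numpy as np

COMPONENT_WEIGHTS = {"H": 0.5, "S": 0.3, "V": 0.2}  # assumed weight values

def frame_entropy(histogram):
    """histogram maps each color component to its statistical probability
    values over the index parameters; returns the weighted entropy."""
    entropy = 0.0
    for component, probs in histogram.items():
        probs = np.asarray(probs, dtype=float)
        nonzero = probs[probs > 0]                         # avoid log2(0)
        accumulated = -(nonzero * np.log2(nonzero)).sum()  # per-component sum
        entropy += COMPONENT_WEIGHTS[component] * accumulated
    return entropy
```

The searching and key frame determining subunits below then simply take, within each cluster, the frame that maximizes this entropy.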
An information entropy searching subunit 1032, configured to search, among the information entropies corresponding to the video frames in the cluster, for the video frame with the largest information entropy;
a key frame determining subunit 1033, configured to use the found video frame with the largest information entropy as the key video frame obtained from the cluster.
The specific implementation manners of the information entropy determining subunit 1031, the information entropy searching subunit 1032, and the key frame determining subunit 1033 may refer to the description of the information entropy in the embodiment corresponding to fig. 3, which will not be described again here.
For specific implementation manners of the spatial conversion unit 101, the clustering unit 102, and the key frame obtaining unit 103, reference may be made to the description of step S101 in the embodiment corresponding to fig. 3, and details will not be further described here.
A sequence determination module 20, configured to determine a second video sequence based on the key video frames;
an interval determining module 30, configured to determine a time interval between two adjacent key video frames in the second video sequence according to the playing time stamps of the key video frames in the first video sequence;
and a playing module 40, configured to play the second video sequence based on a time interval between the two adjacent key video frames.
Wherein the two adjacent key video frames comprise a first key video frame and a second key video frame;
the playing module 40 includes: a duration control unit 401 and a playing subunit 402;
a duration control unit 401, configured to control a playing duration of the first key video frame based on a time interval between the first key video frame and the second key video frame;
a playing subunit 402, configured to play the first key video frame based on the playing duration of the first key video frame until the playing progress of the second video sequence reaches the playing timestamp of the second key video frame, and then play the second key video frame.
For specific implementation manners of the duration control unit 401 and the playing subunit 402, reference may be made to the description of playing the second video sequence in the embodiment corresponding to fig. 6, and details will not be further described here.
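As a sketch of this pacing logic: each key video frame is held for the interval between its own playing timestamp and the next key video frame's, so the condensed second video sequence preserves the pacing of the first. display() and the sleep-based hold below are stand-ins for a real player, not an API from this embodiment.

```python
# A minimal sketch of interval-driven playback of the key video frames.
import time

def play_key_frames(key_frames, timestamps, display):
    """key_frames[i] was taken at timestamps[i] seconds in the first video
    sequence; each frame is held until the next key frame's timestamp."""
    for i, frame in enumerate(key_frames):
        display(frame)  # render the current key video frame
        if i + 1 < len(timestamps):
            # hold it for the time interval to the next key video frame
            time.sleep(timestamps[i + 1] - timestamps[i])
```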
For specific implementation manners of the clustering module 10, the sequence determining module 20, the interval determining module 30, and the playing module 40, reference may be made to the description of steps S101 to S104 in the embodiment corresponding to fig. 3, and details will not be further described here.
It can be understood that the video data processing apparatus 1 in the embodiment of the present application can perform the description of the video data processing method in the embodiment corresponding to fig. 3 or fig. 7, which is not repeated herein. In addition, the beneficial effects of the same method are not described in detail.
Further, please refer to fig. 9, which is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown in fig. 9, the computer device 1000 may be the user terminal 3000a in the embodiment corresponding to fig. 1. The computer device 1000 may include: a processor 1001, a network interface 1004, and a memory 1005; further, the computer device 1000 may also include: a user interface 1003 and at least one communication bus 1002. The communication bus 1002 is used to implement connection and communication among these components. The user interface 1003 may include a Display screen (Display) and a Keyboard (Keyboard), and the optional user interface 1003 may further include a standard wired interface and a standard wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM or a non-volatile memory, for example, at least one magnetic disk memory. The memory 1005 may optionally also be at least one storage device located remotely from the processor 1001. As shown in fig. 9, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a device control application program.
The network interface 1004 in the computer device 1000 may also provide a network connection to the server 2000 in the embodiment corresponding to fig. 1, and the optional user interface 1003 may also include a Display screen (Display) and a Keyboard (Keyboard). In the computer device 1000 shown in fig. 9, the network interface 1004 may provide a network communication function; the user interface 1003 is mainly configured to provide an input interface for a user; and the processor 1001 may be configured to invoke the device control application stored in the memory 1005 to implement:
clustering video frames in a first video sequence to obtain a cluster associated with the first video sequence, and acquiring key video frames in the cluster; the number of the key video frames is the same as that of the clustering clusters;
determining a second video sequence based on the key video frames;
determining a time interval between two adjacent key video frames in the second video sequence according to the playing time stamps of the key video frames in the first video sequence;
and playing the second video sequence based on the time interval between the two adjacent key video frames.
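Putting the four steps together, a hedged end-to-end sketch that reuses the illustrative helpers introduced above (histogram_similarity, divide_clusters, frame_entropy, play_key_frames); frame_histogram is an assumed function mapping a frame to its color histogram in the target color space:

```python
# End-to-end sketch of the four steps; all helper names are the illustrative
# ones introduced earlier, not identifiers from this application.
def process_video(frames, timestamps, frame_histogram, display):
    histograms = [frame_histogram(f) for f in frames]
    # step 1: cluster frame indices so each frame keeps its play timestamp
    sim = lambda i, j: histogram_similarity(histograms[i], histograms[j])
    clusters = divide_clusters(list(range(len(frames))), sim)
    # one key video frame per cluster: the frame with the largest entropy
    key_idx = [max(c, key=lambda i: frame_entropy(histograms[i]))
               for c in clusters]
    key_frames = [frames[i] for i in key_idx]        # step 2: second sequence
    key_times = [timestamps[i] for i in key_idx]     # step 3: time intervals
    play_key_frames(key_frames, key_times, display)  # step 4: paced playback
```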
It should be understood that the computer device 1000 described in this embodiment of the present application may perform the description of the video data processing method in the embodiment corresponding to fig. 3 or fig. 7, and may also perform the description of the video data processing apparatus 1 in the embodiment corresponding to fig. 8, which is not described herein again. In addition, the beneficial effects of the same method are not described in detail.
It should further be noted that an embodiment of the present application also provides a computer storage medium, where the computer storage medium stores the aforementioned computer program executed by the video data processing apparatus 1, and the computer program includes program instructions. When the processor executes the program instructions, the description of the video data processing method in the embodiment corresponding to fig. 3 or fig. 7 can be performed, and details are therefore not repeated here. In addition, the beneficial effects of the same method are not described in detail. For technical details not disclosed in the embodiment of the computer storage medium referred to in the present application, reference is made to the description of the method embodiments of the present application.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above disclosure is merely a preferred embodiment of the present application and is not intended to limit the scope of the claims of the present application; therefore, equivalent variations and modifications made in accordance with the claims of the present application shall still fall within the scope of the present application.

Claims (15)

1. A method of processing video data, comprising:
clustering video frames in a first video sequence to obtain a cluster associated with the first video sequence, and acquiring key video frames in the cluster; the number of the key video frames is the same as that of the clustering clusters;
determining a second video sequence based on the key video frames;
determining a time interval between two adjacent key video frames in the second video sequence according to the playing time stamps of the key video frames in the first video sequence;
and playing the second video sequence based on the time interval between the two adjacent key video frames.
2. The method according to claim 1, wherein the clustering the video frames in the first video sequence to obtain the cluster associated with the first video sequence, and acquiring the key video frames from the cluster, the number of the key video frames being the same as the number of the clusters, comprises:
acquiring a first video sequence, and converting an initial color space associated with video frames in the first video sequence into a target color space;
in the target color space, performing clustering processing on video frames in the first video sequence to obtain a cluster associated with the first video sequence;
and taking the video frame matched with the key frame acquisition condition as a key video frame in the clustering cluster.
3. The method of claim 2, wherein clustering video frames in the first video sequence in the target color space to obtain a cluster associated with the first video sequence comprises:
acquiring a first video frame serving as a clustering centroid from the first video sequence;
determining video frames except the first video frame in the first video sequence as second video frames, and sequentially acquiring the second video frames based on a polling mechanism;
in the target color space, dividing the clusters to which the video frames in the first video sequence belong according to the color similarity between the first video frame and the second video frame.
4. The method according to claim 3, wherein the dividing, in the target color space, the clusters to which the video frames in the first video sequence belong according to the color similarity between the first video frame and the second video frame comprises:
creating a cluster to which the first video frame belongs;
performing color similarity matching on the first video frame and the second video frame in the target color space;
if the matched color similarity between the first video frame and the second video frame is greater than or equal to a clustering threshold, dividing the second video frame whose color similarity is greater than or equal to the clustering threshold into the cluster to which the first video frame belongs;
if the matched color similarity between the first video frame and the second video frame is smaller than the clustering threshold, updating the first video frame based on the second video frame whose color similarity is smaller than the clustering threshold, creating a cluster to which the updated first video frame belongs, and sequentially performing color similarity matching between the updated first video frame and the second video frames that have not yet been matched, until all video frames in the first video sequence have completed color similarity matching, and then outputting the clusters to which the video frames in the first video sequence belong.
5. The method of claim 4, wherein the color similarity matching the first video frame with the second video frame in the target color space comprises:
determining a color histogram of the first video frame in the target color space as a first histogram and determining a color histogram of the second video frame in the target color space as a second histogram; the target color space contains a plurality of color components;
determining a similarity between the first histogram and the second histogram based on the statistical probability value associated with each color component in the first histogram and the statistical probability value associated with each color component in the second histogram;
determining a similarity between the first histogram and the second histogram as a color similarity between the first video frame and the second video frame.
6. The method of claim 5, wherein determining the similarity between the first histogram and the second histogram based on the statistical probability value associated with each color component in the first histogram and the statistical probability value associated with each color component in the second histogram comprises:
acquiring a target color component from the color components in the first histogram; the target color component is represented by a plurality of index parameters in the target color space;
respectively determining the statistical probability value of the target color component on each index parameter as a first statistical probability value associated with each index parameter in the first histogram, and respectively determining the statistical probability value of the target color component on each index parameter as a second statistical probability value associated with each index parameter in the second histogram;
carrying out numerical comparison on the first statistical probability value associated with each index parameter and the second statistical probability value associated with the same index parameter, and determining the minimum probability statistical value corresponding to each index parameter according to the numerical comparison result;
and determining a minimum accumulated probability value corresponding to the target color component based on the minimum probability statistical value corresponding to each index parameter, and determining the similarity between the first histogram and the second histogram based on the minimum accumulated probability value corresponding to the target color component.
7. The method according to claim 2, wherein the using video frames matching key frame acquisition conditions as key video frames in the cluster comprises:
in the target color space, determining information entropy corresponding to the video frames in the cluster based on the accumulated probability value corresponding to each color component carried by the video frames in the cluster;
searching the video frame with the maximum information entropy in the information entropy corresponding to the video frames in the cluster;
and taking the searched video frame with the maximum information entropy as a key video frame obtained from the clustering cluster.
8. The method according to claim 7, wherein the determining, in the target color space, the information entropy of the video frames in the cluster based on the cumulative probability value corresponding to each color component carried by the video frames in the cluster comprises:
acquiring an index parameter of each color component in the target color space;
acquiring a statistical probability value of the video frame in the cluster on the index parameter of each color component, and accumulating the statistical probability value on the index parameter of each color component to obtain an accumulated probability value corresponding to each color component;
and determining the information entropy of the video frames in the cluster based on the accumulated probability value corresponding to each color component and the weight value corresponding to the corresponding color component.
9. The method of claim 1, wherein the two adjacent key video frames comprise a first key video frame and a second key video frame;
the playing the second video sequence based on the time interval between the two adjacent key video frames comprises:
controlling the playing duration of the first key video frame based on the time interval between the first key video frame and the second key video frame;
and playing the first key video frame based on the playing duration of the first key video frame until the playing progress of the second video sequence reaches the playing time stamp of the second key video frame, and playing the second key video frame.
10. A video data processing apparatus, comprising:
the clustering module is used for clustering video frames in the first video sequence to obtain a cluster associated with the first video sequence, and acquiring key video frames from the cluster; the number of the key video frames is the same as that of the clustering clusters;
a sequence determination module to determine a second video sequence based on the key video frames;
an interval determining module, configured to determine a time interval between two adjacent key video frames in the second video sequence according to the playing time stamps of the key video frames in the first video sequence;
and the playing module is used for playing the second video sequence based on the time interval between the two adjacent key video frames.
11. The apparatus of claim 10, wherein the clustering module comprises:
the video processing device comprises a space conversion unit, a color space conversion unit and a color space conversion unit, wherein the space conversion unit is used for acquiring a first video sequence and converting an initial color space associated with video frames in the first video sequence into a target color space;
a clustering unit, configured to perform clustering processing on video frames in the first video sequence in the target color space to obtain a cluster associated with the first video sequence;
and the key frame acquisition unit is used for taking the video frames matched with the key frame acquisition conditions as key video frames in the clustering cluster.
12. The apparatus of claim 11, wherein the clustering unit comprises:
a centroid determining subunit, configured to obtain a first video frame serving as a clustering centroid from the first video sequence;
the polling subunit is used for determining video frames except the first video frame in the first video sequence as second video frames and sequentially acquiring the second video frames based on a polling mechanism;
and the dividing subunit is used for dividing, in the target color space, the clusters to which the video frames in the first video sequence belong according to the color similarity between the first video frame and the second video frame.
13. The apparatus of claim 12, wherein the dividing subunit comprises:
a cluster creating subunit, configured to create a cluster to which the first video frame belongs;
a matching subunit, configured to perform color similarity matching on the first video frame and the second video frame in the target color space;
a first dividing subunit, configured to divide the second video frame whose color similarity is greater than or equal to a clustering threshold into the cluster to which the first video frame belongs if the color similarity between the first video frame and the second video frame is greater than or equal to the clustering threshold;
and the second dividing subunit is configured to, if the color similarity between the first video frame and the second video frame is smaller than the clustering threshold, update the first video frame based on the second video frame whose color similarity is smaller than the clustering threshold, create a cluster to which the updated first video frame belongs, and sequentially perform color similarity matching between the updated first video frame and the second video frames that have not yet been matched, until all video frames in the first video sequence have completed color similarity matching, and then output the clusters to which the video frames in the first video sequence belong.
14. A computer device, comprising: a processor, a memory;
the processor is coupled to the memory, wherein the memory is configured to store program code and the processor is configured to invoke the program code to perform the method of any of claims 1-9.
15. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions which, when executed by a processor, perform the method according to any one of claims 1-9.
CN201910537078.XA 2019-06-20 2019-06-20 Video data processing method and device and storage medium Active CN112118494B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910537078.XA CN112118494B (en) 2019-06-20 2019-06-20 Video data processing method and device and storage medium

Publications (2)

Publication Number Publication Date
CN112118494A (en) 2020-12-22
CN112118494B CN112118494B (en) 2022-09-20

Family

ID=73795953

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910537078.XA Active CN112118494B (en) 2019-06-20 2019-06-20 Video data processing method and device and storage medium

Country Status (1)

Country Link
CN (1) CN112118494B (en)

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050198575A1 (en) * 2002-04-15 2005-09-08 Tiecheng Liu Methods for selecting a subsequence of video frames from a sequence of video frames
US20060228029A1 (en) * 2005-03-29 2006-10-12 Microsoft Corporation Method and system for video clip compression
US20080155609A1 (en) * 2006-12-20 2008-06-26 Lee Taeyeon Method of providing key frames of video in mobile terminal
CN103150373A (en) * 2013-03-08 2013-06-12 北京理工大学 Generation method of high-satisfaction video summary
CN103634605A (en) * 2013-12-04 2014-03-12 百度在线网络技术(北京)有限公司 Processing method and device for video images
CN103942751A (en) * 2014-04-28 2014-07-23 中央民族大学 Method for extracting video key frame
CN104268917A (en) * 2014-09-28 2015-01-07 厦门幻世网络科技有限公司 Method and device for converting 3D animation into GIF dynamic graph
CN104408429A (en) * 2014-11-28 2015-03-11 北京奇艺世纪科技有限公司 Method and device for extracting representative frame of video
CN106792005A (en) * 2017-01-17 2017-05-31 南通同洲电子有限责任公司 A kind of content detection algorithm combined based on audio frequency and video
CN106851437A (en) * 2017-01-17 2017-06-13 南通同洲电子有限责任公司 A kind of method for extracting video frequency abstract
CN107220585A (en) * 2017-03-31 2017-09-29 南京邮电大学 A kind of video key frame extracting method based on multiple features fusion clustering shots
CN108882057A (en) * 2017-05-09 2018-11-23 北京小度互娱科技有限公司 Video abstraction generating method and device
CN107295352A (en) * 2017-06-14 2017-10-24 北京蜜莱坞网络科技有限公司 A kind of video-frequency compression method, device, equipment and storage medium
CN108171189A (en) * 2018-01-05 2018-06-15 广东小天才科技有限公司 Video coding method, video coding device and electronic equipment

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHAOHUI LV et al.: "Effective Keyframe Extraction from Personal Video by Using Nearest Neighbor Clustering", 2018 11th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), 3 February 2019 (2019-02-03) *
PAN Lei et al.: "Clustering-based video shot segmentation and key frame extraction", Infrared and Laser Engineering, 25 June 2005 (2005-06-25), pages 1-4 *
TAN Feng: "Shot boundary detection and key frame extraction", China Master's Theses Full-Text Database, 15 December 2006 (2006-12-15)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115375587A (en) * 2022-10-24 2022-11-22 北京实创上地科技有限公司 Video processing method and server
CN115375587B (en) * 2022-10-24 2023-03-10 北京实创上地科技有限公司 Video processing method and server
CN117972461A (en) * 2024-04-02 2024-05-03 济宁职业技术学院 Soft measurement method for key parameters in fermentation production process

Also Published As

Publication number Publication date
CN112118494B (en) 2022-09-20

Similar Documents

Publication Publication Date Title
US20220014819A1 (en) Video image processing
CN107025457B (en) Image processing method and device
CN106611435B (en) Animation processing method and device
CN108600781B (en) Video cover generation method and server
CN113763296B (en) Image processing method, device and medium
CN111954053B (en) Method for acquiring mask frame data, computer equipment and readable storage medium
CN110602554A (en) Cover image determining method, device and equipment
US11347792B2 (en) Video abstract generating method, apparatus, and storage medium
CN110598781A (en) Image processing method, image processing device, electronic equipment and storage medium
CN112118494B (en) Video data processing method and device and storage medium
CN110381392B (en) Video abstract extraction method, system, device and storage medium thereof
CN110012350B (en) Video processing method and device, video processing equipment and storage medium
CN110312134B (en) Screen video coding method based on image processing and machine learning
CN110691246B (en) Video coding method and device and electronic equipment
CN112950640A (en) Video portrait segmentation method and device, electronic equipment and storage medium
CN112291634A (en) Video processing method and device
KR20100107649A (en) Apparatus and method for processing image using gaussian model
CN110826545A (en) Video category identification method and related device
CN105611430B (en) Method and system for handling video content
CN115243073B (en) Video processing method, device, equipment and storage medium
CN110996173A (en) Image data processing method and device and storage medium
Rui et al. On the initialization and training methods for Kohonen self-organizing feature maps in color image quantization
CN113762058A (en) Video synthesis method and device, computer equipment and storage medium
CN109949377B (en) Image processing method and device and electronic equipment
CN113613024A (en) Video preprocessing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant