WO2009138037A1

WO2009138037A1 - Video service system, video service apparatus and extracting method of key frame thereof

Info

Publication number: WO2009138037A1
Application number: PCT/CN2009/071783
Authority: WO
Inventors: 邸佩云; 胡昌启; 元辉; 马彦卓; 常义林
Original assignee: 华为技术有限公司
Priority date: 2008-05-13
Filing date: 2009-05-13
Publication date: 2009-11-19
Also published as: CN101582063A

Abstract

An extracting method of the key frame, a video service apparatus and a video service system are disclosed, in which the method is applied to extract the key frame of the video data stream in the video service system. The method comprises: obtaining the motion vector of each frame in the video data stream, and obtaining the characteristics vector aggregate according to the motion vector; determine whether the direction and the amplitude of the motion vector corresponding to the characteristics vector aggregate of the forward and the backward adjacent two frames occurs change or not; extracting the key frame using the determining result. So the frame whose speed changes abruptly could be extracted effectively.

Description

Instruction manual video service system, video service device and method for extracting key frames thereof

[1] This application claims the priority of the Chinese application filed on May 13, 2008, with the application number 200810067177.8, the invention titled "Video Service System, Video Service Device and Its Key Frame Extraction Method", all contents thereof This is incorporated herein by reference.

[2] Technical field

[3] The embodiments of the present invention relate to the field of communications technologies, and in particular, to a video service system, a video service device, and a method for extracting key frames thereof.

[4] Background Art

[5] People are observing the objective world, often being most interested in unusual events, and getting a lot of information from unusual events. For an object, the extraordinary state means that the motion state of the object changes significantly, for example, from rest to motion, from motion to rest, the direction of motion changes, or the speed of motion changes significantly. Similarly, when watching a video scene, people pay attention to the change of the scene, and the change of the video scene is a reflection of the significant change of the motion state of the object in the scene. The scene change also includes the switching of the scene. It can be considered that the switching of the scene is the sudden movement of the object in the original scene to the infinity, and the object in the new scene is moved from infinity, and the motion state of the object is intense. Variety.

[6] Generally, frames are used to describe video information, and key frames are frames that best represent video information.

. The so-called key frame refers to the frame in which the object in the scene has abnormal motion. The other frame scenes between the abnormal frames remain normal. The scene refers to a set of several shots containing content related.

[7] In the prior art method of extracting key frames, the interframe difference method is applied to the kth frame and the k-1th frame to obtain a rough outline of the moving object (referred to as a first contour), and then Using the multi-level edge detection algorithm to obtain the contour of all objects in the kth frame (referred to as the second contour), and the second contour is ANDed with the first contour to obtain a clearer contour than the first contour (referred to as the third Contour), then add a rectangular frame to the moving object based on the third contour, that is, the moving object is framed by a rectangle, and the motion is obtained by the Geodesic Active Contour Model in the Level Set Method. The edge contour of the object, finally by judging the appearance of the edge contour of the moving object , disappear, displacement changes, and shape changes to select keyframes.

[8] However, in the process of implementing the present invention, the inventors have found that at least the following problems exist in the above prior art 1.

First, in the prior art, the contour information of all moving objects needs to be extracted, and the contour information is calculated. Because the algorithm process of extracting the contour information is complicated, the calculation amount of the prior art 1 is large; The first technique is to extract the key frames from stationary to moving or from moving to stationary, but the key frames that are suddenly shifted by constant speed cannot be extracted.

[9] In the prior art method of extracting key frames, first, by extracting motion vector information of a moving object of each frame of the video stream, that is, a magnitude of a velocity, and obtaining an average value of motion vector information of each frame, The perceptual motion energy value is used to represent each motion vector information, and constitutes a perceptual motion energy map of the moving object of all frames, wherein the rising change of the perceptual motion energy value represents acceleration, the rising change represents deceleration, and then the perception is performed by the triangle model analyzer. The motion energy map divides the motion unit boundary, wherein the boundary corresponds to the minimum value of the perceived motion energy value, and the boundary identifies the start and end points of the triangle, and then performs triangle model adjustment for each motion unit, and selects a key frame.

[10] However, in the process of implementing the present invention, the inventors have found that at least the following problems exist in the above prior art: First, the prior art 2 does not consider the directional problem, that is, the prior art 2 cannot reflect the uniform speed but the moving direction occurs. a moving object that changes, so that the key frame of the moving object cannot be extracted; secondly, when there are multiple moving objects in the kth frame, the motion vector information of the individual moving objects is large, and the motion vectors of the individual moving objects are very Xiaoyan, after the average value calculation, may cause the perceived motion energy value of the kth frame to be small, which does not reflect the change of the kth frame well, which may lead to the misjudgment of the key frame, that is, the kth frame cannot be selected as Keyframe.

[11] Summary of the invention

An embodiment of the present invention provides a method for extracting a key frame, so as to solve the problem that the prior art solution cannot accurately extract a key frame that changes in a uniform speed but changes in the direction of motion.

[13] A key frame extraction method, including:

[14] According to the motion vector of each frame in the acquired video data stream; the feature vector set of the motion vector of each frame.

[15] determining whether the direction and amplitude of the motion vector corresponding to the feature vector set of two adjacent frames before and after the change; [16] Extracting key frames from the video data stream using the result of the determination as to whether the change has occurred.

[17] A video service device, comprising:

[18] a video key frame extraction module, configured to obtain, according to a motion vector of each frame in the acquired video data stream; a feature vector set of motion vectors of each frame; and a motion vector corresponding to the feature vector set of the adjacent two frames before and after Whether the direction and the amplitude change; extract the key frame from the video data stream by using the judgment result of whether the change occurs.

[19] A video service system, comprising: the video service device and the user terminal device, the video service device obtaining, according to a motion vector of each frame in the acquired video data stream; a feature vector set of motion vectors of each frame; Determining whether the direction and the amplitude of the motion vector corresponding to the feature vector set of the two adjacent frames before and after the change; extracting the key frame from the video data stream by using the determination result of whether the change occurs, and then, for the user The terminal device provides the key frame.

[20] The video service device, the video service system and the key frame extraction method thereof provide the feature vector set of the motion vector by using the motion vector of the frame, and determine the feature vector set corresponding to the two adjacent frames. Whether the direction and amplitude of the motion vector change to extract the key frame can effectively extract the frame with sudden change in speed and uniform velocity but change direction, which reduces the error rate and complexity of extracting key frames.

[21] BRIEF DESCRIPTION OF THE DRAWINGS

1 is a system diagram of a video service system according to Embodiment 1 of the present invention;

2 is a block diagram of a video service device according to Embodiment 1 of the present invention;

FIG. 3 is a block diagram of a video service apparatus according to Embodiment 2 of the present invention; FIG.

4 is a block diagram of a video service device according to Embodiment 3 of the present invention;

FIG. 5 is a flowchart of a method for extracting a key frame according to Embodiment 4 of the present invention; FIG.

6 is a histogram of an X-segment vector in a key frame extraction method according to Embodiment 4 of the present invention;

7 is a histogram of a y-divided vector in a key frame extraction method according to a fourth embodiment of the present invention.

[29] Specific embodiment

FIG. 1 is a system diagram of a video service system 10 according to Embodiment 1 of the present invention. In the present embodiment, the video service system 10 includes: a video service device 20 and a user terminal device 30. The video service device 20 and the user terminal device 30 are connected via a network (not shown) or the video service device 20 and the user terminal device 30. The peers are placed in the same video terminal device. In this embodiment, the video service apparatus 20 is configured to extract a key frame by determining whether a direction and an amplitude of a motion vector corresponding to a feature vector set of two adjacent frames in the video stream change, and extract the extracted key frame. The ranking is divided and provided to the user terminal device 30. In this embodiment, the video service device 20 can be a video retrieval service device or a video transmission service device or a video encoding service device.

FIG. 2 is a block diagram of a video service device 20 according to Embodiment 1 of the present invention. In the present embodiment, the video service device 20 is a video retrieval service device and is used to provide a video retrieval information service for the user terminal device 30. In this embodiment, the video service device 20 includes a video storage module 200 and a video key frame extraction module 210. The user terminal device 30 includes a key frame copy memory unit 300 and a user search and play interface 310. The video storage module 200, the key frame copy memory unit 300, and the user search and play interface 310 are all well-known technologies, and their functions are not described in detail.

In this embodiment, the key frame extraction module 210 is configured to acquire a motion vector of each frame in the video data stream, and acquire a feature vector set of the motion vector. In this embodiment, the key frame extraction module 210 composes a motion vector set by combining motion vectors having the same motion vector value, and a motion vector set having the largest number of motion vectors as a feature vector set. In the present embodiment, the video key frame extraction module 210 decomposes the motion vector into a sub-vector in the X-axis direction and a sub-vector in the y-axis direction. First extract the X-segment vector with the same value and the largest number of 吋 or the y-divided vector with the same extraction value and the largest number 吋; under the condition of the y-divided vector value corresponding to the X-divided vector value one by one, extract the most y Extracting the value of the partial vector or the value of the X-segment vector of the y-divided vector value one by one, extracting the value of the X-number vector with the largest number of 吋, and then taking the motion vector of this 为 as a set of (x, y) As a feature vector collection. In other embodiments of the present invention, the feature vector set may also be acquired by using a combination of amplitude and angle, and the motion vector set of the background and the foreground may be separately extracted by a clustering method, and the motion vector set of the foreground is a feature. Vector collection.

In the embodiment, the key frame extraction module 210 is further configured to extract a key frame by determining whether the direction and the amplitude of the motion vector corresponding to the feature vector set of the two adjacent frames are changed. In this embodiment, the key frame extraction module 210 extracts the key by determining whether the direction and amplitude of the motion vector of the feature vector set of the kth frame are different from the direction and magnitude of the motion vector of the feature vector set of the k-1th frame. frame.

[34] In this embodiment, the key frame extraction module 210 sets the direction of the motion vector corresponding to the feature vector set. It can be decomposed into the x-axis direction and the y-axis direction, and the magnitude of its motion vector can be expressed by the sum of the X-divided vector size and the y-divided vector size. In this embodiment, the key frame extraction module 210 determines the direction of the X-segment vector of the motion vector of the feature vector set of the k-th frame by determining the direction of the X-segment vector of the motion vector of the feature vector set of the k-th frame. The direction of the y-divided vector of the motion vector of the feature vector set of the k-frame is changed with respect to the direction of the y-divided vector of the motion vector of the feature vector set of the k-th frame, or by judging the motion of the feature vector set of the k-th frame The amplitude of the vector differs from the amplitude of the motion vector of the feature vector set of the k-1 frame by more than a predetermined threshold value, and the kth frame is used as the key frame.

In the embodiment, the key frame extraction module 210 is further configured to determine the category of the extracted key frame after extracting the key frame. The key frame categories are classified into a first type of key frame, a second type of key frame, and a third type of key frame. In this embodiment, the first type of key frames are excellent level key frames, the second type of key frames are good level key frames, and the third type of key frames are general level key frames. In this embodiment, the key frame extraction module 2 10 determines the extracted key frame category by determining whether the direction and the amplitude of the motion vector corresponding to the feature vector set of the two adjacent frames are changed, that is, the key extracted. The level of the frame.

[36] In this embodiment, the key frame extraction module 210 determines the direction of the X-segment vector of the motion vector of the feature vector set of the k-th frame relative to the X-segment vector of the motion vector of the feature vector set of the k-1th frame. The direction has changed, and the direction of the y-divided vector of the motion vector of the feature vector set of the k-th frame is changed from the direction of the y-divided vector of the motion vector of the feature vector set of the k-th frame,吋 By judging that the magnitude of the motion vector of the feature vector set of the kth frame differs from the magnitude of the motion vector of the feature vector set of the k-1 frame by more than a predetermined threshold, then k

The class of the frame is the first type of key frame, that is, the level of the k-th frame is divided into excellent levels.

[37] In this embodiment, the key frame extraction module 210 determines the direction of the X-segment vector of the motion vector of the feature vector set of the k-th frame relative to the X-segment vector of the motion vector of the feature vector set of the k-1th frame. The direction is changed, and the direction of the y-divided vector of the motion vector of the feature vector set of the k-th frame is changed by determining the direction of the y-divided vector of the motion vector of the feature vector set of the k-th frame, and By judging that the magnitude of the motion vector of the feature vector set of the kth frame is different from the magnitude of the motion vector of the feature vector set of the k-1 frame by no more than a predetermined threshold, then k

The category of the frame is the second type of key frame, that is, the level of the k-th frame is divided into good levels.

[38] In other embodiments of the present invention, the key frame extraction module 210 determines the feature vector set of the kth frame. The direction of the x-segment vector of the combined motion vector is changed with respect to the direction of the X-segment vector of the motion vector of the feature vector set of the k-1th frame, or by determining the motion vector of the feature vector set of the k-th frame The direction of the divided vector is changed with respect to the direction of the y-divided vector of the motion vector of the feature vector set of the k1th frame, and by determining the magnitude of the motion vector of the feature vector set of the k-th frame and the k-1th frame The magnitude of the motion vector of the feature vector set differs by more than a predetermined threshold, then the kth

In another embodiment of the present invention, the key frame extraction module 210 determines the X-score of the motion vector of the motion vector of the k-th frame by determining the direction of the X-segment vector of the motion vector of the feature vector set of the k-th frame. The direction of the vector changes, and by determining that the direction of the y-divided vector of the motion vector of the feature vector set of the k-th frame changes with respect to the direction of the y-divided vector of the motion vector of the feature vector set of the k1st frame, The category of the kth frame is the second type of key frame, that is, the level of the kth frame is divided into good levels.

[40] In the present embodiment, the key frame extraction module 210 determines the direction of the X-segment vector of the motion vector of the feature vector set of the k-th frame relative to the X-segment vector of the motion vector of the feature vector set of the k-1th frame. The direction has changed, or the direction of the y-divided vector of the motion vector of the feature vector set of the k-th frame is changed by the direction of the y-divided vector of the motion vector set of the k-th frame, and the judgment is made. The magnitude of the motion vector of the feature vector set of the kth frame is different from the amplitude of the motion vector of the feature vector set of the k-1th frame by no more than a predetermined threshold, then the kth

The class of the frame is the third type of key frame, that is, the level of the k-th frame is divided into a general level.

In another embodiment of the present invention, the key frame extraction module 210 determines the X-score of the motion vector of the motion vector of the k-th frame by determining the direction of the X-segment vector of the motion vector of the feature vector set of the k-th frame. The direction of the vector changes, or by determining that the direction of the y-divided vector of the motion vector of the feature vector set of the k-th frame changes with respect to the direction of the y-divided vector of the motion vector of the feature vector set of the k1st frame, The category of the kth frame is the third type of key frame, that is, the level of the kth frame is divided into general levels.

[42] In this embodiment, the video key frame extraction module 210 extracts a key frame from the video data stream of the video storage module 210, and transmits the classified key frame to the key frame of the user terminal device 30 for temporary storage. The unit 300 plays the key frame information for the user to search and play the interface 310.

[43] In this embodiment, the video key frame extraction module 210 classifies the key frame by determining whether the direction and the amplitude of the motion vector corresponding to the feature vector set of the two adjacent frames are changed. In the case of poor communication quality of the network, the non-key frames are discarded first. If the communication quality is further deteriorated, the key frames with lower levels are discarded, so that the information of interest to the user can be better protected.

FIG. 3 is a block diagram of a video service device 20 according to Embodiment 2 of the present invention. In this embodiment, the video service device 20 is a video transmission service device, and further includes a video collection module 220, a video encoding module 230, and a scalable network transmission module 240. In this embodiment, the video collection module 220 is connected to the video key frame extraction module 210 and the video encoding module 230, and the video encoding module 230 is connected to the video key frame extraction module 210, the scalable network transmission module 240, and the video encoding module 230. The scalable network transmission module 240 is coupled to the video encoding module 230, the video keyframe extraction module 210, and the video storage module 200.

In this embodiment, if the video service device 20 transmits the compressed video data stream, the video key frame extraction module 210 directly extracts the key frame from the compressed data stream transmitted by the video storage module 200, and then the key frame. The location and level information is sent to the scalable network transmission module 240 along with the compressed data stream. The scalable network transmission module 240 selects a corresponding protection policy according to the key frame information or a frame loss policy in the case of a limited rate to transmit the data stream.

[46] If the video service device 20 transmits the original video data stream, the key frame extraction module 210 extracts the key frame information from the original video data stream transmitted by the video collection module 220, and the video encoding module 230 works in the same manner. The original video data stream is encoded as a compressed video data stream and then passed to the scalable network transmission module 240 along with the key frame information.

4 is a block diagram of a video service device 20 according to Embodiment 3 of the present invention. In the present embodiment, the video service device 20 is a video encoding service device, further including a variable image (Group of

Picture, GOP) Group Video Coding Module 250. In this embodiment, the video collection module 220 is connected to the video key frame extraction module 210 and the variable GOP video encoding module 250, and the variable GOP video encoding module 250 and the video key frame extraction module 210, the video storage module 200, and the video. The collection module 220 and the scalable network transmission module 240 are connected. In this embodiment, the variable GOP video encoding module 250 encodes the key frame as an I frame, thereby implementing unequal length GOP encoding, which can improve encoding efficiency. Since the key frames are graded, when the two high-level key frames are far away, one or several low-level key frames can be inserted between them, so that the video of the random access engraving is not played. As for losing too many frames.

[48] In this embodiment, after the video key frame extraction module 210 extracts the key frame, the variable image group layer view The frequency encoding module 250 acts as an image group between every two key frames (Group of

The division of Picture, GOP) will make the code stream have robust code stream transmission characteristics, facilitate the unequal protection transmission in transmission, and convenient frame dropping strategy; and high compression efficiency and access characteristics, GOP internal The correlation is strong and the correlation between the two is easy to remove. The access point is a key frame and conforms to the characteristics of the human eye.

FIG. 5 is a flowchart of a method for extracting a key frame according to Embodiment 4 of the present invention.

[50] In step S300, a video data stream is received.

[51] In step S302, a motion vector of each frame is acquired from the video data stream. In this embodiment, the motion vectors of each frame are separately decomposed, and the decomposition can be selected by the coordinate axes, and each motion vector is decomposed into a sub-vector of the X direction and a sub-vector of the y direction, that is, each motion vector is available ( Xi, yi) to express.

[52] In step S304, a feature vector set of motion vectors for each frame is acquired. In this embodiment, motion vectors having the same motion vector value are grouped into a motion vector set, and a motion vector set having the largest number of motion vectors is used as a feature vector set.

[53] In this embodiment, the following is: first extracting an X-segment vector having the same value and the largest number of 吋, or a y-divided vector having the same extracted value and the largest number 吋; y-score corresponding to the X-segment vector value one-to-one Under the condition of the vector value, the value of the y-segment vector with the largest number of 吋 is extracted or the value of the X-segment vector with the largest number of 吋 is extracted under the condition that the y-divided vector value has a one-to-one corresponding X-segment vector value. In this embodiment, the method of establishing a one-dimensional histogram is used for explanation. First, analyze the vector of the X-direction of the motion vector, and establish a histogram of the vector of the X-direction, that is, a one-dimensional histogram. In this embodiment, the value of the X-segment vector of the same and the largest number of histogram values of the divided vector of the established X direction is represented by the expression xi_mo _S t , where i=l, ..., n, as shown in Figure 6. Then, the X-segment vector value of the motion vector is analyzed and the y-divided vector corresponding to xi_mo _S t is analyzed. In the histogram vector y-direction movement vectors of the establishment, at points X xi_mo _S t is the vector, to identify the maximum number of points y inch vector value yi_mo _S t, as shown in FIG. In this embodiment, a set having more motion vectors (xi_most, yi_most, i=l, ..., n) is called a feature vector set.

[54] In other embodiments, the value yi_mo _S t of the y-divided vector may also be extracted first, and then the xi_mo _S t of the x-divided vector may be extracted. In other embodiments, the feature vector set may also be acquired by using a combination of amplitude and angle, and the motion vector set of the background and the foreground may be extracted by a clustering method, and the motion vector set of the foreground is a feature vector set.

[55] In step S306, it is determined whether each frame is a key frame. In this embodiment, by judging the two adjacent Whether the direction and the amplitude of the motion vector corresponding to the feature vector set of the frame change determines whether each frame is a key frame. In this embodiment, it is determined whether the kth frame is the key by determining whether the direction and the amplitude of the motion vector of the feature vector set of the kth frame are different from the direction and the amplitude of the motion vector of the feature vector set of the k-1th frame. frame.

[56] If it is determined that the kth frame is not a key frame, continue to determine whether the k+1th frame is a key frame, that is, proceed to step S306; if it is determined that the kth frame is a key frame, extract the kth frame as a key frame, and Step S308 is performed.

[57] In this embodiment, specifically, the direction of the motion vector can be divided into an X-axis direction and a y-axis direction, and the magnitude of the motion vector is represented by a sum of the X-divided vector size and the y-divided vector size. For the X-segment vector, if the X value is positive, its direction is represented by ten, if it is 0, it is represented by 0, and if it is negative, it is represented by one. The same is true for the direction of the y-divided vector.

[58] In the present embodiment, if the direction of the X-segment vector of the motion vector of the feature vector set of the k-th frame is determined to be different from the direction of the X-segment vector of the motion vector of the feature vector set of the k-th frame, , that is, the direction of the X-divided vector changes from ten to one or from one to ten or from 0 to non-zero or from non-zero to 0, then the k-th frame is a key frame, and the k-th frame is extracted. As a keyframe. In other embodiments of the present invention, if the direction of the y-divided vector of the motion vector of the feature vector set of the k-th frame is changed, the direction of the y-divided vector of the motion vector of the k-th frame is changed, that is, The direction of the y-divided vector is changed from ten to one or from one to ten or from 0 to non-zero or from non-zero to 0, and the k-th frame is extracted as a key frame. In other embodiments of the present invention, if it is determined that the magnitude of the motion vector of the feature vector set of the kth frame differs from the amplitude of the motion vector of the feature vector set of the k-1 frame by more than a predetermined threshold, the kth frame is the key Frame, and extract the kth frame as a key frame.

In the present embodiment, the threshold value is 60. In other embodiments of the invention, the threshold value may also be other values.

[60] In step S308, it is determined whether the category of the key frame is the first type of key frame. In this embodiment, the categories of the key frames are divided into a first type of key frame, a second type of key frame, and a third type of key frame. Among them, the first type of key frames are excellent level key frames, the second type of key frames are good class key frames, and the third type of key frames are general class key frames. In the present embodiment, the category of the key frame is judged by judging whether the direction and the amplitude of the motion vector corresponding to the feature vector set of the two adjacent frames are changed, that is, the key frame is classified.

[61] If it is determined that the category of the key frame is the first type of key frame, step S316 is performed; otherwise, step S310 is performed. [62] In this embodiment, specifically: if it is determined that the direction of the X-segment vector of the motion vector of the feature vector set of the k-th frame is opposite to the direction of the X-segment vector of the motion vector of the feature vector set of the k1st frame The change, that is, the direction of the X-divided vector changes from ten to one or from one to ten, and the direction of the y-divided vector of the motion vector of the feature vector set of the k-th frame is relative to the feature vector of the k-1th frame The direction of the y-divided vector of the set motion vector changes, that is, the direction of the y-divided vector changes from ten to one or from one to ten, and the motion vector magnitude of the feature vector set of the k-th frame The magnitude of the motion vector magnitude of the feature vector set of the k-1th frame differs by more than a predetermined threshold, and the class of the kth frame is the first type of key frame, that is, the kth frame is divided into the superior key frame.

[63] In step S310, it is determined whether the category of the key frame is the second type of key frame. If it is determined that the category of the key frame is the second type of key frame, step S316 is performed; otherwise, step S312 is performed.

[64] In this embodiment, specifically: if it is determined that the direction of the X-segment vector of the motion vector of the feature vector set of the k-th frame is opposite to the direction of the X-segment vector of the motion vector of the feature vector set of the k1st frame The change, that is, the direction of the X-segment vector is changed from ten to one or from one to ten, and the direction of the y-divided vector of the motion vector of the feature vector set of the k-th frame is determined relative to the feature of the k-1 frame The direction of the y-divided vector of the motion vector of the vector set changes, that is, the direction of the y-divided vector changes from ten to one or from one to ten, and the motion vector of the feature vector set of the k-th frame is determined. The magnitude of the amplitude is different from the magnitude of the motion vector magnitude of the feature vector set of the k-1 frame by no more than a predetermined threshold, and the category of the kth frame is the second type of key frame, that is, the kth frame is divided into good key frames. .

[65] In other embodiments of the present invention, if the direction of the X-segment vector of the motion vector of the feature vector set of the k-th frame is changed, the direction of the X-segment vector of the motion vector of the feature vector set of the k-th frame is changed. , that is, the direction of the X-segment vector is changed from ten to one or from one to ten, and the magnitude of the motion vector magnitude of the feature vector set of the k-th frame and the motion of the feature vector set of the k-1 frame are determined. If the magnitude of the vector amplitude differs by more than a predetermined threshold, then the category of the kth frame is the second type of key frame, that is, the kth frame is divided into good key frames.

[66] In other embodiments of the present invention, if the direction of the y-divided vector of the motion vector of the feature vector set of the k-th frame is changed, the direction of the y-divided vector of the motion vector of the feature vector set of the k-th frame is changed. , that is, the direction of the y-divided vector is changed from ten to one or from one to ten, and the magnitude of the motion vector magnitude of the feature vector set of the k-th frame and the motion of the feature vector set of the k-1 frame are determined. The magnitude of the vector magnitude is different If the threshold is greater than the predetermined threshold, the category of the kth frame is the second type of key frame, that is, the kth frame is divided into good key frames.

[67] In other embodiments of the present invention, if the direction of the X-segment vector of the motion vector of the feature vector set of the k-th frame is changed, the direction of the X-segment vector of the motion vector of the feature vector set of the k-th frame is changed. , that is, the direction of the X-segment vector changes from 0 to non-zero or from non-zero to 0, and the direction of the y-divided vector of the motion vector of the feature vector set of the k-th frame is determined relative to the feature vector set of the k-1th frame. The direction of the y-divided vector of the motion vector changes, that is, the direction of the y-divided vector changes from 0 to non-zero or from non-zero to 0, then the category of the k-th frame is the second type of key frame, that is, the k-th frame is divided. A good grade keyframe.

[68] In step S312, it is determined whether the category of the key frame is a third type of key frame. If it is determined that the category of the key frame is the third type of key frame, step S316 is performed; otherwise, step S314 is performed.

[69] In this embodiment, specifically: if it is determined that the direction of the X-segment vector of the motion vector of the feature vector set of the k-th frame is opposite to the direction of the X-segment vector of the motion vector of the feature vector set of the k1st frame The change, that is, the direction of the X-segment vector is changed from ten to one or from one to ten, and the magnitude of the motion vector magnitude of the feature vector set of the k-th frame is determined with the feature vector set of the k-1 frame The magnitude of the motion vector magnitude differs by no more than a predetermined threshold, and the category of the kth frame is a third type of key frame, that is, the kth frame is divided into general level key frames.

[70] In other embodiments of the present invention, if the direction of the y-divided vector of the motion vector of the feature vector set of the k-th frame is changed, the direction of the y-divided vector of the motion vector set of the k-th frame is changed. , that is, the direction of the y-divided vector is changed from ten to one or from one to ten, and the magnitude of the motion vector magnitude of the feature vector set of the k-th frame and the motion of the feature vector set of the k-1 frame are determined. If the magnitude of the vector amplitude differs by no more than a predetermined threshold, then the category of the kth frame is a third type of key frame, that is, the kth frame is divided into general level key frames.

[71] In other embodiments of the present invention, if the direction of the X-segment vector of the motion vector of the feature vector set of the k-th frame is changed, the direction of the X-segment vector of the motion vector of the feature vector set of the k-th frame is changed. , that is, the direction of the X-segment vector changes from 0 to non-zero or from non-zero to 0, or the direction of the y-divided vector of the motion vector of the feature vector set of the k-th frame is determined relative to the feature vector set of the k-1th frame The direction of the y-divided vector of the motion vector changes, that is, the direction of the y-divided vector changes from 0 to non-zero or from non-zero to 0 吋, then the category of the k-th frame is the third type of key frame, that is, the k-th frame Divided into general level keyframes. [72] In step S314, the undivided key frame is transmitted to the user terminal device 30.

[73] In step S316, the key frame of the category is transmitted to the user terminal device 30.

The "predetermined threshold value" in the above embodiment of the present invention may be a constant value or a value that varies depending on the scene.

[75] The video service device 20, the video service system 10, and the key frame extraction method thereof are provided by using the motion vector of the frame to obtain a feature vector set of the motion vector, and determining the feature vector set of the adjacent two frames before and after. Whether the direction and amplitude of the corresponding motion vector change to extract key frames, thereby effectively extracting frames with sudden changes in speed and uniform velocity but changing direction, reducing the error rate and complexity of extracting key frames, and reducing the amount of calculation; Similarly, by classifying the key frames by using the direction and magnitude of the motion vector, those non-key frames can be discarded first in the network communication quality difference. If the communication quality is further deteriorated, the key frames with lower levels are discarded. Better protect user interest information.

[76] Through the description of the above embodiments, those skilled in the art can clearly understand that the present invention can be implemented by hardware, or can be realized by means of software plus necessary general hardware platform, the present invention. The technical solution can be embodied in the form of a software product, which can be stored in a computer readable storage medium (which can be a CD-ROM, a USB flash drive, a mobile hard disk, etc.), and includes a plurality of instructions for making a computer device (may be a personal computer, server, or network device, etc.) Perform the methods described in various embodiments of the present invention.

The above is only a preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily think of changes or within the technical scope disclosed by the present invention. Alternatives are intended to be covered by the scope of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.

Claims

Claim

A method for extracting a key frame, comprising:

Obtained according to the motion vector of each frame in the acquired video data stream; the characteristic vector of the motion vector of each frame is Zhu Zhu A mouth;

Determining whether the direction and amplitude of the motion vector corresponding to the set of feature vectors of two adjacent frames before and after are changed;

The key frame is extracted from the video data stream using the result of the determination as to whether the change has occurred.

The key frame extraction method according to claim 1, wherein the obtaining a feature vector set of a motion vector of each frame according to a motion vector of each frame in the acquired video data stream comprises: respectively acquiring the video data stream The motion vector having the same motion vector value in the motion vector of each frame constitutes at least one motion vector set of each frame, and the motion vector set containing the largest number of motion vectors in each frame is used as the feature vector set of the motion vector of each frame. .

The key frame extraction method according to claim 2, wherein the obtaining the feature vector set of the motion vector of each frame according to the motion vector of each frame in the acquired video data stream further comprises: acquiring the acquired video data stream The motion vector of each frame is decomposed into an X-divided vector in the X-axis direction and a y-divided vector in the y-axis direction;

Extracting the X-segment vector with the same value and the largest number of 每 in each frame, and extracting the value of the y-divided vector with the largest number of 吋, under the condition of the y-divided vector value of the X-segment vector value one by one, The X-segment vector of 吋 and the y-divided vector of the largest number of y are composed of the feature vector of the motion vector of each frame, or the y-divided vector with the same value and the largest number of 每 in each frame is extracted, and the y-divided vector value is one by one Under the condition of the corresponding X-segment vector value, the value of the X-segment vector with the largest number of 吋 is extracted, and the eigenvector vector of the motion vector of each frame is composed of the X-segment vector with the largest number of 吋 and the y-divided vector with the largest number of 吋. The key frame extraction method according to any one of claims 1 to 3, wherein the direction and the amplitude of the motion vector corresponding to the feature vector set of the two adjacent frames before and after the determination are changed, Determining whether the change has occurred, extracting a key from the video data stream The X-segment vector direction of the corresponding motion vector of the feature vector set is changed or if the y-divided vector direction of the motion vector corresponding to the feature vector set of the k-th frame is determined corresponding to the feature vector set of the k-1 frame The y-divided vector direction of the motion vector changes, and the k-th frame is extracted as a key frame; or

If it is determined that the magnitude of the motion vector corresponding to the feature vector set of the kth frame differs from the amplitude of the motion vector corresponding to the feature vector set of the k-1th frame by more than a predetermined threshold value, extracting the kth frame is Key frame, the magnitude of the motion vector is represented by the sum of the X-segment vector size and the y-segment vector size.

[5] The method for extracting a key frame according to any one of claims 1 to 3, wherein the method further comprises:

Determining the category of the extracted key frame by determining whether the direction and the amplitude of the motion vector corresponding to the feature vector set of the two adjacent frames before and after the change, wherein the key frame class is classified into the first type key frame, the second Class keyframes and third class level keyframes.

[6] The method for extracting a key frame according to claim 5, wherein the determining whether the direction and the amplitude of the motion vector corresponding to the feature vector set of the two adjacent frames before and after the change are changed The key frame categories include:

If it is determined that the X-segment vector direction of the motion vector corresponding to the feature vector set of the k-th frame changes with respect to the X-segment vector direction of the motion vector corresponding to the feature vector set of the k-1th frame, and the k-th frame The y-divided vector direction of the motion vector corresponding to the feature vector set is changed with respect to the y-divided vector direction of the motion vector corresponding to the feature vector set of the k-1th frame, and the feature vector set of the k-th frame is And determining, by the amplitude of the corresponding motion vector, that the amplitude of the motion vector corresponding to the feature vector set of the k-1th frame is greater than a predetermined threshold, determining that the kth frame is the first type of key frame;

Or

If it is determined that the direction of the X-segment vector of the motion vector corresponding to the feature vector set of the k-th frame is changed with respect to the direction of the X-segment vector of the motion vector corresponding to the feature vector set of the k-1th frame, and the kth The direction of the y-divided vector of the motion vector corresponding to the feature vector set of the frame changes with respect to the direction of the y-divided vector of the motion vector corresponding to the feature vector set of the k-1th frame, and Determining the kth when the magnitude of the motion vector corresponding to the feature vector set of the kth frame is different from the amplitude of the motion vector corresponding to the feature vector set of the k-1th frame by a predetermined threshold The frame is the second type of key frame;

Or

If it is determined that the direction of the X-segment vector of the motion vector corresponding to the feature vector set of the k-th frame is changed with respect to the direction of the X-segment vector of the motion vector corresponding to the feature vector set of the k-1th frame, or the k-th frame The direction of the y-divided vector of the motion vector corresponding to the feature vector set is changed with respect to the direction of the y-divided vector of the motion vector corresponding to the feature vector set of the k1st frame, and the feature vector set of the k-th frame corresponds to The amplitude of the motion vector is different from the amplitude of the motion vector corresponding to the feature vector set of the k-1th frame by more than a predetermined threshold, and the kth frame is determined to be the second type of key frame;

Or

If it is determined that the direction of the X-segment vector of the motion vector corresponding to the feature vector set of the k-th frame is changed with respect to the direction of the X-segment vector of the motion vector corresponding to the feature vector set of the k-1th frame, and the kth The direction of the y-divided vector of the motion vector corresponding to the feature vector set of the frame changes with respect to the direction of the y-divided vector of the motion vector corresponding to the feature vector set of the k-1th frame, and then the kth is determined The frame is the second type of key frame.

Or

If it is determined that the direction of the X-segment vector of the motion vector of the feature vector set of the k-th frame is changed with respect to the direction of the X-segment vector of the motion vector of the feature vector set of the k-th frame, or the k-th frame The direction of the y-divided vector of the motion vector of the feature vector set changes with respect to the direction of the y-divided vector of the motion vector of the feature vector set of the k-th frame, and the motion vector of the feature vector set of the k-th frame The amplitude of the motion vector of the feature vector set of the k-1th frame is not more than a predetermined threshold, and the kth frame is determined to be a third type of key frame; or

If it is determined that the direction of the X-segment vector of the motion vector corresponding to the feature vector set of the k-th frame changes with respect to the direction of the X-segment vector of the motion vector corresponding to the feature vector set of the k-1th frame, or k The direction of the y-divided vector of the corresponding motion vector of the frame's feature vector set is relative to the k-1th If the direction of the y-divided vector of the motion vector corresponding to the feature vector set of the frame changes, the k-th frame is determined to be the third-type key frame.

[7] A video service device, comprising:

a video key frame extraction module, configured to obtain, according to a motion vector of each frame in the acquired video data stream; a feature vector set of motion vectors of each frame; and determine a direction of a motion vector corresponding to the feature vector set of two adjacent frames before and after Whether the amplitude changes; extracting a key frame from the video data stream using the judgment result as to whether the change has occurred.

[8] The video service device according to claim 7, wherein:

The video key frame extraction module is configured to respectively compose at least one motion vector set of each frame by a motion vector with the same motion vector value in a motion vector of each frame in the acquired video data stream, and include motion vectors in each frame. a set of motion vectors having the largest number of motion vectors as a feature vector set of motion vectors of each frame, wherein the direction of the motion vector is divided into an X-axis direction and a y-axis direction, and the magnitude of the motion vector is determined by the X-segment vector size and y-minute The sum of vector sizes is used to represent.

[9] The video service device according to claim 8, wherein:

The video key frame extraction module is configured to determine that an X-segment vector direction of a motion vector corresponding to a feature vector set of the k-th frame is generated in a direction of an X-segment vector of a motion vector corresponding to a feature vector set of the k-th frame The direction of the y-divided vector of the motion vector corresponding to the feature vector set corresponding to the k-th frame is changed or determined, and the y-divided vector direction corresponding to the feature vector set of the k-1 frame is changed. The k frame is a key frame;

Or,

Extracting the kth frame by determining that the magnitude of the motion vector corresponding to the feature vector set of the kth frame differs from the amplitude of the motion vector corresponding to the feature vector set of the k-1 frame by more than a predetermined threshold value Keyframe.

[10] The video service device according to claim 7 or 8 or 9, wherein the video key frame extraction module is further configured to: after extracting the key frame, determine the feature vector set of the adjacent two frames before and after Determining whether the direction and magnitude of the corresponding motion vector change to determine the class of the extracted key frame, wherein the key frame class is classified into a first type key frame, a second type key frame, and a third type key frame. [11] The video service device of claim 10, wherein:

The video key frame extraction module is further configured to determine an X-segment vector direction of a motion vector corresponding to a feature vector set of the k-1 frame by determining a X-vector direction of a motion vector corresponding to the feature vector set of the k-th frame. a change has occurred, and the y-divided vector direction of the motion vector corresponding to the feature vector set of the k-th frame changes with respect to the y-divided vector direction of the motion vector corresponding to the feature vector set of the k-1th frame, And determining that the k is determined by a difference between a magnitude of a motion vector corresponding to the feature vector set of the _kth frame and a magnitude of a motion vector corresponding to the feature vector set of the _k -1th frame being greater than a predetermined threshold threshold The frame is the first type of key frame;

Or

The video key frame extraction module is further configured to determine an X-division vector of a motion vector corresponding to a feature vector set of the k-1 frame by determining a direction of an X-segment vector of a motion vector corresponding to the feature vector set of the k-th frame The direction of the change occurs, and the direction of the y-divided vector of the motion vector corresponding to the feature vector set of the k-th frame is opposite to the direction of the y-divided vector of the motion vector corresponding to the feature vector set of the k-1th frame. a change occurs, and a magnitude of a motion vector corresponding to the feature vector set of the kth frame is different from a magnitude of a motion vector corresponding to the feature vector set of the k-1 frame by a predetermined threshold 吋, Determining that the kth frame is a second type of key frame; or

The video key frame extraction module is further configured to determine an X-division vector of a motion vector corresponding to a feature vector set of the k-1 frame by determining a direction of an X-segment vector of a motion vector corresponding to the feature vector set of the k-th frame The direction of the y-divided vector of the motion vector corresponding to the feature vector set of the k-th frame changes with respect to the direction of the y-divided vector of the motion vector corresponding to the feature vector set of the k-1th frame, And determining that the k is determined by a difference between a magnitude of a motion vector corresponding to the feature vector set of the _kth frame and a magnitude of a motion vector corresponding to the feature vector set of the _k -1th frame being greater than a predetermined threshold threshold The frame is the second type of key frame;

Or

The video key frame extraction module is further configured to determine an X-division vector of a motion vector corresponding to a feature vector set of the k-1 frame by determining a direction of an X-segment vector of a motion vector corresponding to the feature vector set of the k-th frame The direction of the change has occurred, and the feature vector set of the kth frame corresponds to the transport The direction of the y-divided vector of the motion vector changes with respect to the direction of the y-divided vector of the motion vector corresponding to the feature vector set of the k-1th frame, and then the kth frame is determined to be the second type of key frame. or;

The video key frame extraction module is further configured to determine a direction of an X-segment vector of a motion vector of a motion vector set of the k-th frame by determining a direction of an X-segment vector of a motion vector of the feature vector set of the k-th frame A change has occurred, or the direction of the y-divided vector of the motion vector of the feature vector set of the k-th frame is changed with respect to the direction of the y-divided vector of the motion vector of the feature vector set of the k-1th frame, and Determining that the kth frame is the third type of key, the amplitude of the motion vector of the feature vector set of the kth frame is different from the amplitude of the motion vector of the feature vector set of the k-1th frame by a predetermined threshold. frame;

Or

The video key frame extraction module is further configured to: determine, by determining a direction of an X-segment vector of a motion vector corresponding to the feature vector set of the k-th frame, a motion vector X corresponding to a feature vector set of the k-1th frame The direction of the sub-vector changes, or the direction of the y-divided vector of the motion vector corresponding to the feature vector set of the k-th frame occurs in the direction of the y-divided vector of the motion vector corresponding to the feature vector set of the k-1th frame. If the change is made, the kth frame is judged to be a third type of key frame.

A video service system comprising the video service device and the user terminal device according to any one of claims 7 to 11, wherein the video service device obtains according to a motion vector of each frame in the acquired video data stream; The feature vector set of the vector; determining whether the direction and the amplitude of the motion vector corresponding to the feature vector set of the adjacent two frames before and after the change; and extracting the key frame from the video data stream by using the determination result of whether the change occurs, The key frame is then provided for the user terminal device.