WO2015078134A1 - Method and apparatus for video classification - Google Patents

Method and apparatus for video classification

Info

Publication number
WO2015078134A1
WO2015078134A1 (application PCT/CN2014/075510)
Authority
WO
WIPO (PCT)
Prior art keywords
motion
video
phrase
library
sample
Prior art date
Application number
PCT/CN2014/075510
Other languages
English (en)
French (fr)
Inventor
王利民 (Limin Wang)
乔宇 (Yu Qiao)
黎伟 (Wei Li)
许春景 (Chunjing Xu)
汤晓鸥 (Xiaoou Tang)
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority to EP14866346.1A priority Critical patent/EP3067831A4/en
Publication of WO2015078134A1 publication Critical patent/WO2015078134A1/zh
Priority to US15/167,388 priority patent/US10002296B2/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/40 - Scenes; Scene-specific elements in video content
    • G06V 20/46 - Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70 - Information retrieval; Database structures therefor; File system structures therefor of video data
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70 - Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F 16/78 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/783 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70 - Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F 16/78 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/783 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F 16/7847 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content
    • G06F 16/786 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content using motion, e.g. object motion or camera motion
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/23 - Clustering techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/762 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/40 - Scenes; Scene-specific elements in video content
    • G06V 20/41 - Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/40 - Scenes; Scene-specific elements in video content
    • G06V 20/41 - Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V 20/42 - Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/40 - Scenes; Scene-specific elements in video content
    • G06V 20/48 - Matching video sequences
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/40 - Scenes; Scene-specific elements in video content
    • G06V 20/49 - Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes

Definitions

  • The present invention relates to the field of electronic information technology, and in particular to a method and an apparatus for video classification.
  • In the prior art, a motion atom is a simple motion pattern with certain commonalities; the responses of the video to be detected to these motion atoms are calculated, the responses are composed into a vector, and the video to be detected is classified according to the obtained vector.
  • Embodiments of the present invention provide a method and apparatus for video classification that can improve the accuracy of video classification.
  • An embodiment of the present invention provides a video classification method, including: segmenting the videos in a sample video library to obtain a segmentation result and forming a motion atom set, where the sample video library includes at least one video; and generating, by using the motion atom set and the segmentation result, a description vector corresponding to each video in the sample video library;
  • and determining, by using the description vector, a video to be detected of the same type as the videos in the sample video library.
  • The generating, by using the motion atom set and the segmentation result, a description vector corresponding to a video in the sample video library includes:
  • the sample video library includes at least two videos, and the types of videos in the sample video library are the same.
  • the method further includes:
  • r(V, P1) is the response of the one motion phrase P1 to a video V in the sample video library:
  • r(V, P1) = min_{OR_i ∈ P1} max_{π ∈ OR_i} v(V, π)
  • v(V, π) = max_{t' ∈ Ω(t)} Score(Φ(V, t'), A) · N(t' | t, σ)
  • where the OR operation refers to calculating the responses of the video in the sample video library to the motion atom units whose times lie in one adjacent region;
  • S(P1, c) represents the set of videos in the sample video library that respond most strongly to the one motion phrase;
  • c is the identifier of the type of the videos in the sample video library;
  • Φ(V, t') is the video feature of the segmentation result starting at time t' in a video of the sample video library;
  • Score(Φ(V, t'), A) is the score obtained by inputting Φ(V, t') into the support vector machine (SVM) classifier;
  • N(t' | t, σ) is the Gaussian distribution with t as the mean and σ as the standard deviation;
  • Ω(t) refers to a neighborhood centered at t.
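Read together, these definitions say that a motion atom unit contributes a Gaussian-weighted SVM score, and that a phrase response is the minimum over OR-groups of the maximum unit response. A minimal Python sketch under stated assumptions: the discrete time grid, the precomputed per-segment scores, and the neighborhood half-width `omega` are illustrative, not from the patent.

```python
import numpy as np

def unit_response(scores, t, sigma, omega=2):
    """v(V, pi) = max over t' in Omega(t) of Score(Phi(V, t'), A) * N(t' | t, sigma).

    `scores[t']` is assumed to hold the SVM score of the segment starting at
    time t' against atom A; Omega(t) is taken as a +/- `omega` neighborhood.
    """
    ts = np.arange(max(0, t - omega), min(len(scores), t + omega + 1))
    gauss = np.exp(-0.5 * ((ts - t) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return float(np.max(scores[ts] * gauss))

def phrase_response(or_groups, scores, sigma=1.0, omega=2):
    """r(V, P1) = min over OR-groups of the max unit response in each group."""
    return min(max(unit_response(scores, t, sigma, omega) for t in group)
               for group in or_groups)
```

With a single strong score at t = 5, a phrase whose two OR-groups sit at t = 5 and t = 10 gets response 0 (the weaker group dominates via the min), while a single group containing both times keeps the strong response (via the max).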
  • The selecting the motion phrases to obtain a screening result includes:
  • sorting the motion phrases in the motion phrase set in descending order according to the value of Rep(P1, c) + ΔRepSet(P1, c), and using the first m1 motion phrases as the first screening result, where m1 is a positive integer greater than or equal to 1;
  • obtaining, when the motion phrases in the (n-1)th screening result have n-1 motion atoms, an nth screening result according to the motion phrases in the (n-1)th screening result, where the nth screening result is the first mn motion phrases sorted in descending order according to the value of Rep(Pn, c) + ΔRepSet(Pn, c), mn is a positive integer greater than or equal to 1, the motion phrases in the nth screening result have n motion atoms, and n is a positive integer greater than or equal to 1; and
  • generating the description vector according to the first to nth screening results.
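The ranking step described above can be sketched in a few lines; this is only an illustration in which the per-phrase values of Rep(P, c) and ΔRepSet(P, c) are assumed to be precomputed and passed in as dictionaries.

```python
def screen(phrases, rep, delta_repset, m):
    """Sort motion phrases in descending order of Rep(P, c) + dRepSet(P, c)
    and keep the first m as the screening result."""
    ranked = sorted(phrases, key=lambda P: rep[P] + delta_repset[P], reverse=True)
    return ranked[:m]
```

For example, with Rep = {P1: 0.9, P2: 0.5, P3: 0.7} and ΔRepSet = {P1: 0.0, P2: 0.6, P3: 0.1}, the combined scores are 0.9, 1.1, and 0.8, so screening with m = 2 keeps P2 and P1.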
  • The sample video library includes at least two videos, and the sample video library includes at least two types of videos; and the generating, according to the screening result, a description vector corresponding to the videos in the sample video library includes:
  • The determining, by using the description vector, a video to be detected of the same type as the videos in the sample video library includes:
  • Alternatively, the determining, by using the description vector, a video to be detected of the same type as the videos in the sample video library includes:
  • the second classification rule is configured to detect whether the to-be-detected video is the same as the type of the video in the sample video library;
  • the method further includes:
  • An embodiment of the present invention provides an apparatus for video classification, including: a first generating module, configured to segment the videos in the sample video library to obtain a segmentation result and generate a motion atom set, where the sample video library includes at least one video; a second generating module, configured to generate, by using the motion atom set and the segmentation result, a description vector corresponding to a video in the sample video library;
  • a classification module configured to determine, by using the description vector, a to-be-detected video of the same type as the video in the sample video library.
  • the second generating module includes:
  • a first generating unit configured to generate, according to the motion atom set and the segmentation result, a motion phrase set corresponding to the videos in the sample video library, where the motion phrase set includes at least two motion phrases, and one motion phrase includes motion atoms occurring in a certain order near a time point;
  • a screening unit configured to screen the motion phrases and obtain a screening result; and
  • a second generating unit configured to generate, according to the screening result, a description vector corresponding to the videos in the sample video library.
  • The sample video library includes at least two videos, and the types of the videos in the sample video library are the same.
  • the motion phrase in the motion phrase set includes a motion atom in the motion atom set;
  • the second generation module also includes:
  • a first acquiring unit configured to acquire a motion atom unit π(A, t, σ) and obtain, according to the motion atom unit, a representative parameter Rep(P1, c) of one motion phrase P1, where:
  • A is a motion atom;
  • t is a time point in a video in the sample video library;
  • σ is the standard deviation of the Gaussian distribution;
  • V is a video in the sample video library;
  • P1 is the one motion phrase;
  • r(V, P1) is the response of the one motion phrase P1 to the video in the sample video library:
  • r(V, P1) = min_{OR_i ∈ P1} max_{π ∈ OR_i} v(V, π)
  • c is an identifier of a type of video in the sample video library, and Φ(V, t') is the video feature of the segmentation result starting at t' in a video of the sample video library;
  • Score(Φ(V, t'), A) is the score obtained by inputting Φ(V, t') into the support vector machine (SVM) classifier, N(t' | t, σ) is the Gaussian distribution with t as the mean and σ as the standard deviation, and Ω(t) refers to a neighborhood centered on t;
  • a second acquiring unit configured to acquire a coverage parameter RepSet(Γ, c), and obtain, according to the coverage parameter, the contribution value ΔRepSet(P1, c) of the one motion phrase to the coverage parameter;
  • the selection unit includes:
  • a screening subunit configured to sort the motion phrases in the motion phrase set in descending order according to the value of Rep(P1, c) + ΔRepSet(P1, c) obtained from the representative parameter and contribution value of each motion phrase in the motion phrase set, and to use the first m1 motion phrases as the first screening result, m1 being a positive integer greater than or equal to 1;
  • a first generating subunit configured to generate the description vector according to the first to nth screening results.
  • the sample video library includes at least two videos, and the sample video library includes at least two types of video;
  • the second generating unit includes:
  • a collection subunit configured to obtain a screening result set according to the screening results of the motion phrases corresponding to the different types of videos in the sample video library; and
  • a second generation subunit configured to generate, according to the screening result set, a description vector corresponding to the videos in the sample video library.
  • the classification module includes:
  • a third generating unit configured to generate a response vector corresponding to the to-be-detected video
  • a third acquiring unit configured to acquire the description vector corresponding to each different type of video in the sample video library, and to obtain, according to the description vectors, a first classification rule, where the first classification rule is used to determine the type of the video to be detected; and
  • a first classification unit configured to determine, according to the first classification rule and the response vector, that the type of the video to be detected is the same as one of the types of videos included in the sample video library, and to classify the video to be detected.
  • the classification module includes:
  • a fourth generating unit configured to generate a response vector corresponding to the video to be detected, and to obtain a second classification rule, where the second classification rule is configured to detect whether the video to be detected is of the same type as the videos in the sample video library;
  • a detecting unit configured to detect whether a response vector of the to-be-detected video meets the second classification rule
  • a second classification unit configured to determine, when the second classification rule is met, that the video to be detected is of the same type as the videos in the sample video library.
  • the method further includes:
  • An acquiring module configured to acquire at least one component of a response vector of the to-be-detected video, and obtain a main motion phrase according to the at least one component, where the main motion phrase is a motion phrase corresponding to the at least one component;
  • a display module configured to acquire and display key frames of the video to be detected, where each key frame has the largest response to one of the main motion phrases.
  • A method and an apparatus for video classification provided by the embodiments of the present invention can segment the videos in a sample video library, generate motion atoms, generate the description vectors of the videos in the sample video library by using the segmentation result and the motion atoms, and determine, by using the description vectors, the video to be detected that is of the same type as the videos in the sample video library, thereby achieving the purpose of video classification.
  • In the prior art, the scheme for obtaining the vector corresponding to the video to be detected according to motion atoms is as shown in FIG. 1a. Since a motion atom does not contain a time factor, the timing relationship between the motion atoms of a continuous complex motion cannot be represented.
  • The present invention generates motion phrases according to the motion atoms, and generates the description vector according to the motion phrases.
  • A motion phrase includes motion atoms occurring in a certain order near given time points, and is used to describe the timing relationship between the motion atoms of a continuous complex motion.
  • For example: in the prior art, the video is decomposed into simple segments by time and an SVM classifier is used to classify the video to be detected; since different time division points of the decomposed segments give different video classification results, it is difficult to properly decompose a continuous complex motion into segments composed of simple motions, resulting in inaccurate classification results. The scheme of the present invention is as shown in FIG. 1b.
  • The present invention derives the description vector from motion phrases that describe the time-series relationship between the motion atoms of continuous complex motions, so that the description vector reflects, in the form of quantized data, the motion atoms arranged in time order near each time point of a continuous complex motion, and the degree of matching between a motion phrase and a video in the sample video library can thereby be detected. Therefore, classifying by using the description vector includes in the classification process both the time factor of the video and the motion atoms representing the specific actions and content in the video; combining the two generates motion phrases describing the time-series relationships of continuous complex motions, enabling accurate classification of videos that include long-term continuous complex motion.
  • FIG. 1a is an exemplary flowchart of a method for video classification in the prior art;
  • FIG. 1b is an exemplary flowchart of the method for video classification provided by the present invention;
  • FIG. 1c is a flowchart of a video classification method according to an embodiment of the present invention;
  • FIG. 2 is a flowchart of a specific implementation manner of a video classification method according to an embodiment of the present invention;
  • FIG. 3a is a flowchart of another specific implementation manner of a video classification method according to an embodiment of the present invention;
  • FIG. 3b is a flowchart of still another specific implementation manner of a video classification method according to an embodiment of the present invention;
  • FIG. 3c is a schematic diagram showing an example of a method for performing video classification according to an embodiment of the present invention;
  • FIG. 4a is a flowchart of still another specific implementation manner of a method for video classification according to an embodiment of the present invention;
  • FIG. 4b is a schematic diagram showing an example of displaying main information in a video according to an embodiment of the present invention;
  • FIG. 5 is a schematic structural diagram of an apparatus for video classification according to an embodiment of the present invention;
  • FIG. 6 is a schematic structural diagram of a specific implementation manner of an apparatus for video classification according to an embodiment of the present invention;
  • FIG. 7 is a schematic structural diagram of another apparatus for video classification according to an embodiment of the present invention;
  • FIG. 8 is a schematic structural diagram of another apparatus for performing video classification according to an embodiment of the present invention;
  • FIG. 9 is a schematic structural diagram of another specific implementation manner of another apparatus for video classification according to an embodiment of the present invention;
  • FIG. 10 is a schematic structural diagram of still another specific implementation manner of another apparatus for video classification according to an embodiment of the present invention;
  • FIG. 11 is a schematic structural diagram of yet another apparatus for video classification according to an embodiment of the present invention;
  • FIG. 12 is a schematic structural diagram of a video classification system according to an embodiment of the present invention.
  • The technical solution provided by the embodiments of the present invention can generate a motion atom set according to the motion information in the videos, finally obtain the description vectors of the videos in the sample video library, and classify the video to be detected by using the description vectors.
  • the scheme can roughly divide the videos to be detected into large categories, such as music videos, sports videos, or dance videos.
  • The videos can also be divided into small categories, such as sprint video, high jump video, or long jump video, etc.
  • An embodiment of the present invention provides a method for video classification, as shown in FIG. 1c, including: segmenting the videos in the sample video library to obtain a segmentation result, and generating a motion atom set.
  • The videos in the sample video library can be selected according to the user's classification needs. For example: if the user wants to divide the videos to be detected into three types, namely dance video, drama video, and sports video, then these three types of video can be placed into the sample video library as the videos in the sample video library.
  • the sample video library includes at least one video, and the motion atoms in the motion atom set are generated according to the video in the sample video library.
  • The system divides each video in the sample video library into equal-length video clips, and there is a certain time overlap between adjacent video clips. For example: video clip 1 is the clip of 00:00:00-00:01:00 in the video, and video clip 2, adjacent to video clip 1, is the clip of 00:00:30-00:01:30 in the video.
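The overlapping-clip scheme above amounts to a sliding window; a small sketch, where the 60-second clip length and 30-second stride are taken from the example rather than fixed by the patent:

```python
def segment_video(duration_s, clip_len_s=60, stride_s=30):
    """Divide a video into equal-length clips with temporal overlap,
    e.g. clip 1 = 00:00:00-00:01:00, clip 2 = 00:00:30-00:01:30.

    All values are in seconds; returns (start, end) pairs.
    """
    starts = range(0, duration_s - clip_len_s + 1, stride_s)
    return [(s, s + clip_len_s) for s in starts]
```

A two-minute video yields clips (0, 60), (30, 90), and (60, 120), each overlapping its neighbor by 30 seconds.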
  • the system extracts low-level video features for each video segment.
  • A similarity parameter between the low-level video features is then calculated, where the scale parameter is the mean value of the Euclidean distances between all the K-dimensional feature vectors.
  • the system uses the clustering algorithm to form motion atoms according to the similarity parameters of the low-level video features.
  • The clustering algorithm can be a neighbor propagation (affinity propagation) algorithm.
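To show the shape of this step, here is a deliberately simple greedy distance-threshold clustering used as a stand-in for the neighbor-propagation algorithm the patent names; the feature vectors and the threshold value are invented for illustration.

```python
import numpy as np

def form_motion_atoms(features, thresh=1.0):
    """Cluster low-level clip features into motion atoms.

    Greedy stand-in for affinity propagation: each feature joins the
    nearest existing cluster center if within `thresh`, otherwise it
    starts a new cluster. Each center stands for one motion atom.
    """
    centers, labels = [], []
    for x in features:
        dists = [np.linalg.norm(x - c) for c in centers]
        if dists and min(dists) < thresh:
            labels.append(int(np.argmin(dists)))
        else:
            centers.append(np.asarray(x, dtype=float))
            labels.append(len(centers) - 1)
    return centers, labels
```

Two tight groups of clip features far apart produce two atoms, mirroring how similar simple motion patterns collapse into one atom.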
  • The motion atom set is obtained from the motion atoms.
  • The motion atoms in the motion atom set occur in a certain time sequence and can form motion phrases; the responses of the motion phrases to the videos in the sample video library are calculated, and the obtained response values are composed into the description vector of each video in the sample video library, thereby achieving the purpose of quantifying the content of the video.
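Composing the responses into a description vector is a direct mapping; a sketch in which the `response` callable is assumed to implement the phrase response r(V, P):

```python
def description_vector(video, phrases, response):
    """Component i of the description vector is the response of motion
    phrase i to the given video."""
    return [response(video, P) for P in phrases]
```

For instance, with two phrases whose responses to a video are 0.2 and 0.7, the video's description vector is [0.2, 0.7].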
  • The video classification rule may be formed by using the description vectors of the videos in the sample video library, and the video to be detected is classified by determining which type of video in the sample video library the video to be detected is the same as.
  • A video classification method provided by an embodiment of the present invention can segment the videos in a sample video library, generate motion atoms, generate the description vectors of the videos in the sample video library by using the segmentation result and the motion atoms, and use the description vectors to determine the video to be detected that is of the same type as the videos in the sample video library, thereby achieving the purpose of video classification.
  • The present invention derives the description vector from motion phrases that describe the time-series relationship between the motion atoms of continuous complex motions, so that the description vector reflects, in the form of quantized data, the motion atoms arranged in time order near each time point of a continuous complex motion, and the degree of matching between a motion phrase and a video in the sample video library can thereby be detected. Therefore, classifying by using the description vector includes in the classification process both the time factor of the video and the motion atoms representing the specific actions and content in the video; combining the two generates motion phrases describing the time-series relationships of continuous complex motions, and the description vector generated from the motion phrases thereby enables accurate classification of videos including long-term continuous complex motion.
  • The embodiment of the present invention further provides a specific solution of the video classification method, which further refines the execution process of 102 in FIG. 1c, where 102 may be specifically implemented as 1021-1023, as shown in FIG. 2, including:
  • the motion phrase set includes at least two motion phrases, and one motion phrase includes motion atoms occurring in a certain order near a time point, and the motion phrase can represent a temporal relationship between motion atoms.
  • The responses of the motion phrases in the screening result to the videos in the sample video library are calculated, and the obtained response values are composed into the description vector of each video in the sample video library, thereby achieving the purpose of quantizing the content of the video.
  • A video classification method provided by an embodiment of the present invention can segment the videos in a sample video library, generate motion atoms, generate the motion phrases of the videos in the sample video library by using the segmentation result and the motion atoms, screen the motion phrases, generate a description vector according to the screening result, and use the description vector to determine the video to be detected that is of the same type as the videos in the sample video library, thereby achieving the purpose of video classification.
  • The present invention derives the description vector from motion phrases that describe the time-series relationship between the motion atoms of continuous complex motions, so that the description vector reflects, in the form of quantized data, the motion atoms arranged in time order in a continuous complex motion, and the degree of matching between a motion phrase and a video in the sample video library can thereby be detected. Classifying by using the description vector thus includes in the classification process both the time factor of the video and the motion atoms representing the specific actions and content in the video; combining the two yields motion phrases describing the time-series relationships of continuous complex motions. Screening the motion phrases gives a screening result with good representativeness, coverage, and discriminability, reducing the number of motion phrases required to generate the description vector, so that the resulting description vector is more streamlined, the time to generate the description vector is reduced, and videos that include long-term continuous complex motion can be accurately classified.
  • The embodiment of the present invention further provides a specific solution of the video classification method, in which 1024-1025 are added to the refined execution process of 1022 in FIG. 2, and the execution processes of 1022 and 103 in FIG. 2 are further refined: 1022 may be specifically implemented as 10221-10224, and 103 may be specifically implemented as 1031a-1034a. The method includes:
  • σ is the standard deviation of the Gaussian distribution;
  • V is a video in the sample video library;
  • P1 is a motion phrase, and this one motion phrase P1 includes one motion atom in the motion atom set;
  • r(V, P1) is the response of the one motion phrase to the video in the sample video library:
  • max v(V, π) represents the OR operation in the motion phrase: the OR operation refers to calculating the responses of the same type of videos in the sample video library to the motion atom units whose times lie in one adjacent region, and selecting the response value of the most responsive motion atom unit in that adjacent region; min max v(V, π) represents taking the minimum, within the motion phrase, of the responses of the motion atom units selected by the OR operations.
  • If the minimum value is greater than a preset threshold, it indicates that the motion phrase matches the video in the sample video library.
  • OR is an OR operation, and AND is an AND operation.
  • For example: the times of motion atom unit 1 and motion atom unit 2 lie in one adjacent region, and the times of motion atom unit 3 and motion atom unit 4 lie in another adjacent region. An OR operation is performed on motion atom unit 1 and motion atom unit 2; since the response of motion atom unit 1 is greater than the response of motion atom unit 2, the response value of motion atom unit 1 is selected. Meanwhile, an OR operation is performed on motion atom unit 3 and motion atom unit 4; since the response of motion atom unit 4 is greater than the response of motion atom unit 3, the response value of motion atom unit 4 is selected. The response of motion atom unit 1 is then compared with the response of motion atom unit 4, and the minimum of the two is selected as the response value.
  • S(P1, c) represents the set of videos in the sample video library with the largest response to the one motion phrase; c is the identifier of the type of the videos in the sample video library; Φ(V, t') is the video feature of the segmentation result starting at t' in a video of the sample video library; Score(Φ(V, t'), A) is the score obtained by inputting Φ(V, t') into the support vector machine (SVM) classifier; N(t' | t, σ) is the Gaussian distribution with t as the mean and σ as the standard deviation; and Ω(t) refers to a neighborhood centered on t.
  • The representative parameter requires the motion phrase to have as strong a response as possible to a certain type of video, indicating that the motion phrase is representative of this type of video.
  • The discriminative parameter Dis(P1, c) of the motion phrase for a certain type of video indicates the difference between the responses of the motion phrase to this type of video and to other types of video; the larger the discriminative parameter, the better the discriminative performance of the motion phrase.
  • ΔRepSet(P1, c) = RepSet(Γ ∪ {P1}, c) − RepSet(Γ − {P1}, c), where the number of segments is that obtained by segmenting the videos identified as type c in the sample video library, Γ is a set of motion phrases, and the identifier of the video type to which the motion atoms contained in the one motion phrase belong is c.
  • Coverage requires that the motion phrase set generated from the screened motion phrases cover each type of video as much as possible.
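The representativeness and coverage-contribution quantities above can be sketched as follows; the `response` and `repset` callables and the top-k size are illustrative assumptions, not definitions from the patent.

```python
def representativeness(phrase, class_videos, response, k=3):
    """Rep(P, c): the average response of phrase P over S(P, c), taken
    here as the k videos of type c that respond most strongly to P."""
    top = sorted((response(v, phrase) for v in class_videos), reverse=True)[:k]
    return sum(top) / len(top)

def coverage_contribution(phrase, phrase_set, repset):
    """dRepSet(P, c) = RepSet(set union {P}, c) - RepSet(set minus {P}, c);
    `repset` is assumed to compute the coverage of a phrase set."""
    return repset(phrase_set | {phrase}) - repset(phrase_set - {phrase})
```

With per-video responses 0.1, 0.9, 0.5, 0.8 and k = 2, representativeness averages the two strongest responses (0.85); the contribution value is simply the coverage gained by adding the phrase to the set.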
  • 104-105 are executed for each motion phrase in the motion phrase set, and the representative parameter and the contribution value of each motion phrase are obtained.
  • m1 is a positive integer greater than or equal to 1, and may be a value set by the system according to the type and quantity of the videos in the sample video library, or may be set and input by the user.
  • The system can extend the motion phrases in the first screening result with motion atoms extracted from the motion atom set: a traversal method is used to generate new motion phrases with 2 motion atoms, where the two motion atoms in a generated new motion phrase do not occur at the same time point.
  • The motion phrases in the motion phrase set include one motion atom of the motion atom set; the first screening result is obtained through 10221, and then through 10222 new motion phrases with two motion atoms are obtained; the process of 10221 then screens the new motion phrases with 2 motion atoms to obtain the second screening result; new motion phrases with 3 motion atoms are then obtained through the process of 10222; and so on, until the nth screening result is obtained.
• the nth screening result is the first m_n motion phrases arranged in descending order of the value of Rep(P, c) + ΔRepSet(P, c), where m_n is a positive integer greater than or equal to 1. Each motion phrase in the nth screening result has n motion atoms; n is a positive integer greater than or equal to 1, and n may be a value set by the system according to the type and number of videos in the sample video library, or may be set and input by the user.
  • the motion phrase in the first screening result includes one motion atom in the motion atom set
  • the motion phrase in the second screening result includes two motion atoms in the motion atom set
  • the motion phrase in the nth screening result includes n moving atoms in a set of moving atoms.
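The iterative screening of 10221-10222 can be sketched as a greedy loop. In this sketch the `score` callable stands in for Rep(P, c) + ΔRepSet(P, c), phrases are represented as tuples of atoms, and both names and the growth rule are simplifying assumptions.

```python
def screen_phrases(atoms, score, m, n):
    """Greedy screening sketch: start from 1-atom phrases, keep the top-m
    by `score` (standing in for Rep + ΔRepSet), then grow each kept phrase
    by one atom not already in it, and re-screen, up to n-atom phrases."""
    results = []
    candidates = [(a,) for a in atoms]  # motion phrases with one motion atom
    for _ in range(n):
        ranked = sorted(candidates, key=score, reverse=True)[:m]
        results.append(ranked)  # the k-th screening result
        # grow each kept phrase by one atom to form the next candidate pool
        candidates = [p + (a,) for p in ranked for a in atoms if a not in p]
    return results
```

Because each round only grows the phrases that survived the previous screening, the candidate pool stays small even though the space of all n-atom phrases is combinatorial.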
• the set of screened motion phrases obtained in 10224 is used as a base to generate a response vector corresponding to the video to be detected; the components in the response vector are the responses of the motion phrases in the first to nth screening results to the video to be detected.
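A sketch of how such a response vector might be computed over the screened phrases. The min-over-units / max-over-segments form of a phrase's response, and all helper names, are assumptions of this sketch rather than details taken from the embodiment.

```python
import math

def unit_response(score_at, t, segment_times, sigma=1.0):
    """Response of one (atom, t) unit: the best Gaussian-weighted atom score
    over the segment times t' near the anchor time t."""
    return max(score_at(tp) * math.exp(-((tp - t) ** 2) / (2 * sigma ** 2))
               for tp in segment_times)

def phrase_response(units, atom_scores, segment_times):
    """Assumed AND semantics: a phrase responds only as strongly as its
    weakest (atom, time) unit."""
    return min(unit_response(atom_scores[a], t, segment_times) for a, t in units)

def response_vector(phrases, atom_scores, segment_times):
    """The screened phrases act as a base: component i is the response of
    phrase i to the video to be detected."""
    return [phrase_response(p, atom_scores, segment_times) for p in phrases]
```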
  • 1032a Obtain a second classification rule according to a description vector corresponding to each video in the sample video library.
  • the sample video library includes at least two videos, and the types of videos in the sample video library are the same.
• a second classification rule can be generated, for example, by classifying with an SVM (Support Vector Machine) classifier: the description vectors of the videos in the sample video library are input into the SVM classifier, and the SVM classifier generates a classification rule. This classification rule may serve as the second classification rule, which is used to detect whether the video to be detected is of the same type as the videos in the sample video library.
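A minimal sketch of learning a same-type/different-type rule from description vectors. The embodiment uses an SVM classifier; the perceptron-style linear separator below is only an illustrative stand-in, and all function names are hypothetical.

```python
def train_linear_rule(pos, neg, lr=0.1, epochs=200):
    """Train a simple linear separator on description vectors as a stand-in
    for the SVM classifier (perceptron update rule).
    pos: description vectors of library videos; neg: counter-examples."""
    dim = len(pos[0])
    w, b = [0.0] * dim, 0.0
    data = [(v, 1) for v in pos] + [(v, -1) for v in neg]
    for _ in range(epochs):
        for v, y in data:
            margin = sum(wi * vi for wi, vi in zip(w, v)) + b
            if y * margin <= 0:  # misclassified: nudge the separator
                w = [wi + lr * y * vi for wi, vi in zip(w, v)]
                b += lr * y
    return w, b

def same_type(vec, w, b):
    """Second classification rule: a positive score means the response vector
    conforms to the rule, i.e. same type as the sample video library."""
    return sum(wi * vi for wi, vi in zip(w, vec)) + b > 0
```

In practice the negative examples would come from videos outside the library type (e.g. non-dance videos in the dance example below).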
• the second classification rule generated in 1032a is used to detect the response vector of the video to be detected, thereby determining whether the type of the video to be detected is the same as the type of the videos in the sample video library.
• the sample video library includes at least two videos, and the types of the videos in the sample video library are the same. If the response vector of the video to be detected conforms to the second classification rule, it is determined that the type of the video to be detected is the same as the type of the videos in the sample video library; if the response vector does not conform to the second classification rule, it is determined that the types differ. The video to be detected is thereby classified.
• for example, the sample video library includes five videos, all of which are dance videos; it is then detected whether the type of the video to be detected is the dance type, and the videos to be detected can thus be divided into two classes: dance videos and non-dance videos.
  • a video classification method provided by an embodiment of the present invention is capable of segmenting a video in a sample video library, generating a motion atom, and generating a motion phrase of the video in the sample video library by using the segmentation result and the motion atom, for each
• the motion phrase calculates the contribution values of the representativeness parameters and the coverage parameters: motion phrases including one motion atom are first generated, and motion phrases with good representativeness and coverage are selected according to the contribution values of the representativeness parameters and the coverage parameters to obtain the first screening result; a motion atom is added to the motion phrases in the first screening result to obtain new motion phrases, and the new motion phrases are screened according to the contribution values of the representativeness parameter and the coverage parameter to obtain the second screening result, and so on; the process is repeated until the nth screening result is obtained. The description vector is generated according to the first to nth screening results, the second classification rule is generated using the description vector, the response vector of the video to be detected is obtained, and it is detected whether the type of the video to be detected is the same as the type of the videos in the sample video library, so as to achieve the purpose of video classification.
• the present invention derives a description vector from motion phrases that describe the time-series relationship between the motion atoms of continuous complex motion, so that the description vector reflects, in the form of quantized data, the motion atoms arranged in a time-series relationship near each time point in the continuous complex motion, and thereby measures the degree of matching between the motion phrases and the videos in the sample video library. Therefore, classification using the description vector incorporates both the time factor of the video and the motion atoms representing the specific actions and content in the video, and combines the two to generate motion phrases describing the temporal relationship between the motion atoms of continuous complex motion. The motion phrases are screened, and the motion phrases in the screening results have good representativeness, coverage, and discriminability, which reduces the number of motion phrases required to generate the description vector, makes the resulting description vector more streamlined, reduces the time needed to generate the description vector, and enables accurate classification of videos that include long-term continuous complex motion.
  • the embodiment of the present invention further provides a specific solution of the video classification method, and further refines the execution process of 1023 and 103 in FIG. 2, where 1023 can be specifically implemented as 10231 - 10232, and 103 can be specifically implemented as 1031b - 1033b, as shown in FIG. 3b, including:
  • the sample video library includes at least two videos, and the sample video library includes at least two types of videos.
• Each type of video in the sample video library has corresponding first to nth screening results; the first to nth screening results corresponding to the different types of videos in the sample video library are combined to obtain a screening result set, and the screening result set includes the motion phrases corresponding to all the different types of videos in the sample video library.
• the motion phrases in the screening result set are used as a base to generate a description vector corresponding to each video in the sample video library; each video in the sample video library has a corresponding description vector, and each component in the description vector is the response of a motion phrase in the screening result set to that video.
• the motion phrases in the screening result set obtained in 10232 are used as a base to generate a response vector corresponding to the video to be detected; the components in the response vector are the responses to the video to be detected of the motion phrases in the first to nth screening results corresponding to the different types of videos in the sample video library.
  • the sample video library includes at least two videos, and the sample video library includes at least two types of videos.
• Generating a first classification rule according to the description vector corresponding to each different type of video in the sample video library, for example: an SVM (Support Vector Machine) classifier is used for classification; the description vectors of the different types of videos in the sample video library are input into the SVM classifier, and the SVM classifier generates a classification rule.
  • the classification rule may be a first classification rule, and the first classification rule is used to determine the type of the video to be detected.
• the sample video library includes at least two types of videos, and the first classification rule is used to determine the type of the video to be detected. For example, the sample video library includes three types of videos, namely dance videos, sports videos, and acrobatics videos; an SVM (Support Vector Machine) classifier is used to classify the video to be detected, the first classification rule is generated in 1032b, and the response vector of the to-be-detected video obtained in 1031b is input into the SVM classifier. According to the first classification rule, the SVM classifier assigns the video to be detected to one of the three categories: dance video, sports video, or acrobatics video.
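The first classification rule assigns the video to be detected to one of several library types. As an illustrative stand-in for the multi-class SVM, a nearest-centroid rule over description vectors can be sketched; the helper names and the centroid rule itself are assumptions of this sketch.

```python
def first_classification_rule(library):
    """Build a multi-class rule from the library's description vectors.
    library maps type name -> list of description vectors; the rule here is
    a nearest-centroid stand-in for the multi-class SVM."""
    centroids = {}
    for label, vecs in library.items():
        dim = len(vecs[0])
        centroids[label] = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
    return centroids

def classify(response_vec, centroids):
    """Assign the video to be detected to the library type whose centroid is
    closest to its response vector."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: dist2(response_vec, centroids[label]))
```

With the three-type example above, a response vector dominated by the dance-phrase components would be assigned to the dance type.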
  • a video classification method provided by an embodiment of the present invention is capable of segmenting a video in a sample video library, generating a motion atom, and generating a motion phrase of the video in the sample video library by using the segmentation result and the motion atom, for each
• the motion phrase calculates the contribution values of the representativeness parameters and the coverage parameters: motion phrases including one motion atom are first generated, and motion phrases with good representativeness and coverage are selected according to the contribution values of the representativeness parameters and the coverage parameters to obtain the first screening result; a motion atom is added to the motion phrases in the first screening result to obtain new motion phrases, and the new motion phrases are screened according to the contribution values of the representativeness parameter and the coverage parameter to obtain the second screening result, and so on; the process is repeated until the nth screening result is obtained. The first to nth screening results corresponding to the different types of videos in the sample video library are combined to obtain a screening result set, and a description vector is generated according to the screening result set; using the description vector, a first classification rule is generated, the response vector of the video to be detected is obtained, and the type of the video to be detected is determined to be the same as one of the types of videos included in the sample video library, thereby achieving the purpose of video classification.
• the present invention derives a description vector from motion phrases that describe the time-series relationship between the motion atoms of continuous complex motion, so that the description vector reflects, in the form of quantized data, the motion atoms arranged in a time-series relationship near each time point in the continuous complex motion, and thereby measures the degree of matching between the motion phrases and the videos in the sample video library.
• classification using the description vector incorporates both the time factor of the video and the motion atoms representing the specific actions and content in the video, and combines the two to generate motion phrases describing the temporal relationship between the motion atoms of continuous complex motion. The motion phrases are screened, and the motion phrases in the screening results have good representativeness, coverage, and discriminability, which reduces the number of motion phrases required to generate the description vector, makes the resulting description vector more streamlined, reduces the time needed to generate the description vector, and enables accurate classification of multiple different types of videos that include long-term continuous complex motion.
• the embodiment of the present invention further provides a specific solution for the video classification method: steps that can extract and display the main information of the video to be detected are added after 104-105, including:
• the components in the response vector of the video to be detected may be the responses of the screened motion phrases to the video to be detected.
• the main motion phrases are the motion phrases corresponding to at least one component. For example, if the response vector of the video to be detected has 10 components, the 10 components are arranged in descending order, the first 3 components are taken, and the motion phrases corresponding to these first 3 components are the main motion phrases.
• a key frame has the largest response with each of the main motion phrases, so the key frames can represent the most important information in the video to be detected. In addition to the key frames of the video to be detected, the system can also display the frames near each key frame, thereby presenting the main moving content in the video to be detected. For example, as shown in FIG. 4b, among 9 continuous frames of a long jump action in one video, the process of 104-105 determines that the key frames are the second frame and the sixth frame; the key frames and the frames near them are displayed, so the first to third frames and the fifth to seventh frames are displayed.
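The key-frame selection just described can be sketched as follows; the helper names and the 0-based frame indexing are assumptions of this sketch. With key frames at indices 1 and 5 of 9 frames and a one-frame neighborhood, the displayed frames are the 1st-3rd and 5th-7th, matching the FIG. 4b example.

```python
def main_motion_phrases(response_vec, k=3):
    """Indices of the k largest response components: these index the main
    motion phrases of the video to be detected."""
    return sorted(range(len(response_vec)),
                  key=lambda i: response_vec[i], reverse=True)[:k]

def frames_to_display(key_frames, num_frames, radius=1):
    """For each key frame, also display its neighbors within `radius` frames,
    clipped to the valid frame range."""
    shown = set()
    for kf in key_frames:
        for f in range(max(0, kf - radius), min(num_frames, kf + radius + 1)):
            shown.add(f)
    return sorted(shown)
```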
  • a video classification method provided by an embodiment of the present invention can segment a video in a sample video library to generate a motion atom, and generate a motion phrase of a video in the sample video library by using the segmentation result and the motion atom, and The phrase is selected, and according to the selected result, a description vector is generated, and the description vector is used to determine the video to be detected that is the same as the video type in the sample video library, thereby achieving the purpose of video classification, and also according to the component in the response vector of the video to be detected. Get the main motion phrase to get and display the keyframe.
• the present invention derives a description vector from motion phrases that describe the time-series relationship between the motion atoms of continuous complex motion, so that the description vector reflects, in the form of quantized data, the motion atoms arranged in a time-series relationship near each time point in the continuous complex motion, and thereby measures the degree of matching between the motion phrases and the videos in the sample video library. Therefore, classification using the description vector incorporates both the time factor of the video and the motion atoms representing the specific actions and content in the video, and combines the two to generate motion phrases describing the temporal relationship between the motion atoms of continuous complex motion. The motion phrases are screened, and the motion phrases in the screening results have good representativeness, coverage, and discriminability, which reduces the number of motion phrases required to generate the description vector, makes the resulting description vector more streamlined, reduces the time needed to generate the description vector, and enables accurate classification of videos that include long-term continuous complex motion. At the same time, the components in the response vector of the video to be detected can be used to obtain and display the key frames of the video to be detected, so that the main content of the video to be detected is presented clearly and concisely, allowing users to quickly understand it.
  • the embodiment of the present invention further provides a device 200 for video classification.
• the device includes: a first generating module 201, configured to segment the videos in a sample video library in time sequence to obtain segmentation results, and to generate a motion atom set.
  • the sample video library includes at least one video, and the motion atoms in the motion atom set are generated according to the video in the sample video library.
  • the second generation module 202 is configured to generate, by using the set of motion atoms and the segmentation result, a description vector corresponding to a video in the sample video library.
  • the classification module 203 is configured to determine, by using the description vector, a to-be-detected video of the same type as the video in the sample video library.
• a device for video classification is capable of segmenting the videos in a sample video library to generate motion atoms, generating description vectors of the videos in the sample video library by using the segmentation results and the motion atoms, and using the description vectors to determine the videos to be detected that are of the same type as the videos in the sample video library, thereby achieving the purpose of video classification.
• the present invention derives a description vector from motion phrases that describe the time-series relationship between the motion atoms of continuous complex motion, so that the description vector reflects, in the form of quantized data, the motion atoms arranged in a time-series relationship near each time point in the continuous complex motion, and thereby measures the degree of matching between the motion phrases and the videos in the sample video library. Therefore, classification using the description vector incorporates both the time factor of the video and the motion atoms representing the specific actions and content in the video; the two are combined to generate motion phrases describing the temporal relationship between the motion atoms of continuous complex motion, and the description vector is generated from the motion phrases, thereby enabling accurate classification of videos that include long-term continuous complex motion.
  • the second generating module 202 includes:
  • the first generating unit 2021 is configured to generate a motion phrase set corresponding to the video in the sample video library according to the motion atom set and the segmentation result.
  • the set of motion phrases includes at least two motion phrases, and one motion phrase includes motion atoms occurring in the vicinity of the time point in a certain order.
  • the sample video library includes at least two videos, and the types of videos in the sample video library are the same.
  • the screening unit 2022 is configured to filter the motion phrase and obtain a screening result.
  • the second generating unit 2023 is configured to generate, according to the selection result, a description vector corresponding to the video in the sample video library.
  • a device for video classification is capable of segmenting a video in a sample video library, generating a motion atom, and generating a motion phrase of the video in the sample video library by using the segmentation result and the motion atom, and The phrase is selected, and according to the selected result, a description vector is generated, and the description vector is used to determine the video to be detected that is the same as the video type in the sample video library, thereby achieving the purpose of video classification.
• the present invention derives a description vector from motion phrases that describe the time-series relationship between the motion atoms of continuous complex motion, so that the description vector reflects, in the form of quantized data, the motion atoms arranged in a time-series relationship near each time point in the continuous complex motion, and thereby measures the degree of matching between the motion phrases and the videos in the sample video library. Therefore, classification using the description vector incorporates both the time factor of the video and the motion atoms representing the specific actions and content in the video, and combines the two to generate motion phrases describing the temporal relationship between the motion atoms of continuous complex motion. The motion phrases are screened, and the motion phrases in the screening results have good representativeness, coverage, and discriminability, which reduces the number of motion phrases required to generate the description vector, makes the resulting description vector more streamlined, reduces the time needed to generate the description vector, and enables accurate classification of videos that include long-term continuous complex motion.
  • the second generating module 202 further includes:
• the first obtaining unit 2024 is configured to acquire a motion atom unit composed of a motion atom A and a time point t, and to obtain a representativeness parameter Rep(P, c) of a motion phrase according to the motion atom unit.
• Score(Φ(V, t')) is the score obtained by inputting Φ(V, t') into the support vector machine SVM classifier; N(t' | t, σ) is the Gaussian distribution with t as the mean and σ as the standard deviation; and Ω(t) refers to a neighborhood centered on t.
• the motion phrases in the motion phrase set each include one motion atom in the motion atom set.
• a second obtaining unit 2025, configured to acquire a coverage parameter RepSet(Γ, c) of the motion phrase, and to obtain the contribution value ΔRepSet(P, c) of the motion phrase to the coverage parameter according to the coverage parameter RepSet(Γ, c). Here,
• ΔRepSet(P, c) = RepSet(Γ ∪ {P}, c) − RepSet(Γ − {P}, c), computed over the segments obtained by segmenting the videos identified as type c in the sample video library; Γ is a set of motion phrases, and the identifier of the video type to which the motion atoms contained in a motion phrase belong is c.
  • the screening unit 2022 includes:
• a screening subunit 20221, configured to sort the motion phrases in the motion phrase set in descending order of the value of Rep(P, c) + ΔRepSet(P, c) according to the representativeness parameter and the contribution value of each motion phrase, and to use the first m₁ motion phrases as the first screening result, where m₁ is a positive integer greater than or equal to 1.
• the adding subunit 20222 is configured to extract a motion atom from the motion atom set and add it to the motion phrases in the first screening result, so that the motion phrases in the first screening result have two motion atoms.
  • the first generation subunit 20223 is configured to generate the description vector according to the first to nth screening results.
  • An apparatus for video classification is capable of segmenting a video in a sample video library, generating a motion atom, and generating a motion phrase of the video in the sample video library by using the segmentation result and the motion atom, for each
• the motion phrase calculates the contribution values of the representativeness parameters and the coverage parameters: motion phrases including one motion atom are first generated, and motion phrases with good representativeness and coverage are selected according to the contribution values of the representativeness parameters and the coverage parameters to obtain the first screening result; a motion atom is added to the motion phrases in the first screening result to obtain new motion phrases, and the new motion phrases are screened according to the contribution values of the representativeness parameter and the coverage parameter to obtain the second screening result, and so on; the process is repeated until the nth screening result is obtained. A description vector is generated according to the first to nth screening results, a second classification rule is generated using the description vector, a response vector of the video to be detected is obtained, and it is detected whether the type of the video to be detected is the same as the type of the videos in the sample video library, thereby achieving the purpose of video classification.
• the present invention derives a description vector from motion phrases that describe the time-series relationship between the motion atoms of continuous complex motion, so that the description vector reflects, in the form of quantized data, the motion atoms arranged in a time-series relationship near each time point in the continuous complex motion, and thereby measures the degree of matching between the motion phrases and the videos in the sample video library. Therefore, classification using the description vector incorporates both the time factor of the video and the motion atoms representing the specific actions and content in the video, and combines the two to generate motion phrases describing the temporal relationship between the motion atoms of continuous complex motion. The motion phrases are screened, and the motion phrases in the screening results have good representativeness, coverage, and discriminability, which reduces the number of motion phrases required to generate the description vector, makes the resulting description vector more streamlined, reduces the time needed to generate the description vector, and enables accurate classification of videos that include long-term continuous complex motion.
  • the second generating unit 2023 includes:
  • the collection subunit 20231 is configured to obtain a screening result set according to a screening result of the motion phrase corresponding to different types of videos in the sample video library.
  • the sample video library includes at least two videos, and the sample video library includes at least two types of videos.
  • the second generation subunit 20232 is configured to generate, according to the set of the screening results, a description vector corresponding to the video in the sample video library.
  • An apparatus for video classification is capable of segmenting a video in a sample video library, generating a motion atom, and generating a motion phrase of the video in the sample video library by using the segmentation result and the motion atom, for each
• the motion phrase calculates the contribution values of the representativeness parameters and the coverage parameters: motion phrases including one motion atom are first generated, and motion phrases with good representativeness and coverage are selected according to the contribution values of the representativeness parameters and the coverage parameters to obtain the first screening result; a motion atom is added to the motion phrases in the first screening result to obtain new motion phrases, and the new motion phrases are screened according to the contribution values of the representativeness parameter and the coverage parameter to obtain the second screening result, and so on; the process is repeated until the nth screening result is obtained. The first to nth screening results corresponding to the different types of videos in the sample video library are combined to obtain a screening result set, and a description vector is generated according to the screening result set; using the description vector, a first classification rule is generated, a response vector of the video to be detected is obtained, and it is determined that the type of the video to be detected is the same as one of the types of videos included in the sample video library, thereby achieving the purpose of video classification.
• the present invention derives a description vector from motion phrases that describe the time-series relationship between the motion atoms of continuous complex motion, so that the description vector reflects, in the form of quantized data, the motion atoms arranged in a time-series relationship near each time point in the continuous complex motion, and thereby measures the degree of matching between the motion phrases and the videos in the sample video library. Therefore, classification using the description vector incorporates both the time factor of the video and the motion atoms representing the specific actions and content in the video, and combines the two to generate motion phrases describing the temporal relationship between the motion atoms of continuous complex motion. The motion phrases are screened, and the motion phrases in the screening results have good representativeness, coverage, and discriminability, which reduces the number of motion phrases required to generate the description vector, makes the resulting description vector more streamlined, reduces the time needed to generate the description vector, and enables accurate classification of multiple different types of videos that include long-term continuous complex motion.
  • the classification module 203 includes:
  • the third generating unit 2031 is configured to generate a response vector corresponding to the to-be-detected video.
  • the third obtaining unit 2032 is configured to acquire the description vector corresponding to each different type of video in the sample video library, and obtain a first classification rule according to the description vector.
  • the first classification rule is used to determine the type of the video to be detected.
• a first classifying unit 2033, configured to determine, according to the first classification rule and the response vector, that the type of the to-be-detected video is the same as one of the types of videos included in the sample video library, and to classify the video to be detected.
  • An apparatus for video classification is capable of segmenting a video in a sample video library, generating a motion atom, and generating a motion phrase of the video in the sample video library by using the segmentation result and the motion atom, for each
• the motion phrase calculates the contribution values of the representativeness parameters and the coverage parameters: motion phrases including one motion atom are first generated, and motion phrases with good representativeness and coverage are selected according to the contribution values of the representativeness parameters and the coverage parameters to obtain the first screening result; a motion atom is added to the motion phrases in the first screening result to obtain new motion phrases, and the new motion phrases are screened according to the contribution values of the representativeness parameters and the coverage parameters to obtain the second screening result, and so on; the process is repeated until the nth screening result is obtained. The first to nth screening results corresponding to the different types of videos in the sample video library are combined to obtain a screening result set, a description vector is generated according to the screening result set, and the first classification rule is generated using the description vector.
  • the present invention derives a description vector according to a motion phrase for describing a time series relationship between motion atoms of successive complex motions, so that the description vector is reflected in the form of quantized data in a continuous complex motion at a time point A moving atom arranged in a time series relationship, and thereby detecting the degree of matching of the motion phrase with the video in the sample video library.
• classification using the description vector incorporates both the time factor of the video and the motion atoms representing the specific actions and content in the video, and combines the two to generate motion phrases describing the temporal relationship between the motion atoms of continuous complex motion. The motion phrases are screened, and the motion phrases in the screening results have good representativeness, coverage, and discriminability, which reduces the number of motion phrases required to generate the description vector, makes the resulting description vector more streamlined, reduces the time needed to generate the description vector, and enables accurate classification of multiple different types of videos that include long-term continuous complex motion.
  • the classification module 203 includes:
  • The fourth generating unit 2034 is configured to generate a response vector corresponding to the video to be detected; the second classification rule is obtained from the description vectors of the videos in the sample video library.
  • the second classification rule is used to detect whether the video to be detected is the same as the type of the video in the sample video library.
  • The detecting unit 2036 is configured to detect whether the response vector of the video to be detected satisfies the second classification rule.
  • The second classifying unit 2037 is configured to determine, when the rule is satisfied, that the video to be detected is of the same type as the videos in the sample video library.
  • An apparatus for video classification segments the videos in a sample video library, generates motion atoms, and generates motion phrases for those videos from the segmentation results and the motion atoms. For each
  • motion phrase it calculates the contribution values of the representativeness and coverage parameters: phrases containing one motion atom are generated first and screened by these values to obtain the first screening result; a motion atom is added to each phrase in the first screening result to form new phrases, which are screened again to obtain
  • the second screening result, and so on, until the nth screening result is obtained. A description vector is generated from the first to nth screening results, the second classification rule is generated from the description vector, and the response vector of the video to be detected is obtained and checked against it.
  • The present invention derives the description vector from motion phrases that describe the temporal relationships among the motion atoms of a continuous complex motion, so that the description vector reflects, in quantized form, the motion atoms arranged in temporal order near each time point, and thereby measures how well each motion phrase matches the videos in the sample video library.
  • Classification using the description vector therefore incorporates both the temporal factor of the video and the motion atoms that represent its concrete actions and content; combining the two yields the motion phrases
  • describing the temporal relationships among the motion atoms of a continuous complex motion. Screening the phrases keeps those with good representativeness, coverage, and discriminability, reducing the number of phrases needed to generate the description vector,
  • so that the resulting description vector is more compact, takes less time to generate, and enables accurate classification of videos containing long-duration continuous complex motion.
  • the apparatus 200 further includes:
  • The obtaining module 204 is configured to acquire at least one component of the response vector of the video to be detected and to obtain a primary motion phrase based on the at least one component.
  • The primary motion phrase is the motion phrase corresponding to the at least one component.
  • the display module 205 is configured to acquire and display a key frame of the to-be-detected video.
  • A device for video classification segments the videos in a sample video library, generates motion atoms, generates the motion phrases of those videos from the segmentation results and the motion atoms, screens the phrases, and generates a description vector from the screening result. The description vector is used to determine the videos to be detected that are of the same type as the videos in the sample video library, achieving video classification; the primary motion phrase is also obtained from the components of the response vector of the video to be detected, so that key frames can be obtained and displayed.
  • The present invention derives the description vector from motion phrases that describe the temporal relationships among the motion atoms of a continuous complex motion, so that the description vector reflects, in quantized form, the motion atoms arranged in temporal order near each time point, and thereby measures how well each motion phrase matches the videos in the sample video library. Classification using the description vector therefore incorporates both the temporal factor of the video and the motion atoms that represent its concrete actions and content; combining the two yields the motion phrases
  • describing the temporal relationships among the motion atoms of a continuous complex motion. Screening the phrases keeps those with good representativeness, coverage, and discriminability, reducing the number of phrases needed to generate the description vector, so that the resulting description vector is more compact, takes less time to generate, and enables accurate classification of videos containing long-duration continuous complex motion; at the same time, the components of the response vector of the video to be detected can be used to obtain and display
  • the key frames of the video to be detected, which present its main content clearly and concisely so that the user can quickly grasp it.
  • the embodiment of the present invention further provides a video classification system 300, as shown in FIG. 12, comprising: at least one processor 301, such as a CPU, at least one communication bus 302, a memory 303, at least one network interface 304 or a user interface 305.
  • Communication bus 302 is used to implement connection communication between these components.
  • the user interface 305 includes a display, a keyboard, a mouse, and a touch screen.
  • the memory 303 may include a high speed RAM memory, and may also include a non-volatile memory, such as at least one disk memory.
  • The memory 303 can be used to store the sample video library and the segmentation results of its videos, the motion atom set, the description vectors of the videos in the sample video library, and the motion phrase set; it can also be used
  • to store the screening results of the motion phrases, the types of the videos in the sample video library, the response vector of the video to be detected, the representativeness parameters, coverage parameters, and coverage-contribution values of the motion phrases, and the generated first and second classification rules.
  • The processor 301 may be configured to segment the videos in the sample video library in chronological order to obtain segmentation results and to generate a motion atom set; to generate, using the motion atom set and the segmentation results, the description vectors corresponding to the videos in the sample video library; and to determine, using the description vectors, the videos to be detected that are of the same type as the videos in the sample video library.
  • the sample video library includes at least one video, and the motion atoms in the motion atom set are generated according to the video in the sample video library.
  • The processor 301 is further configured to generate, from the motion atom set and the segmentation results, the motion phrase set corresponding to the videos in the sample video library; to screen the motion phrases and obtain a screening result; and to generate, from the screening result, the description vectors corresponding to the videos in the sample video library.
  • The motion phrase set includes at least two motion phrases, and a motion phrase consists of motion atoms that occur near given time points in a certain temporal order.
  • The processor 301 may be further configured to acquire a motion atom unit π = (A, t, σ) and to obtain, from the motion atom units, the representativeness parameter Rep(P₁, c) of a motion phrase:

  Rep(P₁, c) = Σ_{V ∈ S(P₁,c)} r(V, P₁) / |S(P₁, c)|,

  where A is a motion atom, t is a time point in a video of the sample video library, σ is the standard deviation of a Gaussian distribution, V is a video in the sample video library, and r(V, P₁) is the response of a video in the sample video library to the motion phrase P₁:

  r(V, P₁) = min_{ORᵢ ∈ P₁} max_{π ∈ ORᵢ} v(V, π),
  v(V, π) = max_{t′ ∈ Ω(t)} Score(Φ(V, t′), A) · N(t′ | t, σ),

  where ORᵢ denotes computing the responses of a video in the sample video library to the temporally adjacent motion atom units of the phrase, S(P₁, c) is the set of videos in the sample video library with the largest response to the phrase, c identifies the type of the videos, Φ(V, t′) is the video feature of the segment of V starting at t′, Score(Φ(V, t′), A) is the score obtained by feeding Φ(V, t′) into a support vector machine (SVM) classifier, N(t′ | t, σ) is a Gaussian distribution with mean t and standard deviation σ, and Ω(t) is a neighborhood centered at t.
  • The processor 301 is further configured to acquire the coverage parameter RepSet(Γ₁, c) of the motion phrase and to obtain, from it, the contribution value ΔRepSet(P₁, c) of the phrase to the coverage parameter:

  RepSet(Γ₁, c) = |∪_{P₁ ∈ Γ₁} S(P₁, c)| / T_c,
  ΔRepSet(P₁, c) = RepSet(Γ₁, c) − RepSet(Γ₁ − {P₁}, c),

  where T_c is the number of segments obtained by segmenting the videos identified as type c in the sample video library, Γ₁ is the motion phrase set, and the motion atoms contained in the phrase belong to videos of type c. The above process is performed for each motion phrase in the set, yielding the representativeness parameter and contribution value of every phrase.
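  The representativeness and coverage quantities above can be sketched in a few lines. This is a minimal illustration, assuming the responses r(V, P) and the sets S(P, c) have already been computed; the container names `responses` and `S` are illustrative, not part of the patent.

  ```python
  # Sketch of Rep, RepSet, and the coverage contribution dRepSet described
  # above. `responses[p][v]` holds the precomputed response r(V, P) of video
  # v to phrase p; `S[p]` is the set of videos with the largest response to
  # phrase p; `T_c` is the number of segments of the videos of type c.

  def rep(p, S, responses):
      """Representativeness Rep(P, c): mean response over S(P, c)."""
      videos = S[p]
      return sum(responses[p][v] for v in videos) / len(videos)

  def rep_set(phrases, S, T_c):
      """Coverage RepSet: fraction of the T_c segments covered by the phrases."""
      covered = set()
      for p in phrases:
          covered |= S[p]
      return len(covered) / T_c

  def delta_rep_set(p, phrases, S, T_c):
      """Contribution of phrase p to coverage: RepSet(G, c) - RepSet(G - {p}, c)."""
      others = [q for q in phrases if q != p]
      return rep_set(phrases, S, T_c) - rep_set(others, S, T_c)
  ```

  A phrase whose covered videos are all covered by other phrases contributes nothing to RepSet, which is exactly why ΔRepSet favors phrases that cover otherwise-uncovered samples.
  
  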
  • the sample video library includes at least two videos, and the types of videos in the sample video library are the same.
  • A motion phrase in the motion phrase set includes one motion atom from the motion atom set.
  • The processor 301 is further configured to sort the motion phrases in the set in descending order of Rep(P₁, c) + ΔRepSet(P₁, c) according to the representativeness parameter and contribution value of each phrase, and to take the first m₁ phrases as the first screening result; to extract a motion atom from the motion atom set and add it to the phrases of the first screening result, so that those phrases have 2 motion atoms; and to repeat the above process until the (n−1)th screening result is obtained, then extract a motion atom from the motion atom set and add it to the phrases of the (n−1)th screening result so that they have n motion atoms, and obtain the nth screening result from the phrases of the (n−1)th screening result, the nth screening result being the first mₙ phrases in descending order of Rep(Pₙ, c) + ΔRepSet(Pₙ, c).
  • n is a positive integer greater than or equal to 1.
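  The iterative grow-and-screen procedure above can be sketched as a short greedy loop. This is an illustrative sketch, not the patented implementation: `score(p)` stands in for Rep(p, c) + ΔRepSet(p, c), and `extend(p, a)` for adding atom `a` to phrase `p`; both names are assumptions.

  ```python
  # Sketch of the screening described above: rank phrases by their combined
  # representativeness + coverage-contribution score, keep the top m, then
  # grow each kept phrase by one motion atom before the next round.

  def screen_phrases(initial_phrases, atoms, score, extend, m, n):
      """Return the screening results of rounds 1..n (top-m phrases each)."""
      results = []
      candidates = list(initial_phrases)          # round 1: phrases with 1 atom
      for _ in range(n):
          ranked = sorted(candidates, key=score, reverse=True)
          kept = ranked[:m]                       # this round's screening result
          results.append(kept)
          # grow each kept phrase by one more atom for the next round
          candidates = [extend(p, a) for p in kept for a in atoms]
      return results
  ```

  Growing only the kept phrases is what keeps the candidate set small: without screening, the number of phrases would grow exponentially with the number of atoms per phrase.
  
  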
  • The processor 301 is further configured to combine the screening results of the motion phrases corresponding to the different types of videos in the sample video library into a screening result set, and to generate, from that set, the description vectors corresponding to the videos in the sample video library.
  • the sample video library includes at least two videos, and the sample video library includes at least two types of videos.
  • the processor 301 is further configured to generate a response vector corresponding to the to-be-detected video, and configured to acquire the description vector corresponding to each different type of video in the sample video library, and according to the description vector Obtaining a first classification rule; and, configured to determine, according to the first classification rule and the response vector, that the type of the to-be-detected video is the same as one of types of videos included in the sample video library, And classifying the video to be detected.
  • the first classification rule is used to determine the type of the video to be detected.
  • The processor 301 is further configured to generate a response vector corresponding to the video to be detected; to obtain the second classification rule from the description vectors of the videos in the sample video library; to detect whether the response vector of the video to be detected satisfies the second classification rule; and, when it does, to determine that the video to be detected is of the same type as the videos in the sample video library.
  • the second classification rule is used to detect whether the video to be detected is the same as the type of the video in the sample video library.
  • The processor 301 is further configured to acquire at least one component of the response vector of the video to be detected, to obtain a primary motion phrase from the at least one component, and to acquire and display the key frames of the video to be detected.
  • The primary motion phrase is the motion phrase corresponding to the at least one component, and the key frames are the frames with the largest response to each motion atom unit of the primary motion phrase.
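  Key-frame selection as described above reduces to an argmax per motion atom unit. The sketch below assumes per-frame responses have already been computed; the mapping name `unit_responses` is illustrative.

  ```python
  # Sketch of key-frame selection: for each motion atom unit of the primary
  # motion phrase, pick the frame with the largest response to that unit.
  # `unit_responses` maps each atom unit to a list of per-frame responses.

  def key_frames(unit_responses):
      """Return the sorted indices of the best-responding frame per atom unit."""
      frames = []
      for unit, per_frame in unit_responses.items():
          best = max(range(len(per_frame)), key=per_frame.__getitem__)
          frames.append(best)
      return sorted(set(frames))
  ```

  In practice one might also display a small window of frames around each selected index, as the embodiment does with the frames adjacent to the key frames.
  
  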
  • A video classification system segments the videos in a sample video library, generates motion atoms, generates the motion phrases of those videos from the segmentation results and the motion atoms, screens the phrases, and generates a description vector from the screening result. The description vector is used to determine the videos to be detected that are of the same type as the videos in the sample video library, achieving video classification; the primary motion phrase is also obtained from the components of the response vector of the video to be detected,
  • so that key frames can be obtained and displayed.
  • The present invention derives the description vector from motion phrases that describe the temporal relationships among the motion atoms of a continuous complex motion, so that the description vector reflects, in quantized form, the motion atoms arranged in temporal order near each time point, and thereby measures how well each motion phrase matches the videos in the sample video library. Classification using the description vector therefore incorporates both the temporal factor of the video and the motion atoms that represent its concrete actions and content; combining the two yields the motion phrases
  • describing the temporal relationships among the motion atoms of a continuous complex motion. Screening the phrases keeps those with good representativeness, coverage, and discriminability, reducing the number of phrases needed to generate the description vector, so that the resulting description vector is more compact, takes less time to generate, and enables accurate classification of videos containing long-duration continuous complex motion; at the same time, the components of the response vector of the video to be detected can be used to obtain and display
  • the key frames of the video to be detected, which present its main content clearly and concisely so that the user can quickly grasp it. The embodiments in this specification are described progressively; for identical or similar parts the embodiments may refer to one another, and each embodiment focuses on its differences from the others.
  • As for the device embodiments, since they are substantially similar to the method embodiments, their description is relatively brief; for relevant details, refer to the description of the method embodiments.
  • the storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).

Abstract

A method and device for video classification, relating to the field of electronic information technology and capable of improving the accuracy of video classification. The method includes: segmenting the videos in a sample video library in chronological order to obtain segmentation results, and generating a motion atom set; using the motion atom set and the segmentation results to generate a motion phrase set capable of expressing complex motion patterns, and generating, based on the motion phrase set, description vectors of the videos in the sample video library; and using the description vectors to determine the videos to be detected that are of the same type as the videos in the sample video library. The method is applicable to video classification scenarios.

Description

Method and device for video classification. This application claims priority to Chinese Patent Application No. 201310631901.6, filed with the Chinese Patent Office on November 29, 2013 and entitled "Method and device for video classification", which is incorporated herein by reference in its entirety.
Technical Field
The present invention relates to the field of electronic information technology, and in particular to a method and device for video classification.
Background
With the rapid growth of video data, browsing videos one by one and classifying them according to the motion of the people in them costs users a great deal of time and effort. Although videos can now be classified according to simple motions such as walking or running, the motion in videos is often more complex (sports activities, for example), and classification based on simple motions no longer meets users' needs. To classify videos according to these more complex, continuous motions, the prior art extracts local-region features from the video, such as HOG (Histogram of Oriented Gradients) features, clusters them to form motion atoms (simple motion patterns sharing certain commonalities), then computes the responses of the video to be detected to these motion atoms, assembles the responses into a vector, and classifies the video according to that vector.
However, complex motions with strong temporal ordering always appear in videos, and it is difficult to guarantee classification accuracy when classifying with vectors derived from motion atoms alone. The prior art therefore also decomposes a more complex motion in a video, over time, into segments consisting of simple motions, each corresponding to a time point; during classification, each segment is compared in chronological order with the segments decomposed from a sample, a comparison score is obtained for each segment, the scores are combined by weighted summation into a final score, and the video is classified according to that final score. But for relatively continuous, long-duration complex motion, the prior art can hardly decompose such motion properly into segments of simple motions; moreover, when the decomposition time points are set differently, the comparison scores against the sample segments also differ, so video classification yields multiple inconsistent results, and its accuracy is low.
Summary
Embodiments of the present invention provide a method and device for video classification that can improve the accuracy of video classification.
To this end, the embodiments of the present invention adopt the following technical solutions.
In a first aspect, an embodiment of the present invention provides a method for video classification, including: segmenting the videos in a sample video library in chronological order to obtain segmentation results, and generating a motion atom set, where the sample video library includes at least one video and the motion atoms in the set are generated from the videos in the sample video library; generating, using the motion atom set and the segmentation results, description vectors corresponding to the videos in the sample video library; and
determining, using the description vectors, the videos to be detected that are of the same type as the videos in the sample video library.
With reference to the first aspect, in a first possible implementation, generating the description vectors corresponding to the videos in the sample video library using the motion atom set and the segmentation results includes:
generating, from the motion atom set and the segmentation results, a motion phrase set corresponding to the videos in the sample video library, where the motion phrase set includes at least two motion phrases and a motion phrase consists of motion atoms occurring near given time points in a certain temporal order; screening the motion phrases to obtain a screening result; and
generating, from the screening result, the description vectors corresponding to the videos in the sample video library. With reference to the first aspect and its first possible implementation, in a second possible implementation, the sample video library includes at least two videos, and the videos in the sample video library are of the same type.
With reference to the second possible implementation of the first aspect, a third possible implementation further includes:
acquiring a motion atom unit π = (A, t, σ) and obtaining, from the motion atom units, the representativeness parameter Rep(P₁, c) of a motion phrase:

Rep(P₁, c) = Σ_{V ∈ S(P₁,c)} r(V, P₁) / |S(P₁, c)|,

where A is a motion atom, t is a time point in a video of the sample video library, σ is the standard deviation of a Gaussian distribution, V is a video in the sample video library, P₁ is the motion phrase, and r(V, P₁) is the response of a video in the sample video library to the motion phrase P₁:

r(V, P₁) = min_{ORᵢ ∈ P₁} max_{π ∈ ORᵢ} v(V, π),
v(V, π) = max_{t′ ∈ Ω(t)} Score(Φ(V, t′), A) · N(t′ | t, σ),

where ORᵢ denotes computing the responses of a video in the sample video library to the temporally adjacent motion atom units, S(P₁, c) is the set of videos in the sample video library with the largest response to the motion phrase, c identifies the type of the videos in the sample video library, Φ(V, t′) is the video feature of the segment starting at t′ in a video of the sample video library, Score(Φ(V, t′), A) is the score obtained by feeding Φ(V, t′) into a support vector machine (SVM) classifier, N(t′ | t, σ) is a Gaussian distribution with mean t and standard deviation σ, and Ω(t) is a neighborhood centered at t;
acquiring the coverage parameter RepSet(Γ₁, c) of the motion phrase and obtaining, from it, the contribution value ΔRepSet(P₁, c) of the motion phrase to the coverage parameter:

RepSet(Γ₁, c) = |∪_{P₁ ∈ Γ₁} S(P₁, c)| / T_c,
ΔRepSet(P₁, c) = RepSet(Γ₁, c) − RepSet(Γ₁ − {P₁}, c),

where T_c is the number of segments obtained by segmenting the videos identified as type c in the sample video library, Γ₁ is the motion phrase set, and the motion atoms contained in the motion phrase belong to videos of type c; and performing the above process for each motion phrase in the motion phrase set to obtain the representativeness parameter and contribution value of each motion phrase.
Screening the motion phrases to obtain the screening result includes:
sorting the motion phrases in the motion phrase set in descending order of Rep(P₁, c) + ΔRepSet(P₁, c) according to the representativeness parameter and contribution value of each phrase, and taking the first m₁ motion phrases as the first screening result, m₁ being a positive integer greater than or equal to 1;
extracting a motion atom from the motion atom set and adding it to the motion phrases of the first screening result, so that those phrases have 2 motion atoms;
repeating the above process until the (n−1)th screening result is obtained, then extracting a motion atom from the motion atom set and adding it to the motion phrases of the (n−1)th screening result so that they have n motion atoms, and obtaining the nth screening result from the motion phrases of the (n−1)th screening result, the nth screening result being the first mₙ motion phrases in descending order of Rep(Pₙ, c) + ΔRepSet(Pₙ, c), where mₙ is a positive integer greater than or equal to 1, the phrases of the nth screening result have n motion atoms, and n is a positive integer greater than or equal to 1; and
generating the description vector from the first to nth screening results.
With reference to the third possible implementation of the first aspect, in a fourth possible implementation, the sample video library includes at least two videos and at least two types of videos, and generating the description vectors corresponding to the videos in the sample video library from the screening result includes:
combining the screening results of the motion phrases corresponding to the different types of videos in the sample video library into a screening result set; and
generating, from the screening result set, the description vectors corresponding to the videos in the sample video library.
With reference to the fourth possible implementation of the first aspect, in a fifth possible implementation, determining the videos to be detected that are of the same type as the videos in the sample video library using the description vectors includes:
generating a response vector corresponding to the video to be detected;
acquiring the description vectors corresponding to the different types of videos in the sample video library and obtaining from them a first classification rule, the first classification rule being used to determine the type to which the video to be detected belongs; and
determining, according to the first classification rule and the response vector, that the type of the video to be detected is the same as one of the types of videos in the sample video library, and classifying the video to be detected.
With reference to the first aspect and its second possible implementation, in a sixth possible implementation, determining the videos to be detected that are of the same type as the videos in the sample video library using the description vectors includes:
generating a response vector corresponding to the video to be detected; obtaining, from the description vectors of the videos in the sample video library, a second classification rule, the second classification rule being used to detect whether the video to be detected is of the same type as the videos in the sample video library;
detecting whether the response vector of the video to be detected satisfies the second classification rule; and, when it does, determining that the video to be detected is of the same type as the videos in the sample video library.
With reference to the first aspect, a seventh possible implementation further includes:
acquiring at least one component of the response vector of the video to be detected and obtaining from it a primary motion phrase, the primary motion phrase being the motion phrase corresponding to the at least one component; and
acquiring and displaying the key frames of the video to be detected, the key frames having the largest response to each motion atom unit of the primary motion phrase.
In a second aspect, an embodiment of the present invention provides a device for video classification, including: a first generating module, configured to segment the videos in a sample video library in chronological order to obtain segmentation results and to generate a motion atom set, the sample video library including at least one video and the motion atoms in the set being generated from the videos in the sample video library; a second generating module, configured to generate, using the motion atom set and the segmentation results, description vectors corresponding to the videos in the sample video library; and
a classification module, configured to determine, using the description vectors, the videos to be detected that are of the same type as the videos in the sample video library.
With reference to the second aspect, in a first possible implementation, the second generating module includes:
a first generating unit, configured to generate, from the motion atom set and the segmentation results, a motion phrase set corresponding to the videos in the sample video library, the motion phrase set including at least two motion phrases, a motion phrase consisting of motion atoms occurring near given time points in a certain temporal order;
a screening unit, configured to screen the motion phrases and obtain a screening result; and
a second generating unit, configured to generate, from the screening result, the description vectors corresponding to the videos in the sample video library.
With reference to the second aspect and its first possible implementation, in a second possible implementation, the sample video library includes at least two videos, and the videos in the sample video library are of the same type.
With reference to the second possible implementation of the second aspect, in a third possible implementation, a motion phrase in the motion phrase set includes one motion atom from the motion atom set, and the second generating module further includes:
a first acquiring unit, configured to acquire a motion atom unit π = (A, t, σ) and to obtain from the motion atom units the representativeness parameter Rep(P₁, c) of a motion phrase:

Rep(P₁, c) = Σ_{V ∈ S(P₁,c)} r(V, P₁) / |S(P₁, c)|,
r(V, P₁) = min_{ORᵢ ∈ P₁} max_{π ∈ ORᵢ} v(V, π),
v(V, π) = max_{t′ ∈ Ω(t)} Score(Φ(V, t′), A) · N(t′ | t, σ);

and a second acquiring unit, configured to acquire the coverage parameter RepSet(Γ₁, c) of the motion phrase and to obtain from it the contribution value ΔRepSet(P₁, c) of the phrase to the coverage parameter:

RepSet(Γ₁, c) = |∪_{P₁ ∈ Γ₁} S(P₁, c)| / T_c,
ΔRepSet(P₁, c) = RepSet(Γ₁, c) − RepSet(Γ₁ − {P₁}, c),

where A, t, σ, V, P₁, S(P₁, c), c, Φ(V, t′), Score(Φ(V, t′), A), N(t′ | t, σ), Ω(t), T_c, and Γ₁ are as defined in the third possible implementation of the first aspect, and the motion atoms contained in the motion phrase belong to videos of type c;
the above process being performed for each motion phrase in the motion phrase set to obtain the representativeness parameter and contribution value of each motion phrase.
The screening unit includes:
a screening subunit, configured to sort the motion phrases in the motion phrase set in descending order of Rep(P₁, c) + ΔRepSet(P₁, c) according to the representativeness parameter and contribution value of each phrase, and to take the first m₁ phrases as the first screening result, m₁ being a positive integer greater than or equal to 1;
an adding subunit, configured to extract a motion atom from the motion atom set and add it to the phrases of the first screening result, so that those phrases have 2 motion atoms;
the screening subunit and the adding subunit being run successively until the (n−1)th screening result is obtained, after which a motion atom is extracted from the motion atom set and added to the phrases of the (n−1)th screening result so that they have n motion atoms, and the nth screening result is obtained from the phrases of the (n−1)th screening result as the first mₙ phrases in descending order of Rep(Pₙ, c) + ΔRepSet(Pₙ, c), where mₙ is a positive integer greater than or equal to 1, the phrases of the nth screening result have n motion atoms, and n is a positive integer greater than or equal to 1; and
a first generating subunit, configured to generate the description vector from the first to nth screening results.
With reference to the third possible implementation of the second aspect, in a fourth possible implementation, the sample video library includes at least two videos and at least two types of videos;
the second generating unit includes:
a set subunit, configured to combine the screening results of the motion phrases corresponding to the different types of videos in the sample video library into a screening result set; and
a second generating subunit, configured to generate, from the screening result set, the description vectors corresponding to the videos in the sample video library.
With reference to the fourth possible implementation of the second aspect, in a fifth possible implementation, the classification module includes:
a third generating unit, configured to generate a response vector corresponding to the video to be detected; a third acquiring unit, configured to acquire the description vectors corresponding to the different types of videos in the sample video library and to obtain from them the first classification rule, the first classification rule being used to determine the type to which the video to be detected belongs; and
a first classification unit, configured to determine, according to the first classification rule and the response vector, that the type of the video to be detected is the same as one of the types of videos in the sample video library, and to classify the video to be detected.
With reference to the second aspect and its second possible implementation, in a sixth possible implementation, the classification module includes:
a fourth generating unit, configured to generate a response vector corresponding to the video to be detected; a unit configured to obtain, from the description vectors of the videos in the sample video library, the second classification rule, the second classification rule being used to detect whether the video to be detected is of the same type as the videos in the sample video library;
a detecting unit, configured to detect whether the response vector of the video to be detected satisfies the second classification rule; and
a second classification unit, configured to determine, when it does, that the video to be detected is of the same type as the videos in the sample video library.
With reference to the second aspect, a seventh possible implementation further includes:
an obtaining module, configured to acquire at least one component of the response vector of the video to be detected and to obtain from it a primary motion phrase, the primary motion phrase being the motion phrase corresponding to the at least one component; and
a display module, configured to acquire and display the key frames of the video to be detected, the key frames having the largest response to each motion atom unit of the primary motion phrase.
The method and device for video classification provided by the embodiments of the present invention can segment the videos in a sample video library, generate motion atoms, generate from the segmentation results and the motion atoms the description vectors of the videos in the sample video library, and use the description vectors to determine the videos to be detected that are of the same type as the videos in the sample video library, thereby achieving video classification. In the prior-art scheme that derives the vector of the video to be detected from motion atoms, shown in Fig. 1a, the motion atoms contain no temporal factor and cannot reflect the temporal relationships among the motion atoms of a continuous complex motion. The present invention instead generates motion phrases from the motion atoms and description vectors from the motion phrases; a motion phrase consists of motion atoms occurring near given time points in a certain temporal order and describes the temporal relationships among the motion atoms of a continuous complex motion. For example, an SVM classifier may be used to classify the video to be detected; the scheme of the present invention is shown in Fig. 1b. In the prior-art scheme that decomposes a video over time into simple segments, different choices of decomposition time points lead to different classification results, so it is difficult to decompose a continuous complex motion properly into segments of simple motions, and the classification results are inaccurate. Compared with the prior art, the present invention derives the description vector from motion phrases describing the temporal relationships among the motion atoms of a continuous complex motion, so that the description vector reflects, in quantized form, the motion atoms arranged in temporal order near each time point, and thereby measures how well each motion phrase matches the videos in the sample video library. Classification using the description vector thus incorporates both the temporal factor of the video and the motion atoms representing its concrete actions and content, combining the two into motion phrases describing the temporal relationships among the motion atoms of a continuous complex motion and into description vectors generated from those phrases, which enables accurate classification of videos containing long-duration continuous complex motion.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required by the embodiments are briefly introduced below. The drawings described below are merely some embodiments of the present invention; a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1a is an example flowchart of a prior-art method for video classification;
Fig. 1b is an example flowchart of a method for video classification provided by the present invention;
Fig. 1c is a flowchart of a method for video classification provided by an embodiment of the present invention;
Fig. 2 is a flowchart of a specific implementation of a method for video classification provided by an embodiment of the present invention;
Fig. 3a is a flowchart of another specific implementation of the method for video classification provided by an embodiment of the present invention;
Fig. 3b is a flowchart of yet another specific implementation of the method for video classification provided by an embodiment of the present invention;
Fig. 3c is a schematic illustration of the OR operation and AND operation provided by an embodiment of the present invention;
Fig. 4a is a flowchart of still another specific implementation of the method for video classification provided by an embodiment of the present invention;
Fig. 4b is a schematic illustration of displaying the main information in a video, provided by an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a device for video classification provided by an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a specific implementation of the device for video classification provided by an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of another device for video classification provided by an embodiment of the present invention;
Fig. 8 is a schematic structural diagram of a specific implementation of the other device for video classification provided by an embodiment of the present invention;
Fig. 9 is a schematic structural diagram of another specific implementation of the other device for video classification provided by an embodiment of the present invention;
Fig. 10 is a schematic structural diagram of yet another specific implementation of the other device for video classification provided by an embodiment of the present invention;
Fig. 11 is a schematic structural diagram of yet another device for video classification provided by an embodiment of the present invention;
Fig. 12 is a schematic structural diagram of a video classification system provided by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. The described embodiments are merely some, not all, of the embodiments of the present invention; all other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the present invention.
The technical solutions provided by the embodiments of the present invention can generate a motion atom set from the motion information in the videos to be detected, finally obtain the description vectors of the videos in the sample video library, and classify the videos to be detected using the description vectors. In practice, the scheme may sort the videos to be detected into broad categories, such as music videos, sports videos, or dance videos, or into fine-grained subcategories, such as sprint videos, high-jump videos, or long-jump videos.
An embodiment of the present invention provides a method for video classification, as shown in Fig. 1c, including:
101: Segment the videos in the sample video library in chronological order to obtain segmentation results, and generate a motion atom set.
The videos in the sample video library may be chosen according to the user's classification needs. For example, if the user wants to classify the videos to be detected into the three broad types of dance, drama, and sports, videos of these three types are placed in the sample video library as its videos. Likewise, if the user wants to classify sports videos into the three smaller types of high jump, swimming, and gymnastics, videos of these three smaller types are placed in the sample video library as its videos.
The sample video library includes at least one video, and the motion atoms in the motion atom set are generated from the videos in the sample video library. The system divides each video in the sample video library into segments of equal length with a certain temporal overlap between adjacent segments; for example, segment 1 is the portion of a video from 00:00:00 to 00:01:00, and the adjacent segment 2 is the portion from 00:00:30 to 00:01:30. The system extracts a low-level video feature from every segment; the low-level feature may be a HOG (Histogram of Oriented Gradients) feature, a dense trajectory feature, or the like. This yields a set of low-level features, which may be denoted {h_i}, i = 1, …, N·k, where N is the number of videos in the sample video library, k is the number of segments each video is divided into, and each h_i is a d-dimensional vector, d being determined by the specific low-level feature. A similarity parameter of the low-level features is obtained as Sim(h_i, h_j) = exp(−d(h_i, h_j) / μ), where d(h_i, h_j) is the Euclidean distance between h_i and h_j computed over their components (the Kth component of h_i being denoted h_i^K), and μ is the mean of the Euclidean distances between all pairs of vectors. The system then applies a clustering algorithm, such as affinity propagation, to the similarity parameters of the low-level features to form motion atoms, and the motion atoms constitute the motion atom set.
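The pairwise-similarity computation for the low-level features can be sketched as follows. The exact formula in the source is garbled; normalising the Euclidean distance by the mean pairwise distance is our reading of it, so treat this as an illustrative sketch. A clustering algorithm such as affinity propagation would then be run on the resulting similarity matrix to form the motion atoms.

```python
import numpy as np

# Sketch of the feature-similarity computation described above: each video
# segment yields a d-dimensional low-level feature vector, and pairwise
# similarities are exp(-distance / mean pairwise distance).

def similarity_matrix(features):
    """features: (n, d) array of low-level feature vectors; returns (n, n)."""
    n = len(features)
    diffs = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))        # pairwise Euclidean distances
    mu = dist[np.triu_indices(n, k=1)].mean()        # mean over all distinct pairs
    return np.exp(-dist / mu)
```

Identical feature vectors get similarity 1 and the similarity decays smoothly with distance, which is the property the clustering step relies on.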
102: Using the motion atom set and the segmentation results, generate the description vectors corresponding to the videos in the sample video library.
The motion atoms in the set, occurring in a certain temporal order, can form motion phrases; the responses of the videos in the sample video library to the motion phrases are computed, and the response values are assembled into the description vectors of those videos, thereby quantizing the video content.
103: Using the description vectors, determine the videos to be detected that are of the same type as the videos in the sample video library. The description vectors of the videos in the sample video library can be used to form classification rules, and the videos to be detected are classified by determining which type of video in the sample video library they match.
The method for video classification provided by this embodiment of the present invention can segment the videos in a sample video library, generate motion atoms, generate from the segmentation results and the motion atoms the description vectors of the videos in the sample video library, and use the description vectors to determine the videos to be detected that are of the same type as the videos in the sample video library, achieving video classification. Compared with the prior art, the present invention derives the description vector from motion phrases describing the temporal relationships among the motion atoms of a continuous complex motion, so that the description vector reflects, in quantized form, the motion atoms arranged in temporal order near each time point, and thereby measures how well each motion phrase matches the videos in the sample video library. Classification using the description vector thus incorporates both the temporal factor of the video and the motion atoms representing its concrete actions and content, combining the two into motion phrases describing the temporal relationships among the motion atoms of a continuous complex motion and into description vectors generated from those phrases, which enables accurate classification of videos containing long-duration continuous complex motion.
Optionally, on the basis of the scheme shown in Fig. 1, an embodiment of the present invention further provides a specific scheme of the method for video classification, refining the execution of step 102 in Fig. 1, where 102 is implemented as steps 1021 to 1023, as shown in Fig. 2:
1021: Generate, from the motion atom set and the segmentation results, the motion phrase set corresponding to the videos in the sample video library.
The motion phrase set includes at least two motion phrases; a motion phrase consists of motion atoms occurring near given time points in a certain temporal order and can express the temporal relationships among motion atoms.
1022: Screen the motion phrases and obtain a screening result.
If the motion atom set contains M motion atoms and each video in the sample video library is divided into k segments, up to 2^(M·k) motion phrases may be generated; this large number would inflate the computation during classification, so motion phrases with good representativeness, coverage, and discriminability are screened out as the screening result for the subsequent steps.
1023: Generate, from the screening result, the description vectors corresponding to the videos in the sample video library.
The responses of the videos in the sample video library to the motion phrases in the screening result are computed, and the response values are assembled into the description vectors of those videos, thereby quantizing the video content.
The method for video classification provided by this embodiment of the present invention can segment the videos in a sample video library, generate motion atoms, generate from the segmentation results and the motion atoms the motion phrases of the videos in the sample video library, screen the phrases, generate the description vector from the screening result, and use the description vector to determine the videos to be detected that are of the same type as the videos in the sample video library, achieving video classification. Compared with the prior art, the present invention derives the description vector from motion phrases describing the temporal relationships among the motion atoms of a continuous complex motion, so that the description vector reflects, in quantized form, the motion atoms arranged in temporal order near each time point, and thereby measures how well each motion phrase matches the videos in the sample video library. Classification using the description vector thus incorporates both the temporal factor of the video and the motion atoms representing its concrete actions and content, and combines the two into motion phrases describing the temporal relationships among the motion atoms of a continuous complex motion. Screening the phrases keeps those with good representativeness, coverage, and discriminability, reducing the number of phrases needed to generate the description vector, making the description vector more compact, reducing the time to generate it, and enabling accurate classification of videos containing long-duration continuous complex motion.
Optionally, on the basis of the scheme shown in Fig. 2, an embodiment of the present invention further provides a specific scheme of the method for video classification, adding steps 1024 and 1025 to the refined execution of step 1022 in Fig. 2 and further refining the execution of steps 1022 and 103, where 1022 is implemented as 10221 to 10224 and 103 as 1031a to 1034a, as shown in Fig. 3a:
1024: Acquire a motion atom unit π = (A, t, σ) and obtain, from the motion atom units, the representativeness parameter Rep(P₁, c) of a motion phrase.

Rep(P₁, c) = Σ_{V ∈ S(P₁,c)} r(V, P₁) / |S(P₁, c)|,

where A is a motion atom, t is a time point in a video of the sample video library, σ is the standard deviation of a Gaussian distribution, V is a video in the sample video library, P₁ is a motion phrase containing one motion atom from the motion atom set, and r(V, P₁) is the response of a video in the sample video library to the motion phrase P₁:

r(V, P₁) = min_{ORᵢ ∈ P₁} max_{π ∈ ORᵢ} v(V, π),
v(V, π) = max_{t′ ∈ Ω(t)} Score(Φ(V, t′), A) · N(t′ | t, σ).

Here max_{π ∈ ORᵢ} v(V, π) is the OR operation in a motion phrase: the responses of videos of the same type in the sample video library to the temporally adjacent motion atom units of the phrase are computed, and the largest response value among the temporally adjacent units is selected. min_{ORᵢ} max_{π ∈ ORᵢ} v(V, π) is the AND operation: the smallest of the response values selected by the OR operations is taken; when this minimum exceeds a preset threshold, the motion phrase matches the videos in the sample video library well. For example, as shown in Fig. 3c, with OR the or-operation and AND the and-operation: motion atom units 1 and 2 are temporally adjacent, as are units 3 and 4. The OR operation on units 1 and 2 selects the response value of unit 1, whose response is larger; the OR operation on units 3 and 4 selects the response value of unit 4, whose response is larger; the AND operation then compares the responses of units 1 and 4 and selects the smaller of the two response values. S(P₁, c) is the set of videos in the sample video library with the largest response to the motion phrase, c identifies the type of the videos in the sample video library, Φ(V, t′) is the video feature of the segment starting at t′ in a video of the sample video library, Score(Φ(V, t′), A) is the score obtained by feeding Φ(V, t′) into a support vector machine (SVM) classifier, N(t′ | t, σ) is a Gaussian distribution with mean t and standard deviation σ, and Ω(t) is a neighborhood centered at t.
The representativeness parameter requires the motion phrase to respond as strongly as possible to videos of a given type, indicating that the phrase is representative of that type.
Further, the discriminability parameter Dis(P₁, c) of a motion phrase for a type of video expresses how much the phrase's representativeness for that type differs from its representativeness for the other types; the larger the parameter, the better the phrase's discriminative performance:

Dis(P₁, c) = Rep(P₁, c) − max_{cᵢ ∈ C, cᵢ ≠ c} Rep(P₁, cᵢ),

where C denotes all the types of videos in the sample video library.
1025: Acquire the coverage parameter RepSet(Γ₁, c) of the motion phrase and obtain, from it, the contribution value ΔRepSet(P₁, c) of the phrase to the coverage parameter.

RepSet(Γ₁, c) = |∪_{P₁ ∈ Γ₁} S(P₁, c)| / T_c,
ΔRepSet(P₁, c) = RepSet(Γ₁, c) − RepSet(Γ₁ − {P₁}, c),

where T_c is the number of segments obtained by segmenting the videos identified as type c in the sample video library, Γ₁ is the motion phrase set, and the motion atoms contained in the phrase belong to videos of type c.
Coverage requires that the set of screened motion phrases cover the various types of videos as fully as possible.
Steps 1024 and 1025 are performed for each motion phrase in the motion phrase set, yielding the representativeness parameter and contribution value of each motion phrase.
10221: Sort the motion phrases in the motion phrase set in descending order of Rep(P₁, c) + ΔRepSet(P₁, c) according to the representativeness parameter and contribution value of each phrase, and take the first m₁ phrases as the first screening result.
Here m₁ is a positive integer greater than or equal to 1; it may be set by the system according to the types and number of videos in the sample video library, or set and entered by the user.
10222: Extract a motion atom from the motion atom set and add it to the motion phrases of the first screening result, so that those phrases have 2 motion atoms.
The system may add motion atoms extracted from the motion atom set to the phrases of the first screening result, generating by traversal new motion phrases with 2 motion atoms; the 2 motion atoms in a newly generated phrase do not occur at the same time point.
10223: Repeat the above process until the (n−1)th screening result is obtained, then extract a motion atom from the motion atom set and add it to the phrases of the (n−1)th screening result so that they have n motion atoms, and obtain the nth screening result from the phrases of the (n−1)th screening result.
For example: the phrases in the motion phrase set contain 1 motion atom each; step 10221 yields the first screening result; step 10222 yields new phrases with 2 motion atoms; the process of 10221 screens these new phrases to yield the second screening result; the process of 10222 then yields new phrases with 3 motion atoms; and so on, until the nth screening result is obtained.
The nth screening result consists of the first mₙ phrases in descending order of Rep(Pₙ, c) + ΔRepSet(Pₙ, c); mₙ is a positive integer greater than or equal to 1, the phrases of the nth screening result have n motion atoms, and n is a positive integer greater than or equal to 1 that may be set by the system according to the types and number of videos in the sample video library, or set and entered by the user.
10224: Generate the description vector from the first to nth screening results.
The phrases of the first screening result contain 1 motion atom from the motion atom set, those of the second screening result contain 2, and so on; those of the nth screening result contain n. From the phrases of the first to nth screening results, the set of screened motion phrases is generated and used as a basis to obtain the description vectors of the videos in the sample video library; each video in the sample video library has a corresponding description vector, and each component of a description vector is the response of a video in the sample video library to one of the phrases in the first to nth screening results.
1031a: Generate the response vector corresponding to the video to be detected.
Using the set of screened motion phrases obtained in 10224 as a basis, the response vector of the video to be detected is generated; its components are the responses of the video to be detected to the phrases in the first to nth screening results.
1032a: Obtain the second classification rule from the description vectors of the videos in the sample video library.
The sample video library includes at least two videos, and the videos in the sample video library are of the same type. A second classification rule can be generated from the description vectors; for example, classification may be performed with an SVM (Support Vector Machine) classifier: the description vectors of the videos in the sample video library are fed to the SVM classifier, which generates a classification rule. This rule may serve as the second classification rule, which is used to detect whether the video to be detected is of the same type as the videos in the sample video library.
1033a: Detect whether the response vector of the video to be detected satisfies the second classification rule.
The second classification rule generated in 1032a is applied to the response vector of the video to be detected, so as to determine whether the video to be detected is of the same type as the videos in the sample library.
1034a: When it does, determine that the video to be detected is of the same type as the videos in the sample video library.
The sample video library includes at least two videos of the same type. If the response vector of the video to be detected satisfies the second classification rule, the video to be detected is determined to be of the same type as the videos in the sample video library; if it does not, the video to be detected is of a different type, and the video is classified accordingly. For example: the sample video library includes five videos, all dance videos; detecting whether the video to be detected is a dance video classifies the videos to be detected into the two types of dance videos and non-dance videos.
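Applying the second classification rule reduces to evaluating a trained classifier's decision function on the response vector. The sketch below assumes a linear SVM has already been trained on the description vectors of the sample library, yielding a weight vector `w` and bias `b` (illustrative names, not from the patent); the rule is taken as satisfied when the decision value is positive.

```python
# Sketch of checking the second classification rule: a linear decision
# function w . x + b learned from the sample library's description vectors
# is evaluated on the response vector of the video to be detected.

def satisfies_second_rule(response_vector, w, b):
    """Return True if the response vector matches the sample-library type."""
    decision = sum(wi * xi for wi, xi in zip(w, response_vector)) + b
    return decision > 0.0
```

A multi-class first classification rule (as in steps 1031b to 1033b) would instead keep one such decision function per video type and assign the type with the largest decision value.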
The method for video classification provided by this embodiment of the present invention can segment the videos in a sample video library, generate motion atoms, and generate from the segmentation results and the motion atoms the motion phrases of the videos in the sample video library. For each motion phrase it calculates the contribution values of the representativeness and coverage parameters: phrases containing one motion atom are generated first and screened by these values to obtain the first screening result; a motion atom is then added to the phrases of the first screening result to form new phrases, which are screened again by the contribution values of the representativeness and coverage parameters to obtain the second screening result; and so on, until the nth screening result is obtained. The description vector is generated from the first to nth screening results, the second classification rule is generated from the description vector, the response vector of the video to be detected is obtained, and whether the type of the video to be detected is the same as that of the videos in the sample video library is detected, achieving video classification. Compared with the prior art, the present invention derives the description vector from motion phrases describing the temporal relationships among the motion atoms of a continuous complex motion, so that the description vector reflects, in quantized form, the motion atoms arranged in temporal order near each time point, and thereby measures how well each motion phrase matches the videos in the sample video library. Classification using the description vector thus incorporates both the temporal factor of the video and the motion atoms representing its concrete actions and content, and combines the two into motion phrases describing the temporal relationships among the motion atoms of a continuous complex motion. Screening the phrases keeps those with good representativeness, coverage, and discriminability, reducing the number of phrases needed to generate the description vector, making the description vector more compact, reducing the time to generate it, and enabling accurate classification of videos containing long-duration continuous complex motion.
进一步的, 在图 2和图 3a所示的方案的基础上, 本发明实施例还提供 了一种视频分类的方法的具体方案,对图 2中的 1023和 103的执行过程进一 步细化, 其中, 1023可以具体实现为 10231 - 10232 , 103可以具体实现为 1031b- 1033b , 如图 3b所示, 包括:
10231 , 根据所述样本视频库中不同类型的视频对应的所述运动短语 的筛选结果, 得到筛选结果集合。
其中, 样本视频库包括至少二个视频, 并且样本视频库包括至少二种 类型的视频。 样本视频库中的每一个类型的视频都具有对应的第 1至第 n 筛选结果, 将样本视频库中不同类型的视频对应的第 1至第 n筛选结果合 并, 得到筛选结果集合, 该筛选结果集合包括样本视频库中所有不同类型 的视频对应的运动短语。
10232 , 根据所述 选结果集合, 生成所述样本视频库中的视频对应 的描述向量。
其中, 将筛选结果集合中的运动短语作为基底, 生成样本视频库中的 视频对应的描述向量, 样本视频库中的每一个视频都有对应的描述向量, 描述向量中的每一个分量都是样本视频库中不同类型的视频对应的第 1至 第 n筛选结果中的运动短语对样本视频库中的视频的响应。
1031b , 生成所述待检测视频对应的响应向量。
其中, 将 10232中得到的筛选结果集合中的运动短语作为基底, 生成 待检测视频对应的响应向量,响应向量中的分量是样本视频库中不同类型 的视频对应的第 1至第 n筛选结果中的运动短语对待检测视频的响应。
1032b , 获取所述样本视频库中各个不同类型的视频对应的所述描述 向量, 并根据所述描述向量, 得到第一分类规则。
其中, 样本视频库包括至少二个视频, 并且样本视频库包括至少二种 类型的视频。 根据样本视频库中各个不同类型的视频对应的所述描述向 量, 生成第一分类规则, 比如: 使用 SVM ( Support Vector Machine , 支持 向量机)分类器进行分类, 将得到的样本视频库中不同类型的视频的描述 向量加入 SVM分类器, SVM分类器会生成分类规则, 分类规则可以是第 一分类规则, 第一分类规则用于确定待检测视频的所属类型。
1033b , 根据所述第一分类规则和所述响应向量, 确定所述待检测视 频的类型与所述样本视频库包括的视频的类型中的一种类型相同,并将所 述待检测视频分类。
其中, 样本视频库包括至少二种类型的视频, 第一分类规则用于确定待检测视频的所属类型, 比如: 样本视频库中包括三种类型的视频, 分别是舞蹈类视频、 体育类视频、 杂技类视频, 使用 SVM ( Support Vector Machine , 支持向量机)分类器对待检测视频进行分类, 在 1032b中生成了第一分类规则, 将 1031b中得到的待检测视频的响应向量加入 SVM分类器, 根据第一分类规则, SVM分类器将待检测视频分为舞蹈类视频、 体育类视频、 杂技类视频三类中的其中一类。
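下面给出一个示意性草图。专利中由 SVM 分类器根据各类型视频的描述向量生成第一分类规则; 此处为保持示例自包含, 用"最近类中心"作简化替代 (函数名与数据均为假设), 仅演示"描述向量 → 第一分类规则 → 响应向量归类"的流程:

```python
# 简化示意: 以各类型描述向量的平均作为"第一分类规则"(实际专利中为 SVM)
def train_rule(desc_vectors_by_class):
    rule = {}
    for label, vecs in desc_vectors_by_class.items():
        dim = len(vecs[0])
        rule[label] = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
    return rule

def classify(response_vec, rule):
    # 对应 1033b: 将待检测视频归入距离最近的类型
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(rule, key=lambda label: dist2(response_vec, rule[label]))

rule = train_rule({
    "dance":  [[0.9, 0.1], [0.8, 0.2]],   # 假设的舞蹈类描述向量
    "sports": [[0.1, 0.9], [0.2, 0.8]],   # 假设的体育类描述向量
})
print(classify([0.85, 0.15], rule))  # dance
```

实际实现中可将此处的最近类中心替换为多类 SVM 判别函数, 流程不变。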
本发明实施例提供的一种视频分类的方法, 能够将样本视频库中的视频分段, 生成运动原子, 并利用分段结果和运动原子生成样本视频库中的视频的运动短语, 对每一个运动短语计算代表性参数和覆盖性参数的贡献值, 首先生成包括一个运动原子的运动短语, 根据代表性参数和覆盖性参数的贡献值, 选出具有良好代表性和覆盖性的运动短语, 得到第 1筛选结果, 再将一个运动原子加入第 1筛选结果中的运动短语, 得到新的运动短语, 再根据代表性参数和覆盖性参数的贡献值在得到的新的运动短语中进行筛选, 得到第 2筛选结果, 以此类推, 重复该过程, 直至得到第 n筛选结果, 将样本库中不同类型的视频对应的第 1至第 n筛选结果合并, 得到筛选结果集合, 并根据筛选结果集合生成描述向量, 利用描述向量, 生成第一分类规则, 得到待检测视频的响应向量, 确定待检测视频的类型与样本视频库包括的视频的类型中的一种类型相同, 从而达到视频分类的目的。 与现有技术相比, 本发明根据用于描述连续复杂运动的运动原子之间的时序关系的运动短语得到描述向量, 使得描述向量以量化数据的形式反映出在连续复杂运动中, 在时间点附近按照时序关系排列的运动原子, 并以此检测运动短语与样本视频库中视频匹配程度的高低。 因此利用描述向量进行分类的过程, 实现了在分类过程中既包括了视频的时间因素, 也包括了用于表示视频中具体动作、 内容的运动原子, 并且结合二者生成了用于描述连续复杂运动的运动原子之间的时序关系的运动短语, 对运动短语进行筛选, 筛选结果中的运动短语具有良好的代表性、 覆盖性和判别性, 减少了生成描述向量需要的运动短语的数量, 使得得到的描述向量更加精简, 减少了生成描述向量的时间, 并且能够对多个不同类型的包括长时间的连续复杂运动的视频进行准确分类。
可选的, 在图 2所示的方案的基础上, 本发明实施例还提供了一种视 频分类的方法的具体方案, 增加了 104- 105 , 能够提取并显示待检测视频 的主要信息, 如图 4a所示, 包括:
104 , 获取所述待检测视频的响应向量中的至少一个分量, 并根据所 述至少一个分量得到主要运动短语。
其中,待检测视频的响应向量中的分量可以是筛选出的运动短语对待 检测视频的响应, 分量越大, 表示待检测视频与该分量对应的运动短语的 匹配程度越高。
其中, 主要运动短语为与至少一个分量对应的运动短语, 比如: 待检测视频的响应向量具有 10个分量, 将 10个分量按照由大到小的顺序排列, 获取前 3个分量, 并得到这前三个分量对应的运动短语, 这前三个分量对应的运动短语就是主要运动短语。
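步骤 104 中"按分量由大到小排序、取前若干个分量对应的运动短语"的操作可以概括为如下草图 (假设性示例, 数据为虚构):

```python
# 取响应向量中最大的 k 个分量, 返回其对应的运动短语 (即主要运动短语)
def main_phrases(response_vec, phrases, k=3):
    order = sorted(range(len(response_vec)),
                   key=lambda i: response_vec[i], reverse=True)
    return [phrases[i] for i in order[:k]]

resp = [0.2, 0.9, 0.1, 0.7, 0.5]           # 假设的响应向量
names = ["P1", "P2", "P3", "P4", "P5"]     # 对应的运动短语标识
print(main_phrases(resp, names, k=3))       # ['P2', 'P4', 'P5']
```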
105 , 获取并显示所述待检测视频的关键帧。
其中, 关键帧与主要运动短语中的每个运动原子单元的响应最大, 所 以关键帧能够表示待检测视频中的最主要的信息,系统除了显示待检测视 频的关键帧, 还可以显示关键帧附近的帧, 从而将待检测视频中的包括运 动的主要内容呈现出来, 例如: 如图 4b所示, 在一个视频中的跳远动作的 连续的 9帧中, 通过 104- 105的过程, 可以得知关键帧为第 2帧与第 6帧, 显 示关键帧和关键帧附近的帧, 所以显示第 1 - 3帧和第 5 - 7帧。
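步骤 104-105 的关键帧选取与显示逻辑可概括为如下草图 (假设性示例, 各运动原子单元对每一帧的响应值为虚构数据); 示例数据复现了上文"第 2、6 帧为关键帧, 显示第 1-3 帧和第 5-7 帧"的情形:

```python
# 对每个运动原子单元, 取响应最大的帧作为关键帧
def key_frames(frame_scores_per_unit):
    # frame_scores_per_unit[u][f]: 第 u 个运动原子单元对第 f 帧 (0 起) 的响应
    return [max(range(len(unit_scores)), key=unit_scores.__getitem__)
            for unit_scores in frame_scores_per_unit]

# 除关键帧外, 一并显示其前后 radius 帧
def frames_to_show(keys, num_frames, radius=1):
    shown = set()
    for k in keys:
        for f in range(max(0, k - radius), min(num_frames, k + radius + 1)):
            shown.add(f)
    return sorted(shown)

# 9 帧跳远片段, 两个运动原子单元 (响应值为虚构)
scores = [
    [0.1, 0.9, 0.2, 0.1, 0.1, 0.2, 0.1, 0.1, 0.1],  # 单元 1 -> 第 2 帧
    [0.1, 0.1, 0.1, 0.2, 0.2, 0.9, 0.2, 0.1, 0.1],  # 单元 2 -> 第 6 帧
]
keys = key_frames(scores)
print([k + 1 for k in keys])                      # [2, 6]
print([f + 1 for f in frames_to_show(keys, 9)])   # [1, 2, 3, 5, 6, 7]
```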
本发明实施例提供的一种视频分类的方法, 能够将样本视频库中的视频分段, 生成运动原子, 并利用分段结果和运动原子生成样本视频库中的视频的运动短语, 并对运动短语进行筛选, 根据筛选结果, 生成描述向量, 利用描述向量, 确定与样本视频库中视频类型相同的待检测视频, 从而达到视频分类的目的, 还可以根据待检测视频的响应向量中的分量, 得到主要运动短语, 从而得到并显示关键帧。 与现有技术相比, 本发明根据用于描述连续复杂运动的运动原子之间的时序关系的运动短语得到描述向量, 使得描述向量以量化数据的形式反映出在连续复杂运动中, 在时间点附近按照时序关系排列的运动原子, 并以此检测运动短语与样本视频库中视频匹配程度的高低。 因此利用描述向量进行分类的过程, 实现了在分类过程中既包括了视频的时间因素, 也包括了用于表示视频中具体动作、 内容的运动原子, 并且结合二者生成了用于描述连续复杂运动的运动原子之间的时序关系的运动短语, 对运动短语进行筛选, 筛选结果中的运动短语具有良好的代表性、 覆盖性和判别性, 减少了生成描述向量需要的运动短语的数量, 使得得到的描述向量更加精简, 并减少了生成描述向量的时间, 并且能够对包括长时间的连续复杂运动的视频进行准确分类; 同时, 还可以利用待检测视频的响应向量中的分量, 得到并显示待检测视频的关键帧, 将待检测视频的主要内容清楚简要的呈现出来, 使得用户能够快速了解视频的主要内容。
本发明实施例还提供了一种视频分类的装置 200 , 如图 5所示, 包括: 第一生成模块 201 , 用于按照时间顺序对样本视频库中的视频进行分 段并得到分段结果, 并生成运动原子集合。
其中, 样本视频库包括至少一个视频, 运动原子集合中的运动原子是 根据样本视频库中的视频生成的。
第二生成模块 202 , 用于利用所述运动原子集合和所述分段结果, 生 成对应于所述样本视频库中的视频的描述向量。
分类模块 203 , 用于利用所述描述向量, 确定与所述样本视频库中的 视频的类型相同的待检测视频。
本发明实施例提供的一种视频分类的装置,能够将样本视频库中的视 频分段, 生成运动原子, 并利用分段结果和运动原子生成样本视频库中的 视频的描述向量, 利用描述向量, 确定与样本视频库中视频类型相同的待 检测视频, 从而达到视频分类的目的。 与现有技术相比, 本发明根据用于 描述连续复杂运动的运动原子之间的时序关系的运动短语得到描述向量, 使得描述向量以量化数据的形式反映出在连续复杂运动中,在时间点附近 按照时序关系排列的运动原子,并以此检测运动短语与样本视频库中视频 匹配程度的高低。 因此利用描述向量进行分类的过程, 实现了在分类过程 中既包括了视频的时间因素, 也包括了用于表示视频中具体动作、 内容的 运动原子,并且结合二者生成了用于描述连续复杂运动的运动原子之间的 时序关系的运动短语, 以及根据运动短语生成的描述向量, 从而能够对包 括长时间的连续复杂运动的视频进行准确分类。
可选的, 如图 6所示, 所述第二生成模块 202 , 包括:
第一生成单元 2021 , 用于根据所述运动原子集合和所述分段结果, 生 成对应于所述样本视频库中的视频的运动短语集合。
其中, 运动短语集合包括至少二个运动短语, 一个运动短语包括了按 照一定的先后顺序在时间点附近发生的运动原子。
可选的, 样本视频库包括至少二个视频, 并且样本视频库中的视频的 类型相同。
筛选单元 2022 , 用于筛选所述运动短语, 并得到筛选结果。
第二生成单元 2023 , 用于根据所述筛选结果, 生成与所述样本视频库中的视频对应的描述向量。
本发明实施例提供的一种视频分类的装置, 能够将样本视频库中的视频分段, 生成运动原子, 并利用分段结果和运动原子生成样本视频库中的视频的运动短语, 并对运动短语进行筛选, 根据筛选结果, 生成描述向量, 利用描述向量, 确定与样本视频库中视频类型相同的待检测视频, 从而达到视频分类的目的。 与现有技术相比, 本发明根据用于描述连续复杂运动的运动原子之间的时序关系的运动短语得到描述向量, 使得描述向量以量化数据的形式反映出在连续复杂运动中, 在时间点附近按照时序关系排列的运动原子, 并以此检测运动短语与样本视频库中视频匹配程度的高低。 因此利用描述向量进行分类的过程, 实现了在分类过程中既包括了视频的时间因素, 也包括了用于表示视频中具体动作、 内容的运动原子, 并且结合二者生成了用于描述连续复杂运动的运动原子之间的时序关系的运动短语, 对运动短语进行筛选, 筛选结果中的运动短语具有良好的代表性、 覆盖性和判别性, 减少了生成描述向量需要的运动短语的数量, 使得得到的描述向量更加精简, 并减少了生成描述向量的时间, 并且能够对包括长时间的连续复杂运动的视频进行准确分类。
可选的, 如图 7所示, 所述第二生成模块 202, 还包括:
第一获取单元 2024 , 用于获取运动原子单元 π(A, t, σ), 并根据所述运动原子单元获取一个运动短语的代表性参数 Rep(P1, c)。

其中, Rep(P1, c) = Σ_{V∈S(P1, c)} r(V, P1) / |S(P1, c)|, A为运动原子, t为样本视频库中视频中的时间点, σ为高斯分布的标准差, V为样本视频库中的视频, P1为一个运动短语, r(V, P1)为一个运动短语 P1对样本视频库中的视频的响应, r(V, P1) = min_{ORi∈P1} max_{π∈ORi} v(V, π), v(V, π) = max_{t'∈Ω(t)} Score(Φ(V, t'), A)·N(t'|t, σ), S(P1, c)指和一个运动短语响应最大的样本视频库中的视频的集合, c为样本视频库中的视频的类型的标识, Φ(V, t')为样本视频库中视频中以 t'开始的分段结果的视频特征, Score(Φ(V, t'), A)是将 Φ(V, t')输入到支持向量机 SVM分类器得到的得分, N(t'|t, σ)是指以 t为均值, σ为标准差的高斯分布, Ω(t)指以 t为中心的一个邻域。

其中, 运动短语集合中的运动短语包括一个运动原子集合中的运动原子。

第二获取单元 2025 , 用于获取所述一个运动短语的覆盖性参数 RepSet(Γc, c), 并根据所述一个运动短语的覆盖性参数 RepSet(Γc, c), 得到所述一个运动短语对所述覆盖性参数的贡献值 ΔRepSet(P1, c)。 其中,

RepSet(Γc, c) = |∪_{P∈Γc} S(P, c)| / Tc, ΔRepSet(P1, c) = RepSet(Γc, c) − RepSet(Γc − {P1}, c), Tc为样本视频库中标识为 c的视频分段得到的片段的数量, Γc为运动短语集合, 且一个运动短语包含的运动原子所属视频类型的标识为 c。
针对所述运动短语集合中的每一个运动短语, 运行上述单元, 并得到 运动短语集合中的每一个运动短语的代表性参数和贡献值。
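上述代表性参数与覆盖性贡献值的计算可以用如下草图近似 (假设性示例: 其中 S(P, c) 以给定的支持集近似, r(V, P) 以查表的响应值近似, 数据均为虚构):

```python
def rep(phrase, support, responses):
    # Rep(P, c) = Σ_{V∈S(P,c)} r(V, P) / |S(P, c)|
    return sum(responses[(v, phrase)] for v in support[phrase]) / len(support[phrase])

def repset(phrases, support, num_segments):
    # RepSet(Γ, c) = |∪_{P∈Γ} S(P, c)| / Tc
    covered = set()
    for p in phrases:
        covered |= support[p]
    return len(covered) / num_segments

def delta_repset(phrase, phrases, support, num_segments):
    # ΔRepSet(P, c) = RepSet(Γ, c) - RepSet(Γ - {P}, c)
    others = [p for p in phrases if p != phrase]
    return repset(phrases, support, num_segments) - repset(others, support, num_segments)

support = {"P1": {"V1", "V2"}, "P2": {"V2", "V3"}}       # 假设的 S(P, c)
responses = {("V1", "P1"): 1.0, ("V2", "P1"): 0.5,       # 假设的 r(V, P)
             ("V2", "P2"): 1.0, ("V3", "P2"): 0.5}
print(rep("P1", support, responses))                      # 0.75
print(delta_repset("P2", ["P1", "P2"], support, 4))       # 0.25
```

Rep 越大表示短语对该类型视频的响应越强, ΔRepSet 越大表示该短语覆盖了更多其他短语未覆盖的片段。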
所述筛选单元 2022 , 包括:
筛选子单元 20221 , 用于根据所述运动短语集合中的每一个运动短语的代表性参数和贡献值, 按照 Rep(P1, c) + ΔRepSet(P1, c) 的值由大到小的顺序对所述运动短语集合中的运动短语进行排序, 并将前 m1 个运动短语作为第 1筛选结果, m1 为大于等于 1的正整数。

添加子单元 20222 , 用于从所述运动原子集合中提取一个运动原子加入所述第 1筛选结果中的运动短语, 使得所述第 1筛选结果中的运动短语具有 2个运动原子。

连续运行所述筛选子单元和所述添加子单元, 直至得到第 n-1筛选结果, 再从所述运动原子集合中提取一个运动原子加入所述第 n-1筛选结果中的运动短语, 使得所述第 n-1筛选结果中的运动短语具有 n个运动原子, 再根据所述第 n-1筛选结果中的运动短语得到第 n筛选结果, 所述第 n筛选结果为按照 Rep(Pn, c) + ΔRepSet(Pn, c) 的值由大到小的顺序排列的前 mn 个运动短语, mn 为大于等于 1的正整数, 第 n筛选结果中的运动短语具有 n个运动原子, n为大于等于 1的正整数。
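筛选子单元与添加子单元交替运行的迭代筛选过程, 可以概括为如下草图 (假设性示例: 评分函数以原子权重之和代替 Rep(P, c) + ΔRepSet(P, c), 仅演示"排序取前 m 个、再逐个加入运动原子"的控制流程):

```python
# 迭代筛选: 第 i 轮对候选短语按评分排序取前 m 个, 再向其各加入一个运动原子
def screen(atoms, score_fn, m, n):
    results = []
    candidates = [(a,) for a in atoms]          # 含 1 个运动原子的短语
    for _ in range(n):
        ranked = sorted(candidates, key=score_fn, reverse=True)[:m]
        results.append(ranked)                   # 第 i 筛选结果
        # 向筛选出的短语各加入一个尚未包含的运动原子, 得到下一轮候选
        candidates = [p + (a,) for p in ranked for a in atoms if a not in p]
    return results

# 假设评分: 短语中原子"权重"之和 (实际应为 Rep(P, c) + ΔRepSet(P, c))
weights = {"A": 3, "B": 2, "C": 1}
score = lambda p: sum(weights[a] for a in p)
res = screen(["A", "B", "C"], score, m=2, n=2)
print(res[0])  # [('A',), ('B',)]
print(res[1])  # [('A', 'B'), ('B', 'A')]
```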
第一生成子单元 20223 , 用于根据所述第 1至第 n筛选结果, 生成所述 描述向量。
本发明实施例提供的一种视频分类的装置,能够将样本视频库中的视 频分段, 生成运动原子, 并利用分段结果和运动原子生成样本视频库中的 视频的运动短语,对每一个运动短语计算代表性参数和覆盖性参数的贡献 值, 首先生成包括一个运动原子的运动短语, 根据代表性参数和覆盖性参 数的贡献值, 选出具有良好代表性和覆盖性的运动短语, 得到第 1筛选 结果, 再将一个运动原子加入第 1筛选结果中的运动短语, 得到新的运动 短语,再根据代表性参数和覆盖性参数的贡献值在得到的新的运动短语中 进行筛选, 得到第 2筛选结果, 以此类推, 重复该过程, 直至得到第 n筛选 结果, 根据第 1至第 n筛选结果, 生成描述向量, 利用描述向量, 生成第二 分类规则, 得到待检测视频的响应向量, 检测待检测视频的类型是否与样 本视频库中的视频的类型相同, 从而达到视频分类的目的。 与现有技术相 比,本发明根据用于描述连续复杂运动的运动原子之间的时序关系的运动 短语得到描述向量,使得描述向量以量化数据的形式反映出在连续复杂运 动中, 在时间点附近按照时序关系排列的运动原子, 并以此检测运动短语 与样本视频库中视频匹配程度的高低。 因此利用描述向量进行分类的过 程, 实现了在分类过程中既包括了视频的时间因素, 也包括了用于表示视 频中具体动作、 内容的运动原子, 并且结合二者生成了用于描述连续复杂 运动的运动原子之间的时序关系的运动短语, 对运动短语进行筛选, 筛选 结果中的运动短语具有良好的代表性、 覆盖性和判别性, 减少了生成描述 向量需要的运动短语的数量, 使得得到的描述向量更加精简, 减少了生成 描述向量的时间,并且能够对包括长时间的连续复杂运动的视频进行准确 分类。
可选的, 如图 8所示, 所述第二生成单元 2023 , 包括:
集合子单元 20231 , 用于根据所述样本视频库中不同类型的视频对应 的所述运动短语的筛选结果, 得到筛选结果集合。
其中, 样本视频库包括至少二个视频, 并且样本视频库包括至少二种 类型的视频。
第二生成子单元 20232 , 用于根据所述筛选结果集合, 生成所述样本 视频库中的视频对应的描述向量。
本发明实施例提供的一种视频分类的装置, 能够将样本视频库中的视频分段, 生成运动原子, 并利用分段结果和运动原子生成样本视频库中的视频的运动短语, 对每一个运动短语计算代表性参数和覆盖性参数的贡献值, 首先生成包括一个运动原子的运动短语, 根据代表性参数和覆盖性参数的贡献值, 选出具有良好代表性和覆盖性的运动短语, 得到第 1筛选结果, 再将一个运动原子加入第 1筛选结果中的运动短语, 得到新的运动短语, 再根据代表性参数和覆盖性参数的贡献值在得到的新的运动短语中进行筛选, 得到第 2筛选结果, 以此类推, 重复该过程, 直至得到第 n筛选结果, 将样本库中不同类型的视频对应的第 1至第 n筛选结果合并, 得到筛选结果集合, 并根据筛选结果集合生成描述向量, 利用描述向量, 生成第一分类规则, 得到待检测视频的响应向量, 确定待检测视频的类型与样本视频库包括的视频的类型中的一种类型相同, 从而达到视频分类的目的。 与现有技术相比, 本发明根据用于描述连续复杂运动的运动原子之间的时序关系的运动短语得到描述向量, 使得描述向量以量化数据的形式反映出在连续复杂运动中, 在时间点附近按照时序关系排列的运动原子, 并以此检测运动短语与样本视频库中视频匹配程度的高低。 因此利用描述向量进行分类的过程, 实现了在分类过程中既包括了视频的时间因素, 也包括了用于表示视频中具体动作、 内容的运动原子, 并且结合二者生成了用于描述连续复杂运动的运动原子之间的时序关系的运动短语, 对运动短语进行筛选, 筛选结果中的运动短语具有良好的代表性、 覆盖性和判别性, 减少了生成描述向量需要的运动短语的数量, 使得得到的描述向量更加精简, 减少了生成描述向量的时间, 并且能够对多个不同类型的包括长时间的连续复杂运动的视频进行准确分类。
可选的, 如图 9所示, 所述分类模块 203 , 包括:
第三生成单元 2031 , 用于生成所述待检测视频对应的响应向量。
第三获取单元 2032 ,用于获取所述样本视频库中各个不同类型的视频 对应的所述描述向量, 并根据所述描述向量, 得到第一分类规则。
其中, 第一分类规则用于确定待检测视频的所属类型。
第一分类单元 2033 , 用于根据所述第一分类规则和所述响应向量, 确 定所述待检测视频的类型与所述样本视频库包括的视频的类型中的一种 类型相同, 并将所述待检测视频分类。
本发明实施例提供的一种视频分类的装置, 能够将样本视频库中的视频分段, 生成运动原子, 并利用分段结果和运动原子生成样本视频库中的视频的运动短语, 对每一个运动短语计算代表性参数和覆盖性参数的贡献值, 首先生成包括一个运动原子的运动短语, 根据代表性参数和覆盖性参数的贡献值, 选出具有良好代表性和覆盖性的运动短语, 得到第 1筛选结果, 再将一个运动原子加入第 1筛选结果中的运动短语, 得到新的运动短语, 再根据代表性参数和覆盖性参数的贡献值在得到的新的运动短语中进行筛选, 得到第 2筛选结果, 以此类推, 重复该过程, 直至得到第 n筛选结果, 将样本库中不同类型的视频对应的第 1至第 n筛选结果合并, 得到筛选结果集合, 并根据筛选结果集合生成描述向量, 利用描述向量, 生成第一分类规则, 得到待检测视频的响应向量, 确定待检测视频的类型与样本视频库包括的视频的类型中的一种类型相同, 从而达到视频分类的目的。 与现有技术相比, 本发明根据用于描述连续复杂运动的运动原子之间的时序关系的运动短语得到描述向量, 使得描述向量以量化数据的形式反映出在连续复杂运动中, 在时间点附近按照时序关系排列的运动原子, 并以此检测运动短语与样本视频库中视频匹配程度的高低。 因此利用描述向量进行分类的过程, 实现了在分类过程中既包括了视频的时间因素, 也包括了用于表示视频中具体动作、 内容的运动原子, 并且结合二者生成了用于描述连续复杂运动的运动原子之间的时序关系的运动短语, 对运动短语进行筛选, 筛选结果中的运动短语具有良好的代表性、 覆盖性和判别性, 减少了生成描述向量需要的运动短语的数量, 使得得到的描述向量更加精简, 减少了生成描述向量的时间, 并且能够对多个不同类型的包括长时间的连续复杂运动的视频进行准确分类。
可选的, 如图 10所示, 所述分类模块 203 , 包括:
第四生成单元 2034 , 用于生成所述待检测视频对应的响应向量。 第四获取单元 2035 , 用于根据所述样本视频库中各个视频对应的描述向量, 得到第二分类规则。
其中, 第二分类规则用于检测待检测视频是否与样本视频库中的视频 的类型相同。
检测单元 2036 ,用于检测所述待检测视频的响应向量是否符合所述第 二分类规则。
第二分类单元 2037 , 用于当符合时, 确定所述待检测视频与所述样本 视频库中的视频的类型相同。
本发明实施例提供的一种视频分类的装置,能够将样本视频库中的视 频分段, 生成运动原子, 并利用分段结果和运动原子生成样本视频库中的 视频的运动短语,对每一个运动短语计算代表性参数和覆盖性参数的贡献 值, 首先生成包括一个运动原子的运动短语, 根据代表性参数和覆盖性参 数的贡献值, 选出具有良好代表性和覆盖性的运动短语, 得到第 1筛选 结果, 再将一个运动原子加入第 1筛选结果中的运动短语, 得到新的运动 短语,再根据代表性参数和覆盖性参数的贡献值在得到的新的运动短语中 进行筛选, 得到第 2筛选结果, 以此类推, 重复该过程, 直至得到第 n筛选 结果, 根据第 1至第 n筛选结果, 生成描述向量, 利用描述向量, 生成第二 分类规则, 得到待检测视频的响应向量, 检测待检测视频的类型是否与样 本视频库中的视频的类型相同, 从而达到视频分类的目的。 与现有技术相 比,本发明根据用于描述连续复杂运动的运动原子之间的时序关系的运动 短语得到描述向量,使得描述向量以量化数据的形式反映出在连续复杂运 动中, 在时间点附近按照时序关系排列的运动原子, 并以此检测运动短语 与样本视频库中视频匹配程度的高低。 因此利用描述向量进行分类的过 程, 实现了在分类过程中既包括了视频的时间因素, 也包括了用于表示视 频中具体动作、 内容的运动原子, 并且结合二者生成了用于描述连续复杂 运动的运动原子之间的时序关系的运动短语, 对运动短语进行筛选, 筛选 结果中的运动短语具有良好的代表性、 覆盖性和判别性, 减少了生成描述 向量需要的运动短语的数量, 使得得到的描述向量更加精简, 减少了生成 描述向量的时间,并且能够对包括长时间的连续复杂运动的视频进行准确 分类。
可选的, 如图 11所示, 所述装置 200 , 还包括:
获取模块 204 , 用于获取所述待检测视频的响应向量中的至少一个分 量, 并根据所述至少一个分量得到主要运动短语。
其中, 主要运动短语为与至少一个分量对应的运动短语。
显示模块 205 , 用于获取并显示所述待检测视频的关键帧。
其中, 关键帧与主要运动短语中的每个运动原子单元的响应最大。

本发明实施例提供的一种视频分类的装置, 能够将样本视频库中的视频分段, 生成运动原子, 并利用分段结果和运动原子生成样本视频库中的视频的运动短语, 并对运动短语进行筛选, 根据筛选结果, 生成描述向量, 利用描述向量, 确定与样本视频库中视频类型相同的待检测视频, 从而达到视频分类的目的, 还可以根据待检测视频的响应向量中的分量, 得到主要运动短语, 从而得到并显示关键帧。 与现有技术相比, 本发明根据用于描述连续复杂运动的运动原子之间的时序关系的运动短语得到描述向量, 使得描述向量以量化数据的形式反映出在连续复杂运动中, 在时间点附近按照时序关系排列的运动原子, 并以此检测运动短语与样本视频库中视频匹配程度的高低。 因此利用描述向量进行分类的过程, 实现了在分类过程中既包括了视频的时间因素, 也包括了用于表示视频中具体动作、 内容的运动原子, 并且结合二者生成了用于描述连续复杂运动的运动原子之间的时序关系的运动短语, 对运动短语进行筛选, 筛选结果中的运动短语具有良好的代表性、 覆盖性和判别性, 减少了生成描述向量需要的运动短语的数量, 使得得到的描述向量更加精简, 并减少了生成描述向量的时间, 并且能够对包括长时间的连续复杂运动的视频进行准确分类; 同时, 还可以利用待检测视频的响应向量中的分量, 得到并显示待检测视频的关键帧, 将待检测视频的主要内容清楚简要的呈现出来, 使得用户能够快速了解视频的主要内容。
本发明实施例还提供了一种视频分类系统 300 , 如图 12所示, 包括: 至少一个处理器 301 , 例如 CPU , 至少一个通信总线 302 , 存储器 303 , 至少一个网络接口 304或者用户接口 305。 通信总线 302用于实现这些组件之间的连接通信。 可选的, 用户接口 305包括显示器、 键盘、 鼠标、 触摸屏等设备。 存储器 303可能包含高速 RAM存储器, 也可能还包括非不稳定的存储器 ( non-volatile memory ), 例如至少一个磁盘存储器。
具体的, 存储器 303可以用于存储样本视频库和样本视频库中的视频的分段结果, 还可以用于存储运动原子集合、 样本视频库中的视频的描述向量和运动短语集合, 还可以用于存储运动短语的筛选结果、 样本视频库中的视频的类型和待检测视频的响应向量, 还可以用于存储运动短语的代表性参数、 覆盖性参数和覆盖性参数的贡献值等等, 还可以用于存储生成的第一分类规则和第二分类规则。
具体的, 处理器 301可以用于按照时间顺序对样本视频库中的视频进 行分段并得到分段结果, 并生成运动原子集合; 以及, 用于利用所述运动 原子集合和所述分段结果,生成对应于所述样本视频库中的视频的描述向 量; 以及, 用于利用所述描述向量, 确定与所述样本视频库中的视频的类 型相同的待检测视频。
其中, 样本视频库包括至少一个视频, 运动原子集合中的运动原子是 根据样本视频库中的视频生成的。
具体的, 处理器 301还可以用于根据所述运动原子集合和所述分段结果, 生成对应于所述样本视频库中的视频的运动短语集合; 以及, 用于筛选所述运动短语, 并得到筛选结果; 以及, 用于根据所述筛选结果, 生成与所述样本视频库中的视频对应的描述向量。
其中, 运动短语集合包括至少二个运动短语, 一个运动短语包括了按 照一定的先后顺序在时间点附近发生的运动原子。
具体的, 处理器 301还可以用于获取运动原子单元 π(A, t, σ), 并根据所述运动原子单元获取一个运动短语的代表性参数 Rep(P1, c); 以及, 用于获取所述一个运动短语的覆盖性参数 RepSet(Γc, c), 并根据所述一个运动短语的覆盖性参数 RepSet(Γc, c), 得到所述一个运动短语对所述覆盖性参数的贡献值 ΔRepSet(P1, c); 以及, 用于针对所述运动短语集合中的每一个运动短语, 执行上述过程, 并得到所述运动短语集合中的每一个运动短语的代表性参数和贡献值。

其中, Rep(P1, c) = Σ_{V∈S(P1, c)} r(V, P1) / |S(P1, c)|, A为运动原子, t为样本视频库中视频中的时间点, σ为高斯分布的标准差, V为样本视频库中的视频, P1为一个运动短语, r(V, P1)为一个运动短语 P1对样本视频库中的视频的响应, r(V, P1) = min_{ORi∈P1} max_{π∈ORi} v(V, π), v(V, π) = max_{t'∈Ω(t)} Score(Φ(V, t'), A)·N(t'|t, σ), S(P1, c)指和一个运动短语响应最大的样本视频库中的视频的集合, c为样本视频库中的视频的类型的标识, Φ(V, t')为样本视频库中视频中以 t'开始的分段结果的视频特征, Score(Φ(V, t'), A)是将 Φ(V, t')输入到支持向量机 SVM分类器得到的得分, N(t'|t, σ)是指以 t为均值, σ为标准差的高斯分布, Ω(t)指以 t为中心的一个邻域。

其中, RepSet(Γc, c) = |∪_{P∈Γc} S(P, c)| / Tc, ΔRepSet(P1, c) = RepSet(Γc, c) − RepSet(Γc − {P1}, c), Tc为样本视频库中标识为 c的视频分段得到的片段的数量, Γc为运动短语集合, 且一个运动短语包含的运动原子所属视频类型的标识为 c。
其中, 样本视频库包括至少二个视频, 并且样本视频库中的视频的类 型相同。 运动短语集合中的运动短语包括一个运动原子集合中的运动原 子。
具体的, 处理器 301还可以用于根据所述运动短语集合中的每一个运动短语的代表性参数和贡献值, 按照 Rep(P1, c) + ΔRepSet(P1, c) 的值由大到小的顺序对所述运动短语集合中的运动短语进行排序, 并将前 m1 个运动短语作为第 1筛选结果; 以及, 用于从所述运动原子集合中提取一个运动原子加入所述第 1筛选结果中的运动短语, 使得所述第 1筛选结果中的运动短语具有 2个运动原子; 以及, 用于重复上述过程, 直至得到第 n-1筛选结果, 再从所述运动原子集合中提取一个运动原子加入所述第 n-1筛选结果中的运动短语, 使得所述第 n-1筛选结果中的运动短语具有 n个运动原子, 再根据所述第 n-1筛选结果中的运动短语得到第 n筛选结果, 所述第 n筛选结果为按照 Rep(Pn, c) + ΔRepSet(Pn, c) 的值由大到小的顺序排列的前 mn 个运动短语, mn 为大于等于 1的正整数, 第 n筛选结果中的运动短语具有 n个运动原子; 以及, 用于根据所述第 1至第 n筛选结果, 生成所述描述向量。

其中, m1 为大于等于 1的正整数, n为大于等于 1的正整数。
具体的, 处理器 301还可以用于根据所述样本视频库中不同类型的视频对应的所述运动短语的筛选结果, 得到筛选结果集合; 以及, 用于根据所述筛选结果集合, 生成所述样本视频库中的视频对应的描述向量。
其中, 样本视频库包括至少二个视频, 并且样本视频库包括至少二种 类型的视频。
具体的,处理器 301还可以用于生成所述待检测视频对应的响应向量; 以及,用于获取所述样本视频库中各个不同类型的视频对应的所述描述向 量, 并根据所述描述向量, 得到第一分类规则; 以及, 用于根据所述第一 分类规则和所述响应向量,确定所述待检测视频的类型与所述样本视频库 包括的视频的类型中的一种类型相同, 并将所述待检测视频分类。
其中, 第一分类规则用于确定待检测视频的所属类型。
具体的, 处理器 301还可以用于生成所述待检测视频对应的响应向量; 以及, 用于根据所述样本视频库中各个视频对应的描述向量, 得到第二分类规则; 以及, 用于检测所述待检测视频的响应向量是否符合所述第二分类规则; 以及, 用于当符合时, 确定所述待检测视频与所述样本视频库中的视频的类型相同。
其中, 第二分类规则用于检测待检测视频是否与样本视频库中的视频 的类型相同。
具体的, 处理器 301还可以用于获取所述待检测视频的响应向量中的 至少一个分量, 并根据所述至少一个分量得到主要运动短语; 以及, 用于 获取并显示所述待检测视频的关键帧。 其中, 主要运动短语为与至少一个分量对应的运动短语。 关键帧与主 要运动短语中的每个运动原子单元的响应最大。
本发明实施例提供的一种视频分类系统, 能够将样本视频库中的视频分段, 生成运动原子, 并利用分段结果和运动原子生成样本视频库中的视频的运动短语, 并对运动短语进行筛选, 根据筛选结果, 生成描述向量, 利用描述向量, 确定与样本视频库中视频类型相同的待检测视频, 从而达到视频分类的目的, 还可以根据待检测视频的响应向量中的分量, 得到主要运动短语, 从而得到并显示关键帧。 与现有技术相比, 本发明根据用于描述连续复杂运动的运动原子之间的时序关系的运动短语得到描述向量, 使得描述向量以量化数据的形式反映出在连续复杂运动中, 在时间点附近按照时序关系排列的运动原子, 并以此检测运动短语与样本视频库中视频匹配程度的高低。 因此利用描述向量进行分类的过程, 实现了在分类过程中既包括了视频的时间因素, 也包括了用于表示视频中具体动作、 内容的运动原子, 并且结合二者生成了用于描述连续复杂运动的运动原子之间的时序关系的运动短语, 对运动短语进行筛选, 筛选结果中的运动短语具有良好的代表性、 覆盖性和判别性, 减少了生成描述向量需要的运动短语的数量, 使得得到的描述向量更加精简, 并减少了生成描述向量的时间, 并且能够对包括长时间的连续复杂运动的视频进行准确分类; 同时, 还可以利用待检测视频的响应向量中的分量, 得到并显示待检测视频的关键帧, 将待检测视频的主要内容清楚简要的呈现出来, 使得用户能够快速了解视频的主要内容。

需要说明的是, 本说明书中的各个实施例均采用递进的方式描述, 各个实施例之间相同相似的部分互相参见即可, 每个实施例重点说明的都是与其他实施例的不同之处。 尤其, 对于设备实施例而言, 由于其基本相似于方法实施例, 所以描述得比较简单, 相关之处参见方法实施例的部分说明即可。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程, 是可以通过计算机程序来指令相关的硬件来完成, 所述的程序可存储于一计算机可读取存储介质中, 该程序在执行时, 可包括如上述各方法的实施例的流程。 其中, 所述的存储介质可为磁碟、 光盘、 只读存储记忆体 ( Read-Only Memory , ROM ) 或随机存取记忆体 ( Random Access Memory, RAM ) 等。
以上所述, 仅为本发明的具体实施方式, 但本发明的保护范围并不局 限于此, 任何熟悉本技术领域的技术人员在本发明揭露的技术范围内, 可 轻易想到的变化或替换, 都应涵盖在本发明的保护范围之内。 因此, 本发 明的保护范围应该以权利要求的保护范围为准。

Claims

权 利 要 求 书
1、 一种视频分类的方法, 其特征在于, 包括:
按照时间顺序对样本视频库中的视频进行分段并得到分段结果, 并生成运动原子集合, 所述样本视频库包括至少一个视频, 所述运动原子集合中的运动原子是根据所述样本视频库中的视频生成的; 利用所述运动原子集合和所述分段结果, 生成对应于所述样本视频库中的视频的描述向量;
利用所述描述向量, 确定与所述样本视频库中的视频的类型相同的待 检测视频。
2、 根据权利要求 1所述的方法, 其特征在于, 所述利用所述运动原子 集合和所述分段结果, 生成对应于所述样本视频库中的视频的描述向量, 包括:
根据所述运动原子集合和所述分段结果, 生成对应于所述样本视频库中的视频的运动短语集合, 所述运动短语集合包括至少二个运动短语, 一个运动短语包括了按照一定的先后顺序在时间点附近发生的运动原子; 筛选所述运动短语, 并得到筛选结果;

根据所述筛选结果, 生成与所述样本视频库中的视频对应的描述向量。
3、 根据权利要求 1或 2所述的方法, 其特征在于, 所述样本视频库包括 至少二个视频, 并且所述样本视频库中的视频的类型相同。
4、 根据权利要求 3所述的视频分类的方法, 其特征在于, 所述运动短 语集合中的运动短语包括一个所述运动原子集合中的运动原子; 所述方法 还包括:
获取运动原子单元 π(A, t, σ), 并根据所述运动原子单元获取一个运动短语的代表性参数 Rep(P1, c), Rep(P1, c) = Σ_{V∈S(P1, c)} r(V, P1) / |S(P1, c)|, A为运动原子, t为所述样本视频库中视频中的时间点, σ为高斯分布的标准差, V为所述样本视频库中的视频, P1为所述一个运动短语, r(V, P1)为所述一个运动短语 P1对所述样本视频库中的视频的响应, r(V, P1) = min_{ORi∈P1} max_{π∈ORi} v(V, π), v(V, π) = max_{t'∈Ω(t)} Score(Φ(V, t'), A)·N(t'|t, σ), v(V, π)指计算所述样本视频库中的视频与时间相邻的所述运动原子单元的响应, S(P1, c)表示和所述一个运动短语响应最大的所述样本视频库中的视频的集合, c为所述样本视频库中的视频的类型的标识, Φ(V, t')为所述样本视频库中视频中以 t'开始的所述分段结果的视频特征, Score(Φ(V, t'), A)是将 Φ(V, t')输入到支持向量机 SVM分类器得到的得分, N(t'|t, σ)是指以 t为均值, σ为标准差的高斯分布, Ω(t)指以 t为中心的一个邻域;

获取所述一个运动短语的覆盖性参数 RepSet(Γc, c), 并根据所述一个运动短语的覆盖性参数 RepSet(Γc, c), 得到所述一个运动短语对所述覆盖性参数的贡献值 ΔRepSet(P1, c), RepSet(Γc, c) = |∪_{P∈Γc} S(P, c)| / Tc, ΔRepSet(P1, c) = RepSet(Γc, c) − RepSet(Γc − {P1}, c), Tc为所述样本视频库中标识为 c的视频分段得到的片段的数量, Γc为所述运动短语集合, 且所述一个运动短语包含的所述运动原子所属视频类型的标识为 c;
针对所述运动短语集合中的每一个运动短语, 执行上述过程, 并得到 所述运动短语集合中的每一个运动短语的代表性参数和贡献值;
所述筛选所述运动短语, 得到筛选结果, 包括:

根据所述运动短语集合中的每一个运动短语的代表性参数和贡献值, 按照 Rep(P1, c) + ΔRepSet(P1, c) 的值由大到小的顺序对所述运动短语集合中的运动短语进行排序, 并将前 m1 个运动短语作为第 1筛选结果, m1 为大于等于 1的正整数;
从所述运动原子集合中提取一个运动原子加入所述第 1筛选结果中的 运动短语, 使得所述第 1筛选结果中的运动短语具有 2个运动原子;
重复上述过程, 直至得到第 n-1筛选结果, 再从所述运动原子集合中提取一个运动原子加入所述第 n-1筛选结果中的运动短语, 使得所述第 n-1筛选结果中的运动短语具有 n个运动原子, 再根据所述第 n-1筛选结果中的运动短语得到第 n筛选结果, 所述第 n筛选结果为按照 Rep(Pn, c) + ΔRepSet(Pn, c) 的值由大到小的顺序排列的前 mn 个运动短语, mn 为大于等于 1的正整数, 第 n筛选结果中的运动短语具有 n个运动原子, n为大于等于 1的正整数; 根据所述第 1至第 n筛选结果, 生成所述描述向量。
5、 根据权利要求 4所述的视频分类的方法, 其特征在于, 所述样本视频库包括至少二个视频, 并且所述样本视频库包括至少二种类型的视频; 所述根据筛选结果, 生成与所述样本视频库中的视频对应的描述向量, 包括:

根据所述样本视频库中不同类型的视频对应的所述运动短语的筛选结果, 得到筛选结果集合;

根据所述筛选结果集合, 生成所述样本视频库中的视频对应的描述向量。
6、 根据权利要求 5所述的方法, 其特征在于, 所述利用所述描述向量, 确定与所述样本视频库中的视频的类型相同的待检测视频, 包括:
生成所述待检测视频对应的响应向量;
获取所述样本视频库中各个不同类型的视频对应的所述描述向量, 并 根据所述描述向量, 得到第一分类规则, 所述第一分类规则用于确定所述 待检测视频的所属类型;
根据所述第一分类规则和所述响应向量, 确定所述待检测视频的类型 与所述样本视频库包括的视频的类型中的一种类型相同, 并将所述待检测 视频分类。
7、 根据权利要求 1或 3所述的视频分类的方法, 其特征在于, 所述利用 所述描述向量,确定与所述样本视频库中的视频的类型相同的待检测视频 , 包括:
生成所述待检测视频对应的响应向量;
根据所述样本视频库中各个视频对应的描述向量, 得到第二分类规则, 所述第二分类规则用于检测所述待检测视频是否与所述样本视频库中的视 频的类型相同;
检测所述待检测视频的响应向量是否符合所述第二分类规则; 若符合, 则确定所述待检测视频与所述样本视频库中的视频的类型相同。
8、 根据权利要求 1所述的视频分类的方法, 其特征在于, 还包括: 获取所述待检测视频的响应向量中的至少一个分量, 并根据所述至少 一个分量得到主要运动短语, 所述主要运动短语为与所述至少一个分量对 应的运动短语;
获取并显示所述待检测视频的关键帧, 所述关键帧与所述主要运动短 语中的每个运动原子单元的响应最大。
9、 一种视频分类的装置, 其特征在于, 包括:
第一生成模块, 用于按照时间顺序对样本视频库中的视频进行分段并得到分段结果, 并生成运动原子集合, 所述样本视频库包括至少一个视频, 所述运动原子集合中的运动原子是根据所述样本视频库中的视频生成的; 第二生成模块, 用于利用所述运动原子集合和所述分段结果, 生成对应于所述样本视频库中的视频的描述向量;
分类模块, 用于利用所述描述向量, 确定与所述样本视频库中的视频 的类型相同的待检测视频。
10、 根据权利要求 9所述的装置, 其特征在于, 所述第二生成模块, 包 括:
第一生成单元, 用于根据所述运动原子集合和所述分段结果, 生成对应于所述样本视频库中的视频的运动短语集合, 所述运动短语集合包括至少二个运动短语, 一个运动短语包括了按照一定的先后顺序在时间点附近发生的运动原子;
筛选单元, 用于筛选所述运动短语, 并得到筛选结果;

第二生成单元, 用于根据所述筛选结果, 生成与所述样本视频库中的视频对应的描述向量。
11、 根据权利要求 9或 10所述的装置, 其特征在于, 所述样本视频库包 括至少二个视频, 并且所述样本视频库中的视频的类型相同。
12、 根据权利要求 11所述的装置, 其特征在于, 所述运动短语集合中 的运动短语包括一个所述运动原子集合中的运动原子;所述第二生成模块, 还包括:
第一获取单元, 用于获取运动原子单元 π(A, t, σ), 并根据所述运动原子单元获取一个运动短语的代表性参数 Rep(P1, c), Rep(P1, c) = Σ_{V∈S(P1, c)} r(V, P1) / |S(P1, c)|, A为运动原子, t为所述样本视频库中视频中的时间点, σ为高斯分布的标准差, V为所述样本视频库中的视频, P1为所述一个运动短语, r(V, P1)为所述一个运动短语 P1对所述样本视频库中的视频的响应, r(V, P1) = min_{ORi∈P1} max_{π∈ORi} v(V, π), v(V, π) = max_{t'∈Ω(t)} Score(Φ(V, t'), A)·N(t'|t, σ), S(P1, c)表示和所述一个运动短语响应最大的所述样本视频库中的视频的集合, c为所述样本视频库中的视频的类型的标识, Φ(V, t')为所述样本视频库中视频中以 t'开始的所述分段结果的视频特征, Score(Φ(V, t'), A)是将 Φ(V, t')输入到支持向量机 SVM分类器得到的得分, N(t'|t, σ)是指以 t为均值, σ为标准差的高斯分布, Ω(t)指以 t为中心的一个邻域;

第二获取单元, 用于获取所述一个运动短语的覆盖性参数 RepSet(Γc, c), 并根据所述一个运动短语的覆盖性参数 RepSet(Γc, c), 得到所述一个运动短语对所述覆盖性参数的贡献值 ΔRepSet(P1, c), RepSet(Γc, c) = |∪_{P∈Γc} S(P, c)| / Tc, ΔRepSet(P1, c) = RepSet(Γc, c) − RepSet(Γc − {P1}, c), Tc为所述样本视频库中标识为 c的视频分段得到的片段的数量, Γc为所述运动短语集合, 且所述一个运动短语包含的所述运动原子所属视频类型的标识为 c;

针对所述运动短语集合中的每一个运动短语, 执行上述过程, 并得到所述运动短语集合中的每一个运动短语的代表性参数和贡献值;
所述筛选单元, 包括:

筛选子单元, 用于根据所述运动短语集合中的每一个运动短语的代表性参数和贡献值, 按照 Rep(P1, c) + ΔRepSet(P1, c) 的值由大到小的顺序对所述运动短语集合中的运动短语进行排序, 并将前 m1 个运动短语作为第 1筛选结果, m1 为大于等于 1的正整数;

添加子单元, 用于从所述运动原子集合中提取一个运动原子加入所述第 1筛选结果中的运动短语, 使得所述第 1筛选结果中的运动短语具有 2个运动原子;

连续运行所述筛选子单元和所述添加子单元, 直至得到第 n-1筛选结果, 再从所述运动原子集合中提取一个运动原子加入所述第 n-1筛选结果中的运动短语, 使得所述第 n-1筛选结果中的运动短语具有 n个运动原子, 再根据所述第 n-1筛选结果中的运动短语得到第 n筛选结果, 所述第 n筛选结果为按照 Rep(Pn, c) + ΔRepSet(Pn, c) 的值由大到小的顺序排列的前 mn 个运动短语, mn 为大于等于 1的正整数, 第 n筛选结果中的运动短语具有 n个运动原子, n为大于等于 1的正整数;
第一生成子单元, 用于根据所述第 1至第 n筛选结果, 生成所述描述向 量。
13、 根据权利要求 12所述的视频分类的装置, 其特征在于, 所述样本 视频库包括至少二个视频,并且所述样本视频库包括至少二种类型的视频; 所述第二生成单元, 包括:
集合子单元, 用于根据所述样本视频库中不同类型的视频对应的所述 运动短语的筛选结果, 得到筛选结果集合;
第二生成子单元, 用于根据所述筛选结果集合, 生成所述样本视频库中的视频对应的描述向量。
14、 根据权利要求 13所述的装置, 其特征在于, 所述分类模块, 包括: 第三生成单元, 用于生成所述待检测视频对应的响应向量; 第三获取单元, 用于获取所述样本视频库中各个不同类型的视频对应 的所述描述向量, 并根据所述描述向量, 得到第一分类规则, 所述第一分 类规则用于确定所述待检测视频的所属类型;
第一分类单元, 用于根据所述第一分类规则和所述响应向量, 确定所 述待检测视频的类型与所述样本视频库包括的视频的类型中的一种类型相 同, 并将所述待检测视频分类。
15、 根据权利要求 9或 11所述的装置, 其特征在于, 所述分类模块, 包 括:
第四生成单元, 用于生成所述待检测视频对应的响应向量; 第四获取单元, 用于根据所述样本视频库中各个视频对应的描述向量, 得到第二分类规则, 所述第二分类规则用于检测所述待检测视频是否与所述样本视频库中的视频的类型相同;
检测单元, 用于检测所述待检测视频的响应向量是否符合所述第二分 类规则;
第二分类单元, 用于当符合时, 确定所述待检测视频与所述样本视频 库中的视频的类型相同。
16、 根据权利要求 9所述的装置, 其特征在于, 还包括:
获取模块, 用于获取所述待检测视频的响应向量中的至少一个分量, 并根据所述至少一个分量得到主要运动短语, 所述主要运动短语为与所述 至少一个分量对应的运动短语;
显示模块, 用于获取并显示所述待检测视频的关键帧, 所述关键帧与 所述主要运动短语中的每个运动原子单元的响应最大。
PCT/CN2014/075510 2013-11-29 2014-04-16 视频分类的方法和装置 WO2015078134A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP14866346.1A EP3067831A4 (en) 2013-11-29 2014-04-16 VIDEO CLASSIFICATION PROCESS AND DEVICE
US15/167,388 US10002296B2 (en) 2013-11-29 2016-05-27 Video classification method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310631901.6A CN104679779B (zh) 2013-11-29 2013-11-29 视频分类的方法和装置
CN201310631901.6 2013-11-29

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/167,388 Continuation US10002296B2 (en) 2013-11-29 2016-05-27 Video classification method and apparatus

Publications (1)

Publication Number Publication Date
WO2015078134A1 true WO2015078134A1 (zh) 2015-06-04

Family

ID=53198281

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/075510 WO2015078134A1 (zh) 2013-11-29 2014-04-16 视频分类的方法和装置

Country Status (4)

Country Link
US (1) US10002296B2 (zh)
EP (1) EP3067831A4 (zh)
CN (1) CN104679779B (zh)
WO (1) WO2015078134A1 (zh)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108353213A (zh) * 2015-10-30 2018-07-31 惠普发展公司,有限责任合伙企业 视频内容概括和类选择
CN108288015B (zh) * 2017-01-10 2021-10-22 武汉大学 基于时间尺度不变性的视频中人体动作识别方法及系统
CN108154137B (zh) * 2018-01-18 2020-10-20 厦门美图之家科技有限公司 视频特征学习方法、装置、电子设备及可读存储介质
CN108769823B (zh) * 2018-05-28 2019-05-28 广州虎牙信息科技有限公司 直播间显示方法、装置、设备
CN110096605B (zh) * 2019-04-26 2021-06-04 北京迈格威科技有限公司 图像处理方法及装置、电子设备、存储介质
CN110163129B (zh) * 2019-05-08 2024-02-13 腾讯科技(深圳)有限公司 视频处理的方法、装置、电子设备及计算机可读存储介质
CN111125432B (zh) * 2019-12-25 2023-07-11 重庆能投渝新能源有限公司石壕煤矿 一种视频匹配方法及基于该方法的培训快速匹配系统
CN112100436B (zh) * 2020-09-29 2021-07-06 新东方教育科技集团有限公司 舞蹈片段识别方法、舞蹈片段识别装置和存储介质
CN113362800A (zh) * 2021-06-02 2021-09-07 深圳云知声信息技术有限公司 用于语音合成语料库的建立方法、装置、设备和介质

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070255755A1 (en) * 2006-05-01 2007-11-01 Yahoo! Inc. Video search engine using joint categorization of video clips and queries based on multiple modalities
CN101894276A (zh) * 2010-06-01 2010-11-24 中国科学院计算技术研究所 人体动作识别的训练方法和识别方法
CN102034096A (zh) * 2010-12-08 2011-04-27 中国科学院自动化研究所 基于自顶向下运动注意机制的视频事件识别方法
US8135221B2 (en) * 2009-10-07 2012-03-13 Eastman Kodak Company Video concept classification using audio-visual atoms
CN102663409A (zh) * 2012-02-28 2012-09-12 西安电子科技大学 一种基于hog-lbp描述的行人跟踪方法
CN102682302A (zh) * 2012-03-12 2012-09-19 浙江工业大学 一种基于关键帧的多特征融合的人体姿态识别方法
CN103164694A (zh) * 2013-02-20 2013-06-19 上海交通大学 一种人体动作识别的方法

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6965645B2 (en) * 2001-09-25 2005-11-15 Microsoft Corporation Content-based characterization of video frame sequences
KR100876280B1 (ko) * 2001-12-31 2008-12-26 주식회사 케이티 통계적 모양기술자 추출 장치 및 그 방법과 이를 이용한 동영상 색인 시스템
US7558809B2 (en) * 2006-01-06 2009-07-07 Mitsubishi Electric Research Laboratories, Inc. Task specific audio classification for identifying video highlights
JP5553152B2 (ja) * 2010-04-09 2014-07-16 ソニー株式会社 画像処理装置および方法、並びにプログラム
US8923607B1 (en) * 2010-12-08 2014-12-30 Google Inc. Learning sports highlights using event detection
US8699852B2 (en) * 2011-10-10 2014-04-15 Intellectual Ventures Fund 83 Llc Video concept classification using video similarity scores
US8867891B2 (en) * 2011-10-10 2014-10-21 Intellectual Ventures Fund 83 Llc Video concept classification using audio-visual grouplets
CN103177091B (zh) * 2013-03-08 2016-02-10 深圳先进技术研究院 视频分类方法和系统
US9213903B1 (en) * 2014-07-07 2015-12-15 Google Inc. Method and system for cluster-based video monitoring and event categorization
US9420331B2 (en) * 2014-07-07 2016-08-16 Google Inc. Method and system for categorizing detected motion events

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3067831A4 *

Also Published As

Publication number Publication date
CN104679779B (zh) 2019-02-01
CN104679779A (zh) 2015-06-03
US10002296B2 (en) 2018-06-19
US20160275355A1 (en) 2016-09-22
EP3067831A1 (en) 2016-09-14
EP3067831A4 (en) 2016-12-07

Similar Documents

Publication Publication Date Title
CN109977262B (zh) 从视频中获取候选片段的方法、装置及处理设备
WO2015078134A1 (zh) 视频分类的方法和装置
Zhang et al. Refineface: Refinement neural network for high performance face detection
Richard et al. Temporal action detection using a statistical language model
Meng et al. Object co-segmentation based on shortest path algorithm and saliency model
JP5953151B2 (ja) 学習装置、及びプログラム
Wang et al. Mining motion atoms and phrases for complex action recognition
Chen et al. Video object segmentation via dense trajectories
Du et al. Online deformable object tracking based on structure-aware hyper-graph
Choi et al. A spatio-temporal pyramid matching for video retrieval
JP2011221791A (ja) 顔クラスタリング装置、顔クラスタリング方法、及びプログラム
JP2008203933A (ja) カテゴリ作成方法および装置、文書分類方法および装置
Duan et al. Video shot boundary detection based on feature fusion and clustering technique
Wang et al. Real-time summarization of user-generated videos based on semantic recognition
Alamuru et al. Video event detection, classification and retrieval using ensemble feature selection
Mohamadzadeh et al. Content based video retrieval based on hdwt and sparse representation
Liu et al. Global for coarse and part for fine: A hierarchical action recognition framework
Zhou et al. Modeling perspective effects in photographic composition
Priya et al. A comprehensive review of significant researches on content based indexing and retrieval of visual information
Liu et al. Fusing audio-words with visual features for pornographic video detection
Sidiropoulos et al. Video tomographs and a base detector selection strategy for improving large-scale video concept detection
Wang et al. Detecting action-relevant regions for action recognition using a three-stage saliency detection technique
Sowmyayani et al. STHARNet: Spatio-temporal human action recognition network in content based video retrieval
JP2014146207A (ja) コンテンツをバイナリ特徴ベクトルの集合で表現することによって高速に検索する検索装置、プログラム及び方法
Gao et al. Cast2face: assigning character names onto faces in movie with actor-character correspondence

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14866346

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

REEP Request for entry into the european phase

Ref document number: 2014866346

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 2014866346

Country of ref document: EP