WO2017166597A1 - Cartoon video recognition method, apparatus and electronic device - Google Patents

Cartoon video recognition method, apparatus and electronic device

Info

Publication number
WO2017166597A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
pixel
quantization interval
histogram
interval
Prior art date
Application number
PCT/CN2016/096153
Other languages
English (en)
French (fr)
Inventor
杨帆
白茂生
魏伟
蔡砚刚
刘阳
Original Assignee
乐视控股(北京)有限公司
乐视云计算有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 乐视控股(北京)有限公司 and 乐视云计算有限公司
Publication of WO2017166597A1

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 — Scenes; Scene-specific elements
    • G06V20/40 — Scenes; Scene-specific elements in video content
    • G06V20/41 — Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/46 — Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Definitions

  • the invention relates to the field of pattern recognition technology, in particular to a cartoon video recognition method and device.
  • cartoon video has more obvious edge features.
  • the color of cartoon video is more abundant.
  • existing cartoon video recognition methods categorize videos by statistics of color, texture, shape, motion and similar features; here "recognition" in fact means using a pre-trained classifier to classify a specific set of image features.
  • however, since the extraction of image features cannot be exhaustive and the classifier inevitably has some bias, the recognition results are somewhat inaccurate.
  • the object of the present invention is to provide a cartoon video recognition method, device and electronic device, which can further improve the accuracy of cartoon video recognition.
  • a cartoon video recognition method comprising:
  • the second classification algorithm is used to determine whether the video to be identified is a cartoon video according to the interval distribution.
  • the image features may include a color histogram, an edge histogram, a highlight pixel ratio, an edge pixel ratio, and color moment information, where the color moment information is calculated from the color histogram; the edge histogram is obtained by counting the gradient direction and gradient magnitude of the pixels; the highlight pixel ratio refers to the proportion of pixels in HSV (Hue-Saturation-Value) space whose V (Value) parameter is greater than a threshold X; the edge pixel ratio refers to the proportion of pixels whose gradient magnitude is greater than a threshold Y.
  • expressed in matrix form, the edge histogram is calculated as $B_{mn} = L_m \times G_n$, where $B_{mn}$ is the edge histogram matrix of m rows and n columns, $L_m$ is the gradient magnitude histogram matrix of m rows and 1 column, and $G_n$ is the gradient direction histogram matrix of 1 row and n columns;
  • the gradient direction histogram matrix is calculated as $G_n = \frac{1}{N}\sum_{j=1}^{N} Q_{nj}$, where N is the total number of counted pixels and $Q_{nj}$ is a matrix of 1 row and n columns giving the contribution of the j-th pixel to each gradient direction interval;
  • $Q_{nj}$ is calculated as follows: if the gradient direction of the j-th pixel falls within quantization interval S, the pixel contributes to quantization interval S and to quantization interval T and contributes nothing to the other quantization intervals, where T is the quantization interval, other than S, whose direction makes the smallest angle with the gradient direction of the j-th pixel; the contribution $v_S$ of the j-th pixel to interval S and its contribution $v_T$ to interval T are calculated as $v_S = \theta_T / \gamma_{ST}$ and $v_T = \theta_S / \gamma_{ST}$, where:
  • $\gamma_{ST}$ is the angle between the direction represented by the midpoint of quantization interval S and the direction represented by the midpoint of quantization interval T;
  • $\theta_S$ is the angle between the gradient direction of the j-th pixel and the direction represented by the midpoint of quantization interval S;
  • $\theta_T$ is the angle between the gradient direction of the j-th pixel and the direction represented by the midpoint of quantization interval T.
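To make the matrix construction above concrete, here is a minimal NumPy/OpenCV sketch of the edge histogram. It assumes m = n = 8 bins, Sobel gradients and mirror-symmetric folding of directions into 0–180° (the choices used in the embodiment later in this document); the normalisation of the magnitude histogram and all function names are illustrative assumptions, not something the patent fixes.

```python
import cv2
import numpy as np

def edge_histogram(gray, m=8, n=8):
    """Build the m x n edge histogram B = L x G described above.

    L (m x 1) histograms the gradient magnitude; G (1 x n) histograms the
    gradient direction with the two-bin soft assignment v_S = theta_T / gamma,
    v_T = theta_S / gamma."""
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    mag = np.sqrt(gx * gx + gy * gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # mirror-symmetric folding

    # Gradient-magnitude histogram L_m (normalised to sum to 1 -- an assumption).
    L = np.histogram(mag, bins=m, range=(0.0, float(mag.max()) + 1e-6))[0]
    L = (L / L.sum()).reshape(m, 1)

    # Gradient-direction histogram G_n with soft assignment to the two
    # nearest bin centres (the quantization intervals S and T above).
    width = 180.0 / n                        # gamma_ST: spacing of bin centres
    centres = (np.arange(n) + 0.5) * width
    flat_ang = ang.ravel()
    s = np.clip((flat_ang // width).astype(int), 0, n - 1)          # interval S
    theta_s = np.abs(flat_ang - centres[s])                         # theta_S
    t = np.where(flat_ang >= centres[s], (s + 1) % n, (s - 1) % n)  # interval T
    G = np.zeros(n)
    np.add.at(G, s, 1.0 - theta_s / width)   # v_S = theta_T / gamma_ST
    np.add.at(G, t, theta_s / width)         # v_T = theta_S / gamma_ST
    G = (G / flat_ang.size).reshape(1, n)    # divide by N

    return L @ G                             # B_mn = L_m x G_n
```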
  • the color histogram is obtained by linearly quantizing the colors in HSV space, and the color moment information consists of the first three moments of the color histogram, that is, the first moment (mean), the second moment (variance) and the third moment (skewness).
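A companion sketch for the colour features described in this point: a linearly quantized HSV histogram plus the first three moments of each HSV channel. The 8/6/6 bin split is borrowed from the embodiment in Step 105 below; computing the moments per channel (rather than over the binned histogram) is an assumption consistent with the per-channel formulas given later, and the function name is illustrative.

```python
import cv2
import numpy as np

def color_features(bgr, h_bins=8, s_bins=6, v_bins=6):
    """Linearly quantized HSV colour histogram plus the first three colour
    moments (mean, variance-root, skewness-root) of each HSV channel."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)

    hist = cv2.calcHist([hsv], [0, 1, 2], None,
                        [h_bins, s_bins, v_bins],
                        [0, 180, 0, 256, 0, 256]).ravel()
    hist /= hist.sum() + 1e-9

    moments = []
    for c in range(3):                                   # i = H, S, V channel
        p = hsv[..., c].astype(np.float32).ravel()
        mean = p.mean()                                  # first moment
        var = np.sqrt(np.mean((p - mean) ** 2))          # second moment
        skew = np.cbrt(np.mean((p - mean) ** 3))         # third moment
        moments.extend([mean, var, skew])

    return hist, np.array(moments, dtype=np.float32)
```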
  • the section division of the gradient direction in the range of 0 to 180 degrees is mirror-symmetric with the section division in the range of 180 to 360 degrees.
  • the key frames are all taken from the valid segment of the video to be identified; the valid segment is the remaining portion of the video after the beginning portion and the ending portion are removed, the duration of the valid segment is at least 50% of the duration of the entire video to be identified, and the durations of the beginning portion and the ending portion are each at least 8% of the duration of the entire video to be identified.
  • the advantage of this is that both cartoon and non-cartoon videos generally have opening and ending sequences, which may consist of credits or subtitles that would affect the recognition result of the algorithm, so it is best to remove the beginning and the end of the video.
  • the image features are all taken from the effective area of each key frame; the area of the effective area is at least 25% of the whole key frame, and the effective area lies within the desirable area of the key frame, where the desirable area is a shape similar to the key frame sharing its geometric center and covering 64% of the key frame area.
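The temporal trimming and the central crop described in the last two points reduce to two small helpers; the 8% head/tail trim and the 0.8-per-side (64%-area) desirable region follow the limits quoted above, and the helper names are purely illustrative.

```python
def valid_segment(duration_s, head_ratio=0.08, tail_ratio=0.08):
    """Return (start, end), in seconds, of the valid segment of the video,
    skipping the opening and ending portions."""
    return duration_s * head_ratio, duration_s * (1.0 - tail_ratio)

def effective_area(frame, side_ratio=0.8):
    """Central crop sharing the frame's geometric centre; 0.8 of each side
    corresponds to the 64%-of-area desirable region mentioned above."""
    h, w = frame.shape[:2]
    ch, cw = int(h * side_ratio), int(w * side_ratio)
    y0, x0 = (h - ch) // 2, (w - cw) // 2
    return frame[y0:y0 + ch, x0:x0 + cw]
```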
  • a cartoon video recognition device comprising:
  • Key frame extraction module: used to extract key frames from the video to be identified;
  • Image feature extraction module: used to acquire image features from the key frames;
  • First classification module: used to calculate, with a first classification algorithm, the cartoon image membership degree of each key frame from its image features;
  • Membership degree distribution statistics module: used to divide the value range of the cartoon image membership degree into at least three intervals and count the interval distribution of the cartoon image membership degrees of all key frames;
  • the second classification module is configured to determine, by using the second classification algorithm, whether the video to be identified is a cartoon video according to the interval distribution.
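One plausible way to wire the five modules together is sketched below. The two scikit-learn SVMs stand in for the unspecified first and second classification algorithms and are assumed to have been trained elsewhere on labelled data; the five-interval membership histogram follows the embodiment described later, and class and method names are illustrative.

```python
import numpy as np
from sklearn.svm import SVC

class CartoonVideoRecognizer:
    def __init__(self, n_intervals=5):
        self.n_intervals = n_intervals
        # Both classifiers are assumed to be fit on labelled data beforehand.
        self.frame_clf = SVC(probability=True)   # first classification algorithm
        self.video_clf = SVC(probability=True)   # second classification algorithm

    def memberships(self, frame_features):
        """Cartoon-image membership degree (cartoon-class probability) of each
        key frame's feature vector."""
        return self.frame_clf.predict_proba(frame_features)[:, 1]

    def interval_distribution(self, memberships):
        """Histogram of membership degrees over n_intervals equal bins in [0, 1]."""
        hist, _ = np.histogram(memberships, bins=self.n_intervals, range=(0.0, 1.0))
        return hist / max(len(memberships), 1)

    def is_cartoon(self, frame_features):
        dist = self.interval_distribution(self.memberships(frame_features))
        return self.video_clf.predict_proba(dist.reshape(1, -1))[0, 1] > 0.5
```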
  • the image features may include a color histogram, an edge histogram, a highlight pixel ratio, an edge pixel ratio, and color moment information derived from the color histogram; the edge histogram is obtained by counting the gradient direction and gradient magnitude of the pixels;
  • the highlight pixel ratio refers to the proportion of pixels in HSV space whose V parameter is greater than a threshold X; the edge pixel ratio refers to the proportion of pixels whose gradient magnitude is greater than a threshold Y.
  • expressed in matrix form, the edge histogram is calculated as $B_{mn} = L_m \times G_n$, where $B_{mn}$ is the edge histogram matrix of m rows and n columns, $L_m$ is the gradient magnitude histogram matrix of m rows and 1 column, and $G_n$ is the gradient direction histogram matrix of 1 row and n columns;
  • the gradient direction histogram matrix is calculated as $G_n = \frac{1}{N}\sum_{j=1}^{N} Q_{nj}$, where N is the total number of counted pixels and $Q_{nj}$ is a matrix of 1 row and n columns giving the contribution of the j-th pixel to each gradient direction interval;
  • $Q_{nj}$ is calculated as follows: if the gradient direction of the j-th pixel falls within quantization interval S, the pixel contributes to quantization interval S and to quantization interval T and contributes nothing to the other quantization intervals, where T is the quantization interval, other than S, whose direction makes the smallest angle with the gradient direction of the j-th pixel; the contribution $v_S$ of the j-th pixel to interval S and its contribution $v_T$ to interval T are calculated as $v_S = \theta_T / \gamma_{ST}$ and $v_T = \theta_S / \gamma_{ST}$, where:
  • $\gamma_{ST}$ is the angle between the direction represented by the midpoint of quantization interval S and the direction represented by the midpoint of quantization interval T;
  • $\theta_S$ is the angle between the gradient direction of the j-th pixel and the direction represented by the midpoint of quantization interval S;
  • $\theta_T$ is the angle between the gradient direction of the j-th pixel and the direction represented by the midpoint of quantization interval T.
  • the color histogram is obtained by linearly quantizing the colors in HSV space, and the color moment information consists of the first three moments of the color histogram, that is, the first moment (mean), the second moment (variance) and the third moment (skewness).
  • the section division of the gradient direction in the range of 0 to 180 degrees is mirror-symmetric with the section division in the range of 180 to 360 degrees.
  • the key frames are all taken from the valid segment of the video to be identified; the valid segment is the remaining portion of the video after the beginning portion and the ending portion are removed, the duration of the valid segment is at least 50% of the duration of the entire video to be identified, and the durations of the beginning portion and the ending portion are each at least 8% of the duration of the entire video to be identified.
  • the advantage of this is that both cartoon and non-cartoon videos generally have opening and ending sequences, which may consist of credits or subtitles that would affect the recognition result of the algorithm, so it is best to remove the beginning and the end of the video.
  • the image features are all taken from the effective area of each key frame; the area of the effective area is at least 25% of the whole key frame, and the effective area lies within the desirable area of the key frame;
  • the desirable area is a shape similar to the key frame sharing its geometric center, and the area of the desirable area is 64% of the key frame area.
  • the advantage of this is that both cartoon video and non-cartoon video may have black frames and/or subtitles at the edge of the video. In order to avoid interference between black frames and subtitles, it is better to select the geometric middle portion of the video keyframes for recognition.
  • the present invention also discloses an electronic device comprising at least one processor and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to: extract key frames from the video to be identified; acquire image features from the key frames; calculate the cartoon image membership degree of each key frame from its image features using a first classification algorithm; divide the value range of the cartoon image membership degree into at least three intervals and count the interval distribution of the cartoon image membership degrees of all key frames; and use a second classification algorithm to determine, from the interval distribution, whether the video to be identified is a cartoon video.
  • the image features include a color histogram, an edge histogram, a highlight pixel ratio, an edge pixel ratio, and color moment information, where the color moment information is calculated from the color histogram;
  • the edge histogram is obtained by counting the gradient direction and gradient magnitude of the pixels;
  • the highlight pixel ratio refers to the proportion of pixels in HSV space whose V parameter is greater than a threshold X;
  • the edge pixel ratio refers to the proportion of pixels whose gradient magnitude is greater than a threshold Y.
  • the edge histogram, expressed in matrix form, is calculated as $B_{mn} = L_m \times G_n$, where $B_{mn}$ is the edge histogram matrix of m rows and n columns, $L_m$ is the gradient magnitude histogram matrix of m rows and 1 column, and $G_n$ is the gradient direction histogram matrix of 1 row and n columns;
  • the gradient direction histogram matrix is calculated as $G_n = \frac{1}{N}\sum_{j=1}^{N} Q_{nj}$, where N is the total number of counted pixels and $Q_{nj}$ is a matrix of 1 row and n columns giving the contribution of the j-th pixel to each gradient direction interval; $Q_{nj}$ is calculated as follows: if the gradient direction of the j-th pixel falls within quantization interval S, it contributes to quantization interval S and to quantization interval T and contributes nothing to the other quantization intervals, where T is the quantization interval, other than S, whose direction makes the smallest angle with the gradient direction of the j-th pixel; the contributions are $v_S = \theta_T / \gamma_{ST}$ to interval S and $v_T = \theta_S / \gamma_{ST}$ to interval T, with $\gamma_{ST}$, $\theta_S$ and $\theta_T$ defined as above.
  • the color histogram is obtained by linearly quantizing the color in the HSV space, and the color moment information is composed of the first three moments of the color histogram.
  • the section of the gradient direction in the range of 0 to 180 degrees is mirror-symmetrical with the section of the range of 180 to 360 degrees.
  • the key frames are all taken from the valid segment of the video to be identified; the valid segment is the remaining portion of the video after the beginning portion and the ending portion are removed, the duration of the valid segment is at least 50% of the duration of the entire video to be identified, and the durations of the beginning portion and the ending portion are each at least 8% of the duration of the entire video to be identified.
  • the image features are all taken from the effective area of each key frame; the area of the effective area is at least 25% of the whole key frame, and the effective area lies within the desirable area of the key frame; the desirable area is a shape similar to the key frame sharing its geometric center, and the area of the desirable area is 64% of the key frame area.
  • the present invention also discloses a non-volatile computer storage medium storing computer-executable instructions which, when executed by an electronic device, enable the electronic device to: extract key frames from the video to be identified; acquire image features from the key frames; calculate the cartoon image membership degree of each key frame from its image features using a first classification algorithm; divide the value range of the cartoon image membership degree into at least three intervals and count the interval distribution of the cartoon image membership degrees of all key frames; and use a second classification algorithm to determine, from the interval distribution, whether the video to be identified is a cartoon video.
  • the image features include a color histogram, an edge histogram, a highlight pixel ratio, an edge pixel ratio, and color moment information, where the color moment information is calculated from the color histogram; the edge histogram is obtained by counting the gradient direction and gradient magnitude of the pixels; the highlight pixel ratio refers to the proportion of pixels in HSV space whose V parameter is greater than a threshold X; the edge pixel ratio refers to the proportion of pixels whose gradient magnitude is greater than a threshold Y.
  • the edge histogram, expressed in matrix form, is calculated as $B_{mn} = L_m \times G_n$, where $B_{mn}$ is the edge histogram matrix of m rows and n columns, $L_m$ is the gradient magnitude histogram matrix of m rows and 1 column, and $G_n$ is the gradient direction histogram matrix of 1 row and n columns;
  • the gradient direction histogram matrix is calculated as $G_n = \frac{1}{N}\sum_{j=1}^{N} Q_{nj}$, where N is the total number of counted pixels and $Q_{nj}$ is a matrix of 1 row and n columns giving the contribution of the j-th pixel to each gradient direction interval; $Q_{nj}$ is calculated as follows: if the gradient direction of the j-th pixel falls within quantization interval S, it contributes to quantization interval S and to quantization interval T and contributes nothing to the other quantization intervals, where T is the quantization interval, other than S, whose direction makes the smallest angle with the gradient direction of the j-th pixel; the contributions are $v_S = \theta_T / \gamma_{ST}$ to interval S and $v_T = \theta_S / \gamma_{ST}$ to interval T, with $\gamma_{ST}$, $\theta_S$ and $\theta_T$ defined as above.
  • the color histogram is obtained by linearly quantizing the color in the HSV space, and the color moment information is composed of the first three moments of the color histogram.
  • the section in which the gradient direction is in the range of 0 to 180 degrees is mirror-symmetrical to the section in the range of 180 to 360 degrees.
  • the key frames are all taken from the valid segment of the video to be identified; the valid segment is the remaining portion of the video after the beginning portion and the ending portion are removed, the duration of the valid segment is at least 50% of the duration of the entire video to be identified, and the durations of the beginning portion and the ending portion are each at least 8% of the duration of the entire video to be identified.
  • the image features are all taken from the effective area of each key frame; the area of the effective area is at least 25% of the whole key frame, and the effective area lies within the desirable area of the key frame; the desirable area is a shape similar to the key frame sharing its geometric center, and its area is 64% of the key frame area.
  • Embodiments of the present invention also provide a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform any of the methods described above.
  • the invention divides the cartoon image membership degree into a plurality of intervals, counts the distribution of the cartoon image membership degrees of all key frames, and then feeds the statistical result into a second classifier for a further classification, thereby improving the accuracy of cartoon video judgment while keeping the algorithm complexity low; this is an important improvement over the prior art.
  • FIG. 1 is a flow chart of an embodiment of a method of the present invention
  • FIG. 2 is a schematic structural view of an embodiment of a device according to the present invention.
  • FIG. 3 is a schematic diagram of a method for dividing an angle interval according to an embodiment of the present invention
  • FIG. 4 is a schematic structural diagram of hardware of an electronic device according to an embodiment of the present invention.
  • the terms "mounted", "connected" and "coupled" are to be understood broadly: a connection may be a fixed connection, a detachable connection or an integral connection; a mechanical connection or an electrical connection; a direct connection, an indirect connection through an intermediate medium, or internal communication between two components; and it may be wireless or wired.
  • a cartoon video recognition method comprising:
  • Extract key frames from the video to be identified; the key frames can be extracted with ffmpeg or similar software (a sketch follows after this list);
  • the second classification algorithm is used to determine whether the video to be identified is a cartoon video according to the interval distribution.
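The text only says key frames "can be extracted using ffmpeg and other software", so the following is just one common recipe: dumping a video's I-frames with the ffmpeg command-line tool from Python. The exact flags are an assumption about tooling, not part of the claimed method.

```python
import subprocess

def extract_keyframes(video_path, out_pattern="keyframe_%04d.png"):
    """Dump the I-frames (a common notion of key frames) of a video to images."""
    subprocess.run(
        ["ffmpeg", "-skip_frame", "nokey",   # decode only key frames
         "-i", video_path,
         "-vsync", "vfr",                    # keep one output image per decoded frame
         out_pattern],
        check=True)
```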
  • the image features include a color histogram, an edge histogram, a highlight pixel ratio, an edge pixel ratio, and color moment information obtained from the color histogram;
  • the edge histogram is obtained by counting the gradient direction and gradient magnitude of the pixels;
  • the Sobel operator can be used;
  • the ratio of the highlighted pixel refers to the ratio of the pixel points in the HSV (Hue-Saturation-Value) space where the V (Value) parameter is greater than the threshold X;
  • the edge pixel ratio is the proportion of pixels whose gradient magnitude is greater than a threshold Y.
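The two scalar ratios just defined are straightforward to compute. In this sketch X = 0.5 (on a 0–1 V scale) and Y = 0.087 are only defaults taken from the embodiment below, and normalising gradient magnitudes to 0–1 before thresholding is an added assumption.

```python
import numpy as np

def highlight_pixel_ratio(hsv, x=0.5):
    """Share of pixels whose V channel (rescaled to 0-1) exceeds the threshold X."""
    v = hsv[..., 2].astype(np.float32) / 255.0
    return float(np.mean(v > x))

def edge_pixel_ratio(grad_mag, y=0.087):
    """Share of pixels whose normalised gradient magnitude exceeds the threshold Y."""
    g = grad_mag.astype(np.float32)
    g /= g.max() + 1e-9          # assumption: magnitudes normalised to 0-1
    return float(np.mean(g > y))
```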
  • expressed in matrix form, the edge histogram is calculated as $B_{mn} = L_m \times G_n$, where $B_{mn}$ is the edge histogram matrix of m rows and n columns, $L_m$ is the gradient magnitude histogram matrix of m rows and 1 column, and $G_n$ is the gradient direction histogram matrix of 1 row and n columns;
  • the gradient direction histogram matrix is calculated as $G_n = \frac{1}{N}\sum_{j=1}^{N} Q_{nj}$, where N is the total number of counted pixels and $Q_{nj}$ is a matrix of 1 row and n columns giving the contribution of the j-th pixel to each gradient direction interval;
  • $Q_{nj}$ is calculated as follows: if the gradient direction of the j-th pixel falls within quantization interval S, the pixel contributes to quantization interval S and to quantization interval T and contributes nothing to the other quantization intervals, where T is the quantization interval, other than S, whose direction makes the smallest angle with the gradient direction of the j-th pixel; the contribution $v_S$ of the j-th pixel to interval S and its contribution $v_T$ to interval T are calculated as $v_S = \theta_T / \gamma_{ST}$ and $v_T = \theta_S / \gamma_{ST}$, where:
  • $\gamma_{ST}$ is the minimum positive angle between the direction represented by the midpoint of quantization interval S and the direction represented by the midpoint of quantization interval T;
  • $\theta_S$ is the minimum positive angle between the gradient direction of the j-th pixel and the direction represented by the midpoint of quantization interval S;
  • $\theta_T$ is the minimum positive angle between the gradient direction of the j-th pixel and the direction represented by the midpoint of quantization interval T.
  • Figure 3 shows one way of dividing the angle intervals: every pair of adjacent solid lines bounds one angle interval, and the two broken lines are the angle bisectors of the first and the second angle interval respectively;
  • the dotted line indicates the gradient direction of a certain pixel; it falls within the second angle interval and also within the angle between the two bisectors, so the pixel contributes to both the first and the second angle interval: taking S as the second interval and T as the first, its contribution to the first angle interval is $\theta_S/\gamma_{ST}$ and its contribution to the second angle interval is $\theta_T/\gamma_{ST}$.
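A small worked instance of that two-bin split, using the 8 × 45° intervals of Figure 3 (the specific angles are chosen only for illustration):

```latex
% Adjacent bisectors are \gamma_{12} = 45^\circ apart. Let \theta_1, \theta_2 be the
% angles from the gradient direction to the first and second bisectors. Suppose
% \theta_2 = 15^\circ and \theta_1 = 30^\circ (so the gradient lies in the second interval):
v_{\mathrm{2nd}} = \frac{\theta_{1}}{\gamma_{12}} = \frac{30^\circ}{45^\circ} = \tfrac{2}{3},
\qquad
v_{\mathrm{1st}} = \frac{\theta_{2}}{\gamma_{12}} = \frac{15^\circ}{45^\circ} = \tfrac{1}{3},
\qquad
v_{\mathrm{1st}} + v_{\mathrm{2nd}} = 1 .
```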
  • the color histogram is obtained by linearly quantizing the colors in HSV space, and the color moment information consists of the first three moments of the color histogram, that is, the first moment (mean), the second moment (variance) and the third moment (skewness), which are calculated per channel as shown below, where:
  • N is the total number of pixels;
  • i denotes the image channel (i.e. the H, S or V channel of HSV space);
  • $p_{ij}$ is the value of the j-th pixel of the key frame in channel i.
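The three per-channel moments themselves appear only as equation images in the original filing; reconstructed from the definitions of N, i and $p_{ij}$ above (these are the standard colour-moment forms), they read:

```latex
\mu_i = \frac{1}{N}\sum_{j=1}^{N} p_{ij},\qquad
\sigma_i = \Bigl(\frac{1}{N}\sum_{j=1}^{N}\bigl(p_{ij}-\mu_i\bigr)^{2}\Bigr)^{1/2},\qquad
s_i = \Bigl(\frac{1}{N}\sum_{j=1}^{N}\bigl(p_{ij}-\mu_i\bigr)^{3}\Bigr)^{1/3}
```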
  • the interval division of the gradient direction in the range 0 to 180 degrees is mirror-symmetric to the division in the range 180 to 360 degrees; for example, the full circle is uniformly divided into 8 intervals starting from 0 degrees.
  • the key frames are all taken from the valid segment of the video to be identified; the valid segment is the remaining portion of the video after the beginning portion and the ending portion are removed, the duration of the valid segment is at least 50% of the duration of the entire video to be identified, and the durations of the beginning portion and the ending portion are each at least 8% of the duration of the entire video to be identified.
  • the advantage of this is that both cartoon and non-cartoon videos generally have opening and ending sequences, which may consist of credits or subtitles that would affect the recognition result of the algorithm, so it is best to remove the beginning and the end of the video.
  • the image features are all taken from the effective area of each key frame; the area of the effective area is at least 25% of the whole key frame, and the effective area lies within the desirable area of the key frame, where the desirable area is a shape similar to the key frame sharing its geometric center and covering 64% of the key frame area.
  • a cartoon video recognition device comprising:
  • Key frame extraction module: used to extract key frames from the video to be identified;
  • Image feature extraction module: used to acquire image features from the key frames;
  • First classification module: used to calculate, with a first classification algorithm, the cartoon image membership degree of each key frame from its image features;
  • Membership degree distribution statistics module: used to divide the value range of the cartoon image membership degree into at least three intervals and count the interval distribution of the cartoon image membership degrees of all key frames;
  • the second classification module uses the second classification algorithm to determine whether the video to be identified is a cartoon video according to the interval distribution.
  • the image features include a color histogram, an edge histogram, a highlight pixel ratio, an edge pixel ratio, and color moment information obtained from the color histogram;
  • the edge histogram is obtained by counting the gradient direction and gradient magnitude of the pixels;
  • the highlight pixel ratio refers to the ratio of the pixel points in the HSV space where the V parameter is greater than the threshold X;
  • the edge pixel ratio refers to the ratio of the pixel points whose gradient magnitude is greater than the threshold Y.
  • expressed in matrix form, the edge histogram is calculated as $B_{mn} = L_m \times G_n$, where $B_{mn}$ is the edge histogram matrix of m rows and n columns, $L_m$ is the gradient magnitude histogram matrix of m rows and 1 column, and $G_n$ is the gradient direction histogram matrix of 1 row and n columns;
  • the gradient direction histogram matrix is calculated as $G_n = \frac{1}{N}\sum_{j=1}^{N} Q_{nj}$, where N is the total number of counted pixels and $Q_{nj}$ is a matrix of 1 row and n columns giving the contribution of the j-th pixel to each gradient direction interval;
  • $Q_{nj}$ is calculated as follows: if the gradient direction of the j-th pixel falls within quantization interval S, the pixel contributes to quantization interval S and to quantization interval T and contributes nothing to the other quantization intervals, where T is the quantization interval, other than S, whose direction makes the smallest angle with the gradient direction of the j-th pixel; the contribution $v_S$ of the j-th pixel to interval S and its contribution $v_T$ to interval T are calculated as $v_S = \theta_T / \gamma_{ST}$ and $v_T = \theta_S / \gamma_{ST}$, where:
  • $\gamma_{ST}$ is the minimum positive angle between the direction represented by the midpoint of quantization interval S and the direction represented by the midpoint of quantization interval T;
  • $\theta_S$ is the minimum positive angle between the gradient direction of the j-th pixel and the direction represented by the midpoint of quantization interval S;
  • $\theta_T$ is the minimum positive angle between the gradient direction of the j-th pixel and the direction represented by the midpoint of quantization interval T.
  • the color histogram is obtained by linearly quantizing the colors in HSV space, and the color moment information consists of the first three moments of the color histogram, that is, the first moment (mean), the second moment (variance) and the third moment (skewness).
  • the section division of the gradient direction in the range of 0 to 180 degrees is mirror-symmetric with the section division in the range of 180 to 360 degrees.
  • the key frames are all taken from the valid segment of the video to be identified; the valid segment is the remaining portion of the video after the beginning portion and the ending portion are removed, the duration of the valid segment is at least 50% of the duration of the entire video to be identified, and the durations of the beginning portion and the ending portion are each at least 8% of the duration of the entire video to be identified.
  • the advantage of this is that both cartoon and non-cartoon videos generally have opening and ending sequences, which may consist of credits or subtitles that would affect the recognition result of the algorithm, so it is best to remove the beginning and the end of the video.
  • the image features are all taken from the effective area of each key frame; the area of the effective area is at least 25% of the whole key frame, and the effective area lies within the desirable area of the key frame, where the desirable area is a shape similar to the key frame sharing its geometric center and covering 64% of the key frame area.
  • a cartoon video recognition method includes the following steps:
  • Step 101: intercept the middle 80%-duration portion of the video to be identified;
  • Step 102: extract key frames from the intercepted portion;
  • Step 103: intercept the central region covering 70% of the key frame's length and width;
  • Step 104: convert the image of the intercepted region to HSV space;
  • Step 105: compute the color histogram of the intercepted region in HSV space, with the H parameter uniformly divided into 8 intervals and the S and V parameters each uniformly divided into 6 intervals; at the same time compute the edge histogram of the intercepted region, where the edge histogram is obtained by counting the gradient direction and gradient magnitude of the pixels and both the gradient direction and the gradient magnitude are uniformly divided into 8 intervals;
  • Step 106: from the color histogram and the edge histogram, compute the highlight pixel ratio and the edge pixel ratio of the intercepted region, where the highlight pixel ratio is the proportion of pixels whose V parameter is greater than 0.5 and the edge pixel ratio is the proportion of pixels whose gradient magnitude is greater than the threshold 0.087;
  • Step 107: classify the intercepted region with a previously trained SVM (Support Vector Machine) classifier and calculate the cartoon image membership degree of the intercepted region;
  • Step 108: divide the value range of the membership degree uniformly into five intervals and count the membership degree distribution of all key frames;
  • Step 109: feed the membership degree distribution into another previously trained SVM classifier to calculate the probability that the video is a cartoon video; if the probability exceeds 50%, the video is judged to be a cartoon video;
  • Step 110: output the judgment result.
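Pulling Steps 101–110 together, a compact end-to-end sketch is given below. It reuses the illustrative helpers sketched earlier in this document (effective_area, color_features, edge_histogram, highlight_pixel_ratio, edge_pixel_ratio), assumes the key frames have already been sampled from the middle 80% of the video, and takes the two SVMs as pre-trained probability classifiers; the thresholds 0.5, 0.087 and 50% come from the steps above.

```python
import cv2
import numpy as np

def recognise_cartoon(key_frames, frame_svm, video_svm):
    """key_frames: BGR images already sampled from the middle 80% of the video;
    frame_svm / video_svm: pre-trained SVMs exposing predict_proba."""
    memberships = []
    for frame in key_frames:
        roi = effective_area(frame, side_ratio=0.7)            # Step 103
        hist, moments = color_features(roi, 8, 6, 6)           # Steps 104-105
        hsv = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
        gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
        edges = edge_histogram(gray, 8, 8)                     # Step 105
        gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
        gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
        mag = np.sqrt(gx ** 2 + gy ** 2)
        feats = np.concatenate([hist, edges.ravel(), moments,
                                [highlight_pixel_ratio(hsv, 0.5),
                                 edge_pixel_ratio(mag, 0.087)]])      # Step 106
        memberships.append(frame_svm.predict_proba([feats])[0, 1])    # Step 107
    dist, _ = np.histogram(memberships, bins=5, range=(0.0, 1.0))     # Step 108
    dist = dist / max(len(memberships), 1)
    prob = video_svm.predict_proba([dist])[0, 1]                      # Step 109
    return prob > 0.5                                                 # Step 110
```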
  • the apparatus comprises a key frame extraction module 201, an image feature extraction module 202, a first classification module 203, a membership degree distribution statistics module 204 and a second classification module 205; the membership degree distribution statistics module 204 divides the value range of the cartoon image membership degree into five intervals and counts the interval distribution of the cartoon image membership degrees, and the second classification algorithm then classifies the video according to this interval distribution to determine whether the video to be identified is a cartoon video.
  • in use, the video to be identified is input to the key frame extraction module 201, which passes the extracted key frames to the image feature extraction module 202; the image feature extraction module 202 extracts image features from the key frames, using any of the methods mentioned above, and passes them to the first classification module 203; the first classification module 203 calculates the cartoon image membership degree of each key frame and passes the results to the membership degree distribution statistics module 204, which in turn passes the statistical result to the second classification module 205; the classification calculation of the second classification module 205 determines whether the video is a cartoon video, and finally the video type is output.
  • the specific implementation of the device of the present invention can be either a dedicated device or a device formed by installing specific software on a smart device such as a computer, a mobile phone, or a tablet.
  • it should be noted that, for each parameter whose range is limited above, any endpoint value or intermediate value within that range may be chosen, and different combinations of parameter values are also feasible; having learned the ranges defined by the specific embodiments, those skilled in the art can choose concrete values for each parameter without any creative labor, and the resulting application effects do not go beyond the scope described by the present invention; therefore, to save space, the inventors do not enumerate all possible values and their combinations one by one.
  • the apparatus of the foregoing embodiment is used to implement the corresponding method in the foregoing embodiments, and has the beneficial effects of the corresponding method embodiments, and details are not described herein again.
  • an embodiment of the present invention further discloses an electronic device including at least one processor 810 and a memory 800 communicatively connected to the at least one processor 810, wherein the memory 800 stores instructions executable by the at least one processor 810, and the instructions are executed by the at least one processor 810 to enable the at least one processor 810 to: extract key frames from the video to be identified; acquire image features from the key frames; calculate the cartoon image membership degree of each key frame from its image features using a first classification algorithm; divide the value range of the cartoon image membership degree into at least three intervals and count the interval distribution of the cartoon image membership degrees of all key frames; and use a second classification algorithm to determine, from the interval distribution, whether the video to be identified is a cartoon video.
  • the electronic device also includes an input device 830 and an output device 840 electrically connected to the memory 800 and the processor 810, the electrical connection preferably being a bus connection.
  • the image features include a color histogram, an edge histogram, a highlight pixel ratio, an edge pixel ratio, and color moment information, where the color moment information is calculated from the color histogram;
  • the edge histogram is obtained by counting the gradient direction and gradient magnitude of the pixels;
  • the highlight pixel ratio refers to the proportion of pixels in HSV space whose V parameter is greater than a threshold X;
  • the edge pixel ratio refers to the proportion of pixels whose gradient magnitude is greater than a threshold Y.
  • the edge histogram, expressed in matrix form, is calculated as $B_{mn} = L_m \times G_n$; the gradient direction histogram matrix is calculated as $G_n = \frac{1}{N}\sum_{j=1}^{N} Q_{nj}$, where N is the total number of counted pixels and $Q_{nj}$ is a matrix of 1 row and n columns giving the contribution of the j-th pixel to each gradient direction interval; $Q_{nj}$ is calculated as follows: if the gradient direction of the j-th pixel falls within quantization interval S, it contributes to quantization interval S and to quantization interval T and contributes nothing to the other quantization intervals, where T is the quantization interval, other than S, whose direction makes the smallest angle with the gradient direction of the j-th pixel; the contributions are $v_S = \theta_T / \gamma_{ST}$ to interval S and $v_T = \theta_S / \gamma_{ST}$ to interval T, with $\gamma_{ST}$, $\theta_S$ and $\theta_T$ defined as above.
  • the color histogram is obtained by linearly quantizing the color in the HSV space, and the color moment information is composed of the first three moments of the color histogram.
  • the section in which the gradient direction is in the range of 0 to 180 degrees is mirror-symmetrical with the section in the range of 180 to 360 degrees.
  • the key frames are all taken from the valid segment of the video to be identified; the valid segment is the remaining portion of the video after the beginning portion and the ending portion are removed, the duration of the valid segment is at least 50% of the duration of the entire video to be identified, and the durations of the beginning portion and the ending portion are each at least 8% of the duration of the entire video to be identified.
  • the image features are all taken from the effective area of each key frame; the area of the effective area is at least 25% of the whole key frame, and the effective area lies within the desirable area of the key frame; the desirable area is a shape similar to the key frame sharing its geometric center, and preferably the area of the desirable area is 64% of the key frame area.
  • the present invention also discloses a non-volatile computer storage medium storing computer-executable instructions which, when executed by an electronic device, enable the electronic device to: extract key frames from the video to be identified; acquire image features from the key frames; calculate the cartoon image membership degree of each key frame from its image features using a first classification algorithm; divide the value range of the cartoon image membership degree into at least three intervals and count the interval distribution of the cartoon image membership degrees of all key frames; and use a second classification algorithm to determine, from the interval distribution, whether the video to be identified is a cartoon video.
  • the image features include a color histogram, an edge histogram, a highlight pixel ratio, an edge pixel ratio, and color moment information, where the color moment information is calculated from the color histogram;
  • the edge histogram is obtained by counting the gradient direction and gradient magnitude of the pixels;
  • the highlight pixel ratio refers to the proportion of pixels in HSV space whose V parameter is greater than a threshold X;
  • the edge pixel ratio refers to the proportion of pixels whose gradient magnitude is greater than a threshold Y.
  • the edge histogram, expressed in matrix form, is calculated as $B_{mn} = L_m \times G_n$; the gradient direction histogram matrix is calculated as $G_n = \frac{1}{N}\sum_{j=1}^{N} Q_{nj}$, where N is the total number of counted pixels and $Q_{nj}$ is a matrix of 1 row and n columns giving the contribution of the j-th pixel to each gradient direction interval; $Q_{nj}$ is calculated as follows: if the gradient direction of the j-th pixel falls within quantization interval S, it contributes to quantization interval S and to quantization interval T and contributes nothing to the other quantization intervals, where T is the quantization interval, other than S, whose direction makes the smallest angle with the gradient direction of the j-th pixel; the contributions are $v_S = \theta_T / \gamma_{ST}$ to interval S and $v_T = \theta_S / \gamma_{ST}$ to interval T, with $\gamma_{ST}$, $\theta_S$ and $\theta_T$ defined as above.
  • the color histogram is obtained by linearly quantizing the color in the HSV space, and the color moment information is composed of the first three moments of the color histogram.
  • the section of the gradient direction in the range of 0 to 180 degrees is mirror-symmetrical with the section of the range of 180 to 360 degrees.
  • the key frames are all taken from the valid segment of the video to be identified; the valid segment is the remaining portion of the video after the beginning portion and the ending portion are removed, the duration of the valid segment is at least 50% of the duration of the entire video to be identified, and the durations of the beginning portion and the ending portion are each at least 8% of the duration of the entire video to be identified.
  • the image features are all taken from the effective area of each key frame; the area of the effective area is at least 25% of the whole key frame, and the effective area lies within the desirable area of the key frame; the desirable area is a shape similar to the key frame sharing its geometric center, and the area of the desirable area is 64% of the key frame area.
  • Embodiments of the present invention also provide a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions, when the program instructions are executed by a computer
  • the computer is caused to perform the method described in the above embodiments.
  • these computer program instructions may also be stored in a computer readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture including an instruction device that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • these computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The present invention discloses a cartoon video recognition method, apparatus and electronic device, belonging to the technical field of pattern recognition. The method comprises the steps of extracting key frames from a video to be identified, acquiring image features from the key frames, calculating a cartoon image membership degree of each key frame from its image features, and judging, from the cartoon image membership degrees of all key frames of the video to be identified, whether the video is a cartoon video. The invention features a simple algorithm and a reasonable choice of image features, and in particular strikes an appropriate balance between recognition speed and recognition accuracy, making it well suited to scenarios in which large numbers of videos must be recognized.

Description

一种卡通视频识别方法、装置和电子设备
交叉引用
本申请要求在2016年03月31日提交中国专利局、申请号为201610201081.0、发明名称为“一种卡通视频识别方法及装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本发明涉及模式识别技术领域,特别是指一种卡通视频识别方法及装置。
背景技术
目前,随着网络技术和多媒体技术的高速发展,各种多媒体信息日渐庞博,大大丰富了人们的日常生活。同时,视频作为一种常见的多媒体形式,与人类的日常生活息息相关,也是网络上访问较多的一种资源模式。
根据中国互联网中心第34次中国互联网络发展状况统计报告,截止2014年6月底中国已经拥有高达4.39亿的互联网视频用户,网络视频的用户数量占到了中国人口总数的1/3。随着用户数量的增多,用户对于在线视频的需求也越来越大。为了充分的满足用户的需求,各大门户网站不断扩充在线视频库,导致互联网视频数量的急剧增加,单个门户网站的视频数量可以达到数百亿个之多。据知名调研机构ComScore的调研数据,2011年10月优酷网的视频播放量达到46亿次,日上传量为7万。
但是,由于视频的种类和数量与日俱增,所以如何对这些海量的视频进行分类整理,以使人们能够根据类别快速找到自己感兴趣的内容便成为了一个重要的课题。为此,视频的自动分析系统便应运而生,视频的自动分类检测算法也成为模式识别领域的一个研究热点。
从乐视网的视频分类标签中可以看到,常见的视频类型主要有电影、电视剧、体育、动漫等等类别,其中动漫也就是卡通视频。在这些视频种类中,卡通视频是一种特殊的类别,它不同于其他视频的“真实性”,而是通过手工或电脑绘画制作出来的视频。目前,动漫产业在国内外都已成为一个重要的文化产业,因此动漫视频所占的比重也将日益庞大。因此,卡通视频识 别就成为视频分类领域的一个重要研究方向。
卡通视频的一个重要特点是,卡通视频具有较为明显的边缘特征,同时,卡通视频的色彩也更加丰富。基于这些特征,现有技术中已有一些卡通视频识别方法,比如通过统计视频的颜色、纹理、形状、运动等等特征来对视频种类进行识别,其中“识别”其实就是使用事先训练好的分类器对某一组具体的图像特征进行分类。但是,由于对图像特征的提取不可能全面,分类器也存在不可避免的偏差,因此识别结果存在一定地不准确性。
发明内容
有鉴于此,本发明的目的在于提出一种卡通视频识别方法、装置和电子设备,能够进一步提高卡通视频识别的准确率。
基于上述目的,本发明提供的技术方案为:
一种卡通视频识别方法,该方法包含:
从待识别视频中提取关键帧;
从关键帧中获取图像特征;
使用第一分类算法根据每个关键帧的图像特征计算该关键帧的卡通图像隶属度;
将卡通图像隶属度的取值范围分成至少三个区间,并统计所有关键帧的卡通图像隶属度的区间分布情况;
使用第二分类算法根据所述区间分布情况判断待识别视频是否为卡通视频。
具体地,图像特征可以包含颜色直方图、边缘直方图、高亮像素比例、边缘像素比例和颜色矩信息,其中颜色矩信息由颜色直方图计算得到;边缘直方图通过统计像素点的梯度方向和梯度幅值得出;高亮像素比例是指HSV(Hue-Saturation-Value,色调-饱和度-明度)空间中V(Value,明度)参数大于阈值X的像素点的比例;边缘像素比例是指梯度幅值大于阈值Y的像素点的比例。
具体地,边缘直方图以矩阵形式表示的计算方式为:
Bmn=Lm×Gn
式中,Bmn表示m行n列的边缘直方图矩阵,Lm表示m行1列的梯度幅值直方图矩阵,Gn表示1行n列的梯度方向直方图矩阵;
上述梯度方向直方图矩阵的计算方式为:
$$G_n = \frac{1}{N}\sum_{j=1}^{N} Q_{nj}$$
式中:N表示被统计像素点的总数;Qnj为一个1行n列的矩阵,表示第j个像素点对每一个梯度方向区间的贡献;
上述Qnj的计算方式为:设第j个像素点的梯度方向落入量化区间S,则它对量化区间S以及量化区间T均有贡献,而对其他量化区间均无贡献,所述量化区间T是除量化区间S外与第j个像素点的梯度方向夹角最小的量化区间;第j个像素点对量化区间S的贡献vS和对量化区间T的贡献vT按下式计算:
$$v_S = \frac{\theta_T}{\gamma_{ST}}$$
$$v_T = \frac{\theta_S}{\gamma_{ST}}$$
式中,γST表示量化区间S的中点所表征的方向与量化区间T的中点所表征的方向的夹角,θS表示第j个像素点的梯度方向与量化区间S的中点所表征的方向的夹角,θT表示第j个像素点的梯度方向与量化区间T的中点所表征的方向的夹角。
具体地,颜色直方图是在HSV空间中对颜色进行线性量化得出的,颜色矩信息由颜色直方图的前三阶矩组成,即一阶矩(平均值Mean)、二阶矩(方差Variance)和三阶矩(偏度Skewness)。
具体地,梯度方向在0~180度范围内的区间划分与它在180~360度范围内的区间划分镜像对称。
具体地,关键帧均来自于待识别视频的有效区段,有效区段是待识别视频去掉开头部分和结尾部分后的剩余部分,有效区段的时长至少为整个待识别视频时长的50%,且开头部分和结尾部分的时长均至少为整个待识别视频时长的8%。这样做的好处是:一般不论卡通视频还是非卡通视频都会有片头和片尾,且片头和片尾可能是字幕,这会影响算法对视频的识别结果,因此 最好将视频的开头和结尾去掉。
具体地,图像特征全部来自于关键帧的有效区域,有效区域的面积至少为整个关键帧面积的25%,且有效区域位于关键帧的可取区域内;可取区域为与关键帧具有共同几何中心的关键帧的相似形,且可取区域的面积为关键帧面积的64%。这样做的好处是:不论卡通视频还是非卡通视频可能会在视频的边缘部分具有黑框和/或字幕,为了避免黑框和字幕的干扰,识别时最好选取视频关键帧的几何中间部分。
一种卡通视频识别装置,包含:
关键帧提取模块:用于从待识别视频中提取关键帧;
图像特征提取模块:用于从关键帧中获取图像特征;
第一分类模块:用于使用第一分类算法根据每个关键帧的图像特征计算该关键帧的卡通图像隶属度;
隶属度分布统计模块:用于将卡通图像隶属度的取值范围分成至少三个区间,并统计所有关键帧的卡通图像隶属度的区间分布情况;
第二分类模块:用于使用第二分类算法根据所述区间分布情况判断待识别视频是否为卡通视频。
具体地,图像特征可以包含颜色直方图、边缘直方图、高亮像素比例、边缘像素比例,以及由颜色直方图得到的颜色矩信息;边缘直方图通过统计像素点的梯度方向和梯度幅值得出;高亮像素比例是指HSV空间中V参数大于阈值X的像素点的比例;边缘像素比例是指梯度幅值大于阈值Y的像素点的比例。
具体地,边缘直方图以矩阵形式表示的计算方式为:
Bmn=Lm×Gn
式中,Bmn表示m行n列的边缘直方图矩阵,Lm表示m行1列的梯度幅值直方图矩阵,Gn表示1行n列的梯度方向直方图矩阵;
上述梯度方向直方图矩阵的计算方式为:
$$G_n = \frac{1}{N}\sum_{j=1}^{N} Q_{nj}$$
式中:N表示被统计像素点的总数;Qnj为一个1行n列的矩阵,表示第j个像素点对每一个梯度方向区间的贡献;
上述Qnj的计算方式为:设第j个像素点的梯度方向落入量化区间S,则它对量化区间S以及量化区间T均有贡献,而对其他量化区间均无贡献,所述量化区间T是除量化区间S外与第j个像素点的梯度方向夹角最小的量化区间;第j个像素点对量化区间S的贡献vS和对量化区间T的贡献vT按下式计算:
$$v_S = \frac{\theta_T}{\gamma_{ST}}$$
$$v_T = \frac{\theta_S}{\gamma_{ST}}$$
式中,γST表示量化区间S的中点所表征的方向与量化区间T的中点所表征的方向的夹角,θS表示第j个像素点的梯度方向与量化区间S的中点所表征的方向的夹角,θT表示第j个像素点的梯度方向与量化区间T的中点所表征的方向的夹角。
具体地,颜色直方图是在HSV空间中对颜色进行线性量化得出的,颜色矩信息由颜色直方图的前三阶矩组成,即一阶矩(平均值Mean)、二阶矩(方差Variance)和三阶矩(偏度Skewness)。
具体地,梯度方向在0~180度范围内的区间划分与它在180~360度范围内的区间划分镜像对称。
具体地,关键帧均来自于待识别视频的有效区段,有效区段是待识别视频去掉开头部分和结尾部分后的剩余部分,有效区段的时长至少为整个待识别视频时长的50%,且开头部分和结尾部分的时长均至少为整个待识别视频时长的8%。这样做的好处是:一般不论卡通视频还是非卡通视频都会有片头和片尾,且片头和片尾可能是字幕,这会影响算法对视频的识别结果,因此最好将视频的开头和结尾去掉。
具体地,图像特征全部来自于关键帧的有效区域,有效区域的面积至少为整个关键帧面积的25%,且有效区域位于关键帧的可取区域内;可取区域为与 关键帧具有共同几何中心的关键帧的相似形,且可取区域的面积为关键帧面积的64%。这样做的好处是:不论卡通视频还是非卡通视频可能会在视频的边缘部分具有黑框和/或字幕,为了避免黑框和字幕的干扰,识别时最好选取视频关键帧的几何中间部分。
本发明还公开了一种电子设备,包括至少一个处理器;以及,与所述至少一个处理器通信连接的存储器;其中,所述存储器存储有可被所述至少一个处理器执行的指令,所述指令被所述至少一个处理器执行,以使所述至少一个处理器能够从待识别视频中提取关键帧;从关键帧中获取图像特征;使用第一分类算法根据每个关键帧的图像特征计算该关键帧的卡通图像隶属度;将所述卡通图像隶属度的取值范围分成至少三个区间,并统计所有关键帧的卡通图像隶属度的区间分布情况;使用第二分类算法根据所述区间分布情况判断待识别视频是否为卡通视频。
上述的电子设备,所述图像特征包含颜色直方图、边缘直方图、高亮像素比例、边缘像素比例和颜色矩信息,其中颜色矩信息由所述颜色直方图计算得出;所述边缘直方图通过统计像素点的梯度方向和梯度幅值得出;所述高亮像素比例是指HSV空间中V参数大于阈值X的像素点的比例;所述边缘像素比例是指梯度幅值大于阈值Y的像素点的比例。
上述的电子设备,所述边缘直方图以矩阵形式表示的计算方式为:Bmn=Lm×Gn,式中,Bmn表示m行n列的边缘直方图矩阵,Lm表示m行1列的梯度幅值直方图矩阵,Gn表示1行n列的梯度方向直方图矩阵;所述梯度方向直方图矩阵的计算方式为:
$$G_n = \frac{1}{N}\sum_{j=1}^{N} Q_{nj}$$
式中:N表示被统计像素点的总数;Qnj为一个1行n列的矩阵,表示第j个像素点对每一个梯度方向区间 的贡献;Qnj的计算方式为:设第j个像素点的梯度方向落入量化区间S,则它对量化区间S以及量化区间T均有贡献,而对其他量化区间均无贡献,所述量化区间T是除量化区间S外与第j个像素点的梯度方向夹角最小的量化区间;第j个像素点对量化区间S的贡献vS和对量化区间T的贡献vT按下式计算:
$$v_S = \frac{\theta_T}{\gamma_{ST}},\qquad v_T = \frac{\theta_S}{\gamma_{ST}}$$
式中,γST表示量化区间S的中点所表征的方向与量化区间T的中点所表征的方向的夹角,θS表示第j个像素点的梯度方向与量化区间S的中点所表征的方向的夹角,θT表示第j个像素点的梯度方向与量化区间T的中点所表征的方向的夹角。
上述的电子设备,所述颜色直方图是在HSV空间中对颜色进行线性量化得出的,所述颜色矩信息由所述颜色直方图的前三阶矩组成。
上述的电子设备,所述梯度方向在0~180度范围内的区间划分与它在180~360度范围内的区间划分镜像对称。
上述的电子设备,所述关键帧均来自于所述待识别视频的有效区段,所述有效区段是待识别视频去掉开头部分和结尾部分后的剩余部分,所述有效区段的时长至少为整个待识别视频时长的50%,且所述开头部分和所述结尾部分的时长均至少为整个待识别视频时长的8%。
上述的电子设备,所述图像特征全部来自于关键帧的有效区域,所述有效区域的面积至少为整个关键帧面积的25%,且有效区域位于关键帧的可取区域内;所述可取区域为与关键帧具有共同几何中心的关键帧的相似形,且可取区域的面积为关键帧面积的64%。
本发明还公开了一种非易失性计算机存储介质,其特征在于:所述存储介质存储有计算机可执行指令的所述计算机可执行指令,当由电子设备执行 时使得电子设备能够:从待识别视频中提取关键帧;从关键帧中获取图像特征;使用第一分类算法根据每个关键帧的图像特征计算该关键帧的卡通图像隶属度;将所述卡通图像隶属度的取值范围分成至少三个区间,并统计所有关键帧的卡通图像隶属度的区间分布情况;使用第二分类算法根据所述区间分布情况判断待识别视频是否为卡通视频。
上述的存储介质,所述图像特征包含颜色直方图、边缘直方图、高亮像素比例、边缘像素比例和颜色矩信息,其中颜色矩信息由所述颜色直方图计算得出;所述边缘直方图通过统计像素点的梯度方向和梯度幅值得出;所述高亮像素比例是指HSV空间中V参数大于阈值X的像素点的比例;所述边缘像素比例是指梯度幅值大于阈值Y的像素点的比例。
上述的存储介质,所述边缘直方图以矩阵形式表示的计算方式为:Bmn=Lm×Gn,式中,Bmn表示m行n列的边缘直方图矩阵,Lm表示m行1列的梯度幅值直方图矩阵,Gn表示1行n列的梯度方向直方图矩阵;所述梯度方向直方图矩阵的计算方式为:
$$G_n = \frac{1}{N}\sum_{j=1}^{N} Q_{nj}$$
式中:N表示被统计像素点的总数;Qnj为一个1行n列的矩阵,表示第j个像素点对每一个梯度方向区间的贡献;Qnj的计算方式为:设第j个像素点的梯度方向落入量化区间S,则它对量化区间S以及量化区间T均有贡献,而对其他量化区间均无贡献,所述量化区间T是除量化区间S外与第j个像素点的梯度方向夹角最小的量化区间;第j个像素点对量化区间S的贡献vS和对量化区间T的贡献vT按下式计算:
$$v_S = \frac{\theta_T}{\gamma_{ST}},\qquad v_T = \frac{\theta_S}{\gamma_{ST}}$$
式中,γST表示量化区间S的中点所表征的方向与量化区间T的中点所表征的方向的夹角,θS表示第j个像素点的梯度方向与量化区间S的中点所表征的方向的夹角,θT表示第j个像素点的梯度方向 与量化区间T的中点所表征的方向的夹角。
上述的存储介质,所述颜色直方图是在HSV空间中对颜色进行线性量化得出的,所述颜色矩信息由所述颜色直方图的前三阶矩组成。
上述的存储介质,所述梯度方向在0~180度范围内的区间划分与它在180~360度范围内的区间划分镜像对称。
上述的存储介质,所述关键帧均来自于所述待识别视频的有效区段,所述有效区段是待识别视频去掉开头部分和结尾部分后的剩余部分,所述有效区段的时长至少为整个待识别视频时长的50%,且所述开头部分和所述结尾部分的时长均至少为整个待识别视频时长的8%。
上述的存储介质,所述图像特征全部来自于关键帧的有效区域,所述有效区域的面积至少为整个关键帧面积的25%,且有效区域位于关键帧的可取区域内;所述可取区域为与关键帧具有共同几何中心的关键帧的相似形,且可取区域的面积为关键帧面积的64%。
本发明实施例还提供了一种计算机程序产品,所述计算机程序产品包括存储在非暂态计算机可读存储介质上的计算机程序,所述计算机程序包括程序指令,当所述程序指令被计算机执行时,使所述计算机执行上述任一所述的方法。
从上面所述可以看出,本发明的有益效果在于:
本发明将卡通图像隶属度划分成了多个区间,并对所有关键帧的卡通图像隶属度的分布情况进行了统计,然后将统计结果输入第二分类器进行了再次分类,从而提高了卡通视频的判断准确性,同时依然保持了较低的算法复杂度,对现有技术是一种重要改进。
附图说明
为了更清楚地说明本发明实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本发明的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1为本发明方法实施例的一种流程图;
图2为本发明装置实施例的一种结构示意图;
图3为本发明实施例中的一种角度区间划分方式示意图;
图4为本发明实施例中电子设备的硬件结构示意图。
具体实施方式
下面将结合附图对本发明的技术方案进行清楚、完整地描述,显然,所描述的实施例是本发明一部分实施例,而不是全部的实施例。基于本发明中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本发明保护的范围。
在本发明的描述中,需要说明的是,术语“中心”、“上”、“下”、“左”、“右”、“竖直”、“水平”、“内”、“外”等指示的方位或位置关系为基于附图所示的方位或位置关系,仅是为了便于描述本发明和简化描述,而不是指示或暗示所指的装置或元件必须具有特定的方位、以特定的方位构造和操作,因此不能理解为对本发明的限制。此外,术语“第一”、“第二”、“第三”仅用于描述目的,而不能理解为指示或暗示相对重要性。
在本发明的描述中,需要说明的是,除非另有明确的规定和限定,术语“安装”、“相连”、“连接”应做广义理解,例如,可以是固定连接,也可以是可拆卸连接,或一体地连接;可以是机械连接,也可以是电连接;可以是直接相连,也可以通过中间媒介间接相连,还可以是两个元件内部的连通,可以是无线连接,也可以是有线连接。对于本领域的普通技术人员而言,可以具体情况理解上述术语在本发明中的具体含义。
一种卡通视频识别方法,该方法包含:
从待识别视频中提取关键帧,关键帧可以使用ffmpeg等等软件提取;
从关键帧中获取图像特征;
使用第一分类算法根据每个关键帧的图像特征计算该关键帧的卡通图像隶属度;
将卡通图像隶属度的取值范围分成至少三个区间,并统计所有关键帧的卡通图像隶属度的区间分布情况;
使用第二分类算法根据所述区间分布情况判断待识别视频是否为卡通视频。
具体地,图像特征包含颜色直方图、边缘直方图、高亮像素比例、边缘像素比例,以及由颜色直方图得到的颜色矩信息;边缘直方图通过统计像素点的梯度方向和梯度幅值得出,具体地可以使用Sobel算子;高亮像素比例是指HSV(Hue-Saturation-Value,色调-饱和度-明度)空间中V(Value,明度)参数大于阈值X的像素点的比例;边缘像素比例是指梯度幅值大于阈值Y的像素点的比例。
具体地,边缘直方图以矩阵形式表示的计算方式为:
Bmn=Lm×Gn
式中,Bmn表示m行n列的边缘直方图矩阵,Lm表示m行1列的梯度幅值直方图矩阵,Gn表示1行n列的梯度方向直方图矩阵;
上述梯度方向直方图矩阵的计算方式为:
$$G_n = \frac{1}{N}\sum_{j=1}^{N} Q_{nj}$$
式中:N表示被统计像素点的总数;Qnj为一个1行n列的矩阵,表示第j个像素点对每一个梯度方向区间的贡献;
上述Qnj的计算方式为:设第j个像素点的梯度方向落入量化区间S,则它对量化区间S以及量化区间T均有贡献,而对其他量化区间均无贡献,所述量化区间T是除量化区间S外与第j个像素点的梯度方向夹角最小的量化区 间;第j个像素点对量化区间S的贡献vS和对量化区间T的贡献vT按下式计算:
$$v_S = \frac{\theta_T}{\gamma_{ST}}$$
$$v_T = \frac{\theta_S}{\gamma_{ST}}$$
式中,γST表示量化区间S的中点所表征的方向与量化区间T的中点所表征的方向的最小正夹角,θS表示第j个像素点的梯度方向与量化区间S的中点所表征的方向的最小正夹角,θT表示第j个像素点的梯度方向与量化区间T的中点所表征的方向的最小正夹角。
例如,如图3所示为一种角度区间划分方式,其中每两条相邻实线为一个角度区间,图中的两条虚线分别表示第一角度区间和第二角度区间的角平分线,图中的点划线表示某一像素点的梯度方向,其落在第二角度区间内,同时也落在两条虚线的夹角范围内,因此该像素点对第一角度区间和第二角度区间均有贡献,它对第一角度区间的贡献为
$$\theta_S/\gamma_{ST}$$
而对第二角度区间的贡献为
$$\theta_T/\gamma_{ST}$$
当然,关于梯度方向的统计还有更简单的方式,即若某一像素点的梯度方向落在第二角度区间内,则该像素点只对第二角度区间有贡献,而对其他角度区间均无贡献。这两种统计方法均可应用在本发明的所有具体实施方式中。
具体地,颜色直方图是在HSV空间中对颜色进行线性量化得出的,颜色矩信息由颜色直方图的前三阶矩组成,即一阶矩(平均值Mean)、二阶矩(方差Variance)和三阶矩(偏度Skewness),其计算方式分别为:
一阶矩:
$$\mu_i = \frac{1}{N}\sum_{j=1}^{N} p_{ij}$$
二阶矩:
$$\sigma_i = \left(\frac{1}{N}\sum_{j=1}^{N}\left(p_{ij}-\mu_i\right)^{2}\right)^{1/2}$$
三阶矩:
$$s_i = \left(\frac{1}{N}\sum_{j=1}^{N}\left(p_{ij}-\mu_i\right)^{3}\right)^{1/3}$$
其中,N表示像素点的总数,i表示图像通道(即HSV空间中的H通道、S通道或V通道),pij表示关键帧第j个像素点在通道i下的灰度值。
具体地,梯度方向在0~180度范围内的区间划分与它在180~360度范围内的区间划分镜像对称,比如从0度开始将圆周均匀地划分为8个区间。
具体地,关键帧均来自于待识别视频的有效区段,有效区段是待识别视频去掉开头部分和结尾部分后的剩余部分,有效区段的时长至少为整个待识别视频时长的50%,且开头部分和结尾部分的时长均至少为整个待识别视频时长的8%。这样做的好处是:一般不论卡通视频还是非卡通视频都会有片头和片尾,且片头和片尾可能是字幕,这会影响算法对视频的识别结果,因此最好将视频的开头和结尾去掉。
具体地,图像特征全部来自于关键帧的有效区域,有效区域的面积至少为整个关键帧面积的25%,且有效区域位于关键帧的可取区域内;可取区域为与关键帧具有共同几何中心的关键帧的相似形,且可取区域的面积为关键帧面积的64%。这样做的好处是:不论卡通视频还是非卡通视频可能会在视频的边缘部分具有黑框和/或字幕,为了避免黑框和字幕的干扰,识别时最好选取视频关键帧的几何中间部分。
一种卡通视频识别装置,它包含:
关键帧提取模块:用于从待识别视频中提取关键帧;
图像特征提取模块:用于从关键帧中获取图像特征;
第一分类模块:用于使用第一分类算法根据每个关键帧的图像特征计算该关键帧的卡通图像隶属度;
隶属度分布统计模块:用于将卡通图像隶属度的取值范围分成至少三个区间,并统计所有关键帧的卡通图像隶属度的区间分布情况;
第二分类模块:使用第二分类算法根据区间分布情况判断待识别视频是否为卡通视频。
具体地,图像特征包含颜色直方图、边缘直方图、高亮像素比例、边缘像素比例,以及由颜色直方图得到的颜色矩信息;边缘直方图通过统计像素点的梯度方向和梯度幅值得出;高亮像素比例是指HSV空间中V参数大于阈值X的像素点的比例;边缘像素比例是指梯度幅值大于阈值Y的像素点的比例。
具体地,边缘直方图以矩阵形式表示的计算方式为:
Bmn=Lm×Gn
式中,Bmn表示m行n列的边缘直方图矩阵,Lm表示m行1列的梯度幅值直方图矩阵,Gn表示1行n列的梯度方向直方图矩阵;
上述梯度方向直方图矩阵的计算方式为:
$$G_n = \frac{1}{N}\sum_{j=1}^{N} Q_{nj}$$
式中:N表示被统计像素点的总数;Qnj为一个1行n列的矩阵,表示第j个像素点对每一个梯度方向区间的贡献;
上述Qnj的计算方式为:设第j个像素点的梯度方向落入量化区间S,则它对量化区间S以及量化区间T均有贡献,而对其他量化区间均无贡献,所述量化区间T是除量化区间S外与第j个像素点的梯度方向夹角最小的量化区间;第j个像素点对量化区间S的贡献vS和对量化区间T的贡献vT按下式计算:
$$v_S = \frac{\theta_T}{\gamma_{ST}}$$
$$v_T = \frac{\theta_S}{\gamma_{ST}}$$
式中,γST表示量化区间S的中点所表征的方向与量化区间T的中点所表征的方向的最小正夹角,θS表示第j个像素点的梯度方向与量化区间S的中点所表征的方向的最小正夹角,θT表示第j个像素点的梯度方向与量化区间T的中点所表征的方向的最小正夹角。
当然,关于梯度方向的统计还有更简单的方式,即若某一像素点的梯度方向落在第二角度区间内,则该像素点只对第二角度区间有贡献,而对其他角度区间均无贡献。这两种统计方法均可应用在本发明所有实施方式的具体实践中。
具体地,颜色直方图是在HSV空间中对颜色进行线性量化得出的,颜色矩信息由颜色直方图的前三阶矩组成,即一阶矩(平均值Mean)、二阶矩 (方差Variance)和三阶矩(偏度Skewness)。
具体地,梯度方向在0~180度范围内的区间划分与它在180~360度范围内的区间划分镜像对称。
具体地,关键帧均来自于待识别视频的有效区段,有效区段是待识别视频去掉开头部分和结尾部分后的剩余部分,有效区段的时长至少为整个待识别视频时长的50%,且开头部分和结尾部分的时长均至少为整个待识别视频时长的8%。这样做的好处是:一般不论卡通视频还是非卡通视频都会有片头和片尾,且片头和片尾可能是字幕,这会影响算法对视频的识别结果,因此最好将视频的开头和结尾去掉。
具体地,图像特征全部来自于关键帧的有效区域,有效区域的面积至少为整个关键帧面积的25%,且有效区域位于关键帧的可取区域内;可取区域为与关键帧具有共同几何中心的关键帧的相似形,且可取区域的面积为关键帧面积的64%。这样做的好处是:不论卡通视频还是非卡通视频可能会在视频的边缘部分具有黑框和/或字幕,为了避免黑框和字幕的干扰,识别时最好选取视频关键帧的几何中间部分。
作为本发明方法的一个实施例,如图1所示,一种卡通视频识别方法,该方法包含如下步骤:
步骤101,截取待识别视频的中间80%时长部分;
步骤102,从截取部分中提取关键帧;
步骤103,截取关键帧的长、宽中部70%的区域;
步骤104,将截取区域的图像转换到HSV空间;
步骤105,在HSV空间中统计截取区域的颜色直方图,H参数均匀地划分为8个区间,S和V参数分别均匀地划分为6个区间;同时统计截取区域的边缘直方图,其中边缘直方图是通过统计像素点的梯度方向和梯度幅值得出的,梯度方向和梯度幅值都均匀地划分为8个区间;
步骤106,根据颜色直方图和边缘直方图统计截取区域的高亮像素比例和边缘像素比例;其中高亮像素比例为V参数大于0.5的像素点的比例,边缘像素比例为梯度幅值大于阈值0.087的像素点的比例;
步骤107,使用事先训练过的SVM(Support Vector Machine,支持向量机)分类器对截取区域进行分类,计算出截取区域的卡通图像隶属度;
步骤108,将隶属度的取值范围均匀分为5个区间,统计所有关键帧的 隶属度分布;
步骤109,将隶属度分布输入另一经过事先训练的SVM分类器,从而计算该视频为卡通视频的概率,若概率超过50%则判定其为卡通视频;
步骤110,输出判定结果。
作为本发明装置的一个实施例,如图2所示,一种装置,该卡通视频识别装置2包含关键帧提取模块201、图像特征提取模块202、第一分类模块203、隶属度分布统计模块204和第二分类模块205;隶属度分布统计模块204用于将卡通图像隶属度的取值范围分成5个区间,并统计卡通图像隶属度的区间分布情况,再根据区间分布情况使用第二分类算法进行分类,从而判断待识别视频是否为卡通视频。使用时,将待识别视频输入关键帧提取模块201,关键帧提取模块201将提取到的关键帧传递给图像特征提取模块202,图像特征提取模块202从关键帧中提取图像特征,其提取方法可以采用本发明方法中提及的任何一种方式,接着图像特征提取模块202将图像特征传递给第一分类模块203,第一分类模块203计算出每个关键帧的卡通图像隶属度,并将结果传递给隶属度分布统计模块204,隶属度分布统计模块204将统计结果传递给第二分类模块205,经过第二分类模块205的分类计算判断待分类视频是否为卡通视频,最终输出视频类型。
容易想到,本发明装置的具体实现既可以是一种专用设备,也可以是在电脑、手机、平板等智能设备上安装特定软件而形成的设备。
需要说明的是,以上叙述中对范围做出了限定的各个参数,在该范围内选取任何端点值或中间值都是可取的,并且各参数的不同取值组合也是可行的。在了解到本发明具体实施方式所限定的范围后,本领域技术人员不需要付出任何创造性劳动都可以对其中的每一个参数进行具体取值,其所得到的应用效果都没有超出本发明所记载的范围,因此,为了节约篇幅,发明人不再对各种可能取值及其可能组合一一列举。
上述实施例的装置用于实现前述实施例中相应的方法,并且具有相应的方法实施例的有益效果,在此不再赘述。
所属领域的普通技术人员应当理解:以上任何实施例的讨论仅为示例性的,并非旨在暗示本公开的范围(包括权利要求)被限于这些例子;在本发明的思路下,以上实施例或者不同实施例中的技术特征之间也可以进行组合,步骤可以以任意顺序实现,并存在如上所述的本发明的不同方面的许多 其它变化,为了简明它们没有在细节中提供。
另外,为简化说明和讨论,并且为了不会使本发明难以理解,在所提供的附图中可以示出或可以不示出与集成电路(IC)芯片和其它部件的公知的电源/接地连接。此外,可以以框图的形式示出装置,以便避免使本发明难以理解,并且这也考虑了以下事实,即关于这些框图装置的实施方式的细节是高度取决于将要实施本发明的平台的(即,这些细节应当完全处于本领域技术人员的理解范围内)。在阐述了具体细节(例如,电路)以描述本发明的示例性实施例的情况下,对本领域技术人员来说显而易见的是,可以在没有这些具体细节的情况下或者这些具体细节有变化的情况下实施本发明。因此,这些描述应被认为是说明性的而不是限制性的。
如图4所示,本发明实施例还公开了一种电子设备,包括至少一个处理器810;以及,与所述至少一个处理器810通信连接的存储器800;其中,所述存储器800存储有可被所述至少一个处理器810执行的指令,所述指令被所述至少一个处理器810执行,以使所述至少一个处理器810能够从待识别视频中提取关键帧;从关键帧中获取图像特征;使用第一分类算法根据每个关键帧的图像特征计算该关键帧的卡通图像隶属度;将所述卡通图像隶属度的取值范围分成至少三个区间,并统计所有关键帧的卡通图像隶属度的区间分布情况;使用第二分类算法根据所述区间分布情况判断待识别视频是否为卡通视频。所述电子设备还包括与所述存储器800和所述处理器电连接的输入装置830和输出装置840,所述电连接优选为通过总线连接。
本实施例的电子设备,优选地,所述图像特征包含颜色直方图、边缘直方图、高亮像素比例、边缘像素比例和颜色矩信息,其中颜色矩信息由所述颜色直方图计算得出;所述边缘直方图通过统计像素点的梯度方向和梯度幅值得出;所述高亮像素比例是指HSV空间中V参数大于阈值X的像素点的比例;所述边缘像素比例是指梯度幅值大于阈值Y的像素点的比例。
本实施例的电子设备,优选地,所述边缘直方图以矩阵形式表示的计算方式为:Bmn=Lm×Gn,式中,Bmn表示m行n列的边缘直方图矩阵,Lm表示m行1列的梯度幅值直方图矩阵,Gn表示1行n列的梯度方向直方图矩阵;所述梯度方向直方图矩阵的计算方式为:
$$G_n = \frac{1}{N}\sum_{j=1}^{N} Q_{nj}$$
式中:N表示被统计像素点的总数;Qnj为一个1行n列的矩阵,表示第j个像素点对每一个梯度方向区间的贡献;Qnj的计算方式为:设第j个像素点的梯度方向落入量化区间S,则它对量化区间S以及量化区间T均有贡献,而对其他量化区间均无贡献,所述量化区间T是除量化区间S外与第j个像素点的梯度方向夹角最小的量化区间;第j个像素点对量化区间S的贡献vS和对量化区间T的贡献vT按下式计算:
$$v_S = \frac{\theta_T}{\gamma_{ST}},\qquad v_T = \frac{\theta_S}{\gamma_{ST}}$$
式中,γST表示量化区间S的中点所表征的方向与量化区间T的中点所表征的方向的夹角,θS表示第j个像素点的梯度方向与量化区间S的中点所表征的方向的夹角,θT表示第j个像素点的梯度方向与量化区间T的中点所表征的方向的夹角。
本实施例的电子设备,优选地,所述颜色直方图是在HSV空间中对颜色进行线性量化得出的,所述颜色矩信息由所述颜色直方图的前三阶矩组成。
本实施例的电子设备,优选地,所述梯度方向在0~180度范围内的区间划分与它在180~360度范围内的区间划分镜像对称。
本实施例的电子设备,优选地,所述关键帧均来自于所述待识别视频的有效区段,所述有效区段是待识别视频去掉开头部分和结尾部分后的剩余部分,所述有效区段的时长至少为整个待识别视频时长的50%,且所述开头部分和所述结尾部分的时长均至少为整个待识别视频时长的8%。
本实施例的电子设备,优选地,所述图像特征全部来自于关键帧的有效 区域,所述有效区域的面积至少为整个关键帧面积的25%,且有效区域位于关键帧的可取区域内;所述可取区域为与关键帧具有共同几何中心的关键帧的相似形,且可取区域的面积为关键帧面积的64%。
The present invention further discloses a non-volatile computer storage medium, characterized in that the storage medium stores computer-executable instructions which, when executed by an electronic device, enable the electronic device to: extract key frames from a video to be identified; acquire image features from the key frames; calculate, by using a first classification algorithm, a cartoon image membership degree of each key frame according to the image features of the key frame; divide the value range of the cartoon image membership degree into at least three intervals and determine the interval distribution of the cartoon image membership degrees of all the key frames; and determine, by using a second classification algorithm, whether the video to be identified is a cartoon video according to the interval distribution.

In the storage medium of this embodiment, preferably, the image features comprise a color histogram, an edge histogram, a highlight pixel ratio, an edge pixel ratio and color moment information, where the color moment information is calculated from the color histogram and the edge histogram is obtained from the gradient directions and gradient magnitudes of the pixels; the highlight pixel ratio refers to the proportion of pixels whose V parameter in HSV space is greater than a threshold X, and the edge pixel ratio refers to the proportion of pixels whose gradient magnitude is greater than a threshold Y.
In the storage medium of this embodiment, preferably, the edge histogram, expressed in matrix form, is computed as Bmn = Lm × Gn, where Bmn represents an edge histogram matrix of m rows and n columns, Lm represents a gradient magnitude histogram matrix of m rows and 1 column, and Gn represents a gradient direction histogram matrix of 1 row and n columns. The gradient direction histogram matrix is computed as

Gn = (1/N) · Σ Qnj (the sum being taken over j = 1, 2, …, N),

where N represents the total number of pixels counted, and Qnj is a matrix of 1 row and n columns representing the contribution of the j-th pixel to each gradient direction interval. Qnj is computed as follows: if the gradient direction of the j-th pixel falls into quantization interval S, the pixel contributes to quantization interval S and to quantization interval T, and to no other quantization interval, quantization interval T being the quantization interval, other than quantization interval S, whose direction forms the smallest angle with the gradient direction of the j-th pixel. The contribution vS of the j-th pixel to quantization interval S and its contribution vT to quantization interval T are computed as

vS = θT / γST,  vT = θS / γST,

where γST represents the angle between the direction represented by the midpoint of quantization interval S and the direction represented by the midpoint of quantization interval T, θS represents the angle between the gradient direction of the j-th pixel and the direction represented by the midpoint of quantization interval S, and θT represents the angle between the gradient direction of the j-th pixel and the direction represented by the midpoint of quantization interval T.
In the storage medium of this embodiment, preferably, the color histogram is obtained by linearly quantizing the colors in HSV space, and the color moment information consists of the first three moments of the color histogram.

In the storage medium of this embodiment, preferably, the interval division of the gradient direction within the range of 0 to 180 degrees is mirror-symmetric to its interval division within the range of 180 to 360 degrees.

In the storage medium of this embodiment, preferably, the key frames are all taken from the valid segment of the video to be identified; the valid segment is the remainder of the video to be identified after its opening portion and ending portion are removed, the duration of the valid segment is at least 50% of the total duration of the video to be identified, and the opening portion and the ending portion each last at least 8% of the total duration of the video to be identified.

In the storage medium of this embodiment, preferably, the image features are all extracted from the valid region of each key frame; the valid region covers at least 25% of the area of the entire key frame and lies within the selectable region of the key frame; the selectable region is a figure geometrically similar to the key frame that shares its geometric center, and its area is 64% of the key frame area.

Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage and the like) containing computer-usable program code.
An embodiment of the present invention further provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the method described in the above embodiments.

The present invention is described with reference to flowcharts and/or block diagrams of the method, the device (system) and the computer program product according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Obviously, the above embodiments are merely examples given for clarity of illustration and are not intended to limit the implementations. For those of ordinary skill in the art, other variations or modifications of different forms can be made on the basis of the above description. It is neither necessary nor possible to enumerate all implementations here, and obvious variations or modifications derived therefrom remain within the protection scope of the present invention.

Claims (29)

  1. A cartoon video recognition method, characterized by comprising:
    extracting key frames from a video to be identified;
    acquiring image features from the key frames;
    calculating, by using a first classification algorithm, a cartoon image membership degree of each key frame according to the image features of the key frame;
    dividing a value range of the cartoon image membership degree into at least three intervals, and determining an interval distribution of the cartoon image membership degrees of all the key frames; and
    determining, by using a second classification algorithm, whether the video to be identified is a cartoon video according to the interval distribution.
  2. The cartoon video recognition method according to claim 1, characterized in that the image features comprise a color histogram, an edge histogram, a highlight pixel ratio, an edge pixel ratio and color moment information, wherein the color moment information is calculated from the color histogram, and the edge histogram is obtained from the gradient directions and gradient magnitudes of the pixels;
    the highlight pixel ratio refers to the proportion of pixels whose V parameter in HSV space is greater than a threshold X, and the edge pixel ratio refers to the proportion of pixels whose gradient magnitude is greater than a threshold Y.
  3. The cartoon video recognition method according to claim 2, characterized in that the edge histogram, expressed in matrix form, is computed as:
    Bmn = Lm × Gn,
    wherein Bmn represents an edge histogram matrix of m rows and n columns, Lm represents a gradient magnitude histogram matrix of m rows and 1 column, and Gn represents a gradient direction histogram matrix of 1 row and n columns;
    the gradient direction histogram matrix is computed as:
    Gn = (1/N) · Σ Qnj (the sum being taken over j = 1, 2, …, N),
    wherein N represents the total number of pixels counted, and Qnj is a matrix of 1 row and n columns representing the contribution of the j-th pixel to each gradient direction interval;
    Qnj is computed as follows: if the gradient direction of the j-th pixel falls into quantization interval S, the pixel contributes to quantization interval S and to quantization interval T, and to no other quantization interval, quantization interval T being the quantization interval, other than quantization interval S, whose direction forms the smallest angle with the gradient direction of the j-th pixel; the contribution vS of the j-th pixel to quantization interval S and its contribution vT to quantization interval T are computed as:
    vS = θT / γST,
    vT = θS / γST,
    wherein γST represents the angle between the direction represented by the midpoint of quantization interval S and the direction represented by the midpoint of quantization interval T, θS represents the angle between the gradient direction of the j-th pixel and the direction represented by the midpoint of quantization interval S, and θT represents the angle between the gradient direction of the j-th pixel and the direction represented by the midpoint of quantization interval T.
  4. The cartoon video recognition method according to claim 2, characterized in that the color histogram is obtained by linearly quantizing the colors in HSV space, and the color moment information consists of the first three moments of the color histogram.
  5. The cartoon video recognition method according to claim 2, characterized in that the interval division of the gradient direction within the range of 0 to 180 degrees is mirror-symmetric to its interval division within the range of 180 to 360 degrees.
  6. The cartoon video recognition method according to claim 1, characterized in that the key frames are all taken from a valid segment of the video to be identified, the valid segment being the remainder of the video to be identified after its opening portion and ending portion are removed, the duration of the valid segment being at least 50% of the total duration of the video to be identified, and the opening portion and the ending portion each lasting at least 8% of the total duration of the video to be identified.
  7. The cartoon video recognition method according to claim 1, characterized in that the image features are all extracted from a valid region of each key frame, the valid region covering at least 25% of the area of the entire key frame and lying within a selectable region of the key frame, the selectable region being a figure geometrically similar to the key frame that shares its geometric center, with an area equal to 64% of the key frame area.
  8. A cartoon video recognition apparatus, characterized by comprising:
    a key frame extraction module, configured to extract key frames from a video to be identified;
    an image feature extraction module, configured to acquire image features from the key frames;
    a first classification module, configured to calculate, by using a first classification algorithm, a cartoon image membership degree of each key frame according to the image features of the key frame;
    a membership distribution statistics module, configured to divide a value range of the cartoon image membership degree into at least three intervals and determine an interval distribution of the cartoon image membership degrees of all the key frames; and
    a second classification module, configured to determine, by using a second classification algorithm, whether the video to be identified is a cartoon video according to the interval distribution.
  9. The cartoon video recognition apparatus according to claim 8, characterized in that the image features comprise a color histogram, an edge histogram, a highlight pixel ratio, an edge pixel ratio and color moment information, wherein the color moment information is calculated from the color histogram, the edge histogram is obtained from the gradient directions and gradient magnitudes of the pixels, the highlight pixel ratio refers to the proportion of pixels whose V parameter in HSV space is greater than a threshold X, and the edge pixel ratio refers to the proportion of pixels whose gradient magnitude is greater than a threshold Y.
  10. The cartoon video recognition apparatus according to claim 9, characterized in that the edge histogram, expressed in matrix form, is computed as:
    Bmn = Lm × Gn,
    wherein Bmn represents an edge histogram matrix of m rows and n columns, Lm represents a gradient magnitude histogram matrix of m rows and 1 column, and Gn represents a gradient direction histogram matrix of 1 row and n columns;
    the gradient direction histogram matrix is computed as:
    Gn = (1/N) · Σ Qnj (the sum being taken over j = 1, 2, …, N),
    wherein N represents the total number of pixels counted, and Qnj is a matrix of 1 row and n columns representing the contribution of the j-th pixel to each gradient direction interval;
    Qnj is computed as follows: if the gradient direction of the j-th pixel falls into quantization interval S, the pixel contributes to quantization interval S and to quantization interval T, and to no other quantization interval, quantization interval T being the quantization interval, other than quantization interval S, whose direction forms the smallest angle with the gradient direction of the j-th pixel; the contribution vS of the j-th pixel to quantization interval S and its contribution vT to quantization interval T are computed as:
    vS = θT / γST,
    vT = θS / γST,
    wherein γST represents the angle between the direction represented by the midpoint of quantization interval S and the direction represented by the midpoint of quantization interval T, θS represents the angle between the gradient direction of the j-th pixel and the direction represented by the midpoint of quantization interval S, and θT represents the angle between the gradient direction of the j-th pixel and the direction represented by the midpoint of quantization interval T.
  11. The cartoon video recognition apparatus according to claim 9, characterized in that the color histogram is obtained by linearly quantizing the colors in HSV space, and the color moment information consists of the first three moments of the color histogram.
  12. The cartoon video recognition apparatus according to claim 9, characterized in that the interval division of the gradient direction within the range of 0 to 180 degrees is mirror-symmetric to its interval division within the range of 180 to 360 degrees.
  13. The cartoon video recognition apparatus according to claim 9, characterized in that the key frames are all taken from a valid segment of the video to be identified, the valid segment being the remainder of the video to be identified after its opening portion and ending portion are removed, the duration of the valid segment being at least 50% of the total duration of the video to be identified, and the opening portion and the ending portion each lasting at least 8% of the total duration of the video to be identified.
  14. The cartoon video recognition apparatus according to claim 9, characterized in that the image features are all extracted from a valid region of each key frame, the valid region covering at least 25% of the area of the entire key frame and lying within a selectable region of the key frame, the selectable region being a figure geometrically similar to the key frame that shares its geometric center, with an area equal to 64% of the key frame area.
  15. An electronic device, characterized by comprising at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to:
    extract key frames from a video to be identified;
    acquire image features from the key frames;
    calculate, by using a first classification algorithm, a cartoon image membership degree of each key frame according to the image features of the key frame;
    divide a value range of the cartoon image membership degree into at least three intervals, and determine an interval distribution of the cartoon image membership degrees of all the key frames; and
    determine, by using a second classification algorithm, whether the video to be identified is a cartoon video according to the interval distribution.
  16. The electronic device according to claim 15, characterized in that the image features comprise a color histogram, an edge histogram, a highlight pixel ratio, an edge pixel ratio and color moment information, wherein the color moment information is calculated from the color histogram, and the edge histogram is obtained from the gradient directions and gradient magnitudes of the pixels;
    the highlight pixel ratio refers to the proportion of pixels whose V parameter in HSV space is greater than a threshold X, and the edge pixel ratio refers to the proportion of pixels whose gradient magnitude is greater than a threshold Y.
  17. The electronic device according to claim 16, characterized in that the edge histogram, expressed in matrix form, is computed as:
    Bmn = Lm × Gn,
    wherein Bmn represents an edge histogram matrix of m rows and n columns, Lm represents a gradient magnitude histogram matrix of m rows and 1 column, and Gn represents a gradient direction histogram matrix of 1 row and n columns;
    the gradient direction histogram matrix is computed as:
    Gn = (1/N) · Σ Qnj (the sum being taken over j = 1, 2, …, N),
    wherein N represents the total number of pixels counted, and Qnj is a matrix of 1 row and n columns representing the contribution of the j-th pixel to each gradient direction interval;
    Qnj is computed as follows: if the gradient direction of the j-th pixel falls into quantization interval S, the pixel contributes to quantization interval S and to quantization interval T, and to no other quantization interval, quantization interval T being the quantization interval, other than quantization interval S, whose direction forms the smallest angle with the gradient direction of the j-th pixel; the contribution vS of the j-th pixel to quantization interval S and its contribution vT to quantization interval T are computed as:
    vS = θT / γST,
    vT = θS / γST,
    wherein γST represents the angle between the direction represented by the midpoint of quantization interval S and the direction represented by the midpoint of quantization interval T, θS represents the angle between the gradient direction of the j-th pixel and the direction represented by the midpoint of quantization interval S, and θT represents the angle between the gradient direction of the j-th pixel and the direction represented by the midpoint of quantization interval T.
  18. The electronic device according to claim 16, characterized in that the color histogram is obtained by linearly quantizing the colors in HSV space, and the color moment information consists of the first three moments of the color histogram.
  19. The electronic device according to claim 16, characterized in that the interval division of the gradient direction within the range of 0 to 180 degrees is mirror-symmetric to its interval division within the range of 180 to 360 degrees.
  20. The electronic device according to claim 15, characterized in that the key frames are all taken from a valid segment of the video to be identified, the valid segment being the remainder of the video to be identified after its opening portion and ending portion are removed, the duration of the valid segment being at least 50% of the total duration of the video to be identified, and the opening portion and the ending portion each lasting at least 8% of the total duration of the video to be identified.
  21. The electronic device according to claim 15, characterized in that the image features are all extracted from a valid region of each key frame, the valid region covering at least 25% of the area of the entire key frame and lying within a selectable region of the key frame, the selectable region being a figure geometrically similar to the key frame that shares its geometric center, with an area equal to 64% of the key frame area.
  22. A non-volatile computer storage medium, characterized in that the storage medium stores computer-executable instructions which, when executed by an electronic device, enable the electronic device to:
    extract key frames from a video to be identified;
    acquire image features from the key frames;
    calculate, by using a first classification algorithm, a cartoon image membership degree of each key frame according to the image features of the key frame;
    divide a value range of the cartoon image membership degree into at least three intervals, and determine an interval distribution of the cartoon image membership degrees of all the key frames; and
    determine, by using a second classification algorithm, whether the video to be identified is a cartoon video according to the interval distribution.
  23. The storage medium according to claim 22, characterized in that the image features comprise a color histogram, an edge histogram, a highlight pixel ratio, an edge pixel ratio and color moment information, wherein the color moment information is calculated from the color histogram, and the edge histogram is obtained from the gradient directions and gradient magnitudes of the pixels;
    the highlight pixel ratio refers to the proportion of pixels whose V parameter in HSV space is greater than a threshold X, and the edge pixel ratio refers to the proportion of pixels whose gradient magnitude is greater than a threshold Y.
  24. The storage medium according to claim 23, characterized in that the edge histogram, expressed in matrix form, is computed as:
    Bmn = Lm × Gn,
    wherein Bmn represents an edge histogram matrix of m rows and n columns, Lm represents a gradient magnitude histogram matrix of m rows and 1 column, and Gn represents a gradient direction histogram matrix of 1 row and n columns;
    the gradient direction histogram matrix is computed as:
    Gn = (1/N) · Σ Qnj (the sum being taken over j = 1, 2, …, N),
    wherein N represents the total number of pixels counted, and Qnj is a matrix of 1 row and n columns representing the contribution of the j-th pixel to each gradient direction interval;
    Qnj is computed as follows: if the gradient direction of the j-th pixel falls into quantization interval S, the pixel contributes to quantization interval S and to quantization interval T, and to no other quantization interval, quantization interval T being the quantization interval, other than quantization interval S, whose direction forms the smallest angle with the gradient direction of the j-th pixel; the contribution vS of the j-th pixel to quantization interval S and its contribution vT to quantization interval T are computed as:
    vS = θT / γST,
    vT = θS / γST,
    wherein γST represents the angle between the direction represented by the midpoint of quantization interval S and the direction represented by the midpoint of quantization interval T, θS represents the angle between the gradient direction of the j-th pixel and the direction represented by the midpoint of quantization interval S, and θT represents the angle between the gradient direction of the j-th pixel and the direction represented by the midpoint of quantization interval T.
  25. The storage medium according to claim 23, characterized in that the color histogram is obtained by linearly quantizing the colors in HSV space, and the color moment information consists of the first three moments of the color histogram.
  26. The storage medium according to claim 23, characterized in that the interval division of the gradient direction within the range of 0 to 180 degrees is mirror-symmetric to its interval division within the range of 180 to 360 degrees.
  27. The storage medium according to claim 22, characterized in that the key frames are all taken from a valid segment of the video to be identified, the valid segment being the remainder of the video to be identified after its opening portion and ending portion are removed, the duration of the valid segment being at least 50% of the total duration of the video to be identified, and the opening portion and the ending portion each lasting at least 8% of the total duration of the video to be identified.
  28. The storage medium according to claim 22, characterized in that the image features are all extracted from a valid region of each key frame, the valid region covering at least 25% of the area of the entire key frame and lying within a selectable region of the key frame, the selectable region being a figure geometrically similar to the key frame that shares its geometric center, with an area equal to 64% of the key frame area.
  29. A computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions, characterized in that, when executed by a computer, the program instructions cause the computer to perform the method according to any one of the preceding claims.
PCT/CN2016/096153 2016-03-31 2016-08-22 一种卡通视频识别方法、装置和电子设备 WO2017166597A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610201081.0 2016-03-31
CN201610201081.0A CN105844251A (zh) 2016-03-31 2016-03-31 一种卡通视频识别方法及装置

Publications (1)

Publication Number Publication Date
WO2017166597A1 true WO2017166597A1 (zh) 2017-10-05

Family

ID=56597759

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/096153 WO2017166597A1 (zh) 2016-03-31 2016-08-22 一种卡通视频识别方法、装置和电子设备

Country Status (2)

Country Link
CN (1) CN105844251A (zh)
WO (1) WO2017166597A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109871827A (zh) * 2019-03-14 2019-06-11 大连海事大学 一种结合区域置信度和压力分布方向强度的足迹表达方法
CN111325181A (zh) * 2020-03-19 2020-06-23 北京海益同展信息科技有限公司 一种状态监测方法、装置、电子设备及存储介质
CN111479130A (zh) * 2020-04-02 2020-07-31 腾讯科技(深圳)有限公司 一种视频定位方法、装置、电子设备和存储介质
CN115544473A (zh) * 2022-09-09 2022-12-30 苏州吉弘能源科技有限公司 一种光伏发电站运维终端登录控制系统

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105844251A (zh) * 2016-03-31 2016-08-10 乐视控股(北京)有限公司 一种卡通视频识别方法及装置
CN111797912B (zh) * 2020-06-23 2023-09-22 山东浪潮超高清视频产业有限公司 影片年代类型识别的系统、方法及识别模型的构建方法
CN113222058B (zh) * 2021-05-28 2024-05-10 芯算一体(深圳)科技有限公司 一种图像分类方法、装置、电子设备及存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001027865A1 (en) * 1999-10-08 2001-04-19 British Telecommunications Public Limited Company Cartoon recognition
US20030012447A1 (en) * 2000-03-02 2003-01-16 Mark Pawlewski Cartoon recognition
CN1498391A * 2001-07-20 2004-05-19 皇家飞利浦电子股份有限公司 检测视频数据流中卡通的方法和系统
CN1679027A (zh) * 2002-08-26 2005-10-05 皇家飞利浦电子股份有限公司 用于检测视频图像序列中内容属性的设备和方法
CN101276417A (zh) * 2008-04-17 2008-10-01 上海交通大学 基于内容的互联网动画媒体垃圾信息过滤方法
CN105844251A (zh) * 2016-03-31 2016-08-10 乐视控股(北京)有限公司 一种卡通视频识别方法及装置

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6363380B1 (en) * 1998-01-13 2002-03-26 U.S. Philips Corporation Multimedia computer system with story segmentation capability and operating program therefor including finite automation video parser
CN101650728A (zh) * 2009-08-26 2010-02-17 北京邮电大学 视频高层特征检索系统及其实现
CN101977311B (zh) * 2010-11-03 2012-07-04 上海交通大学 基于多特征分析的cg动画视频检测方法
CN104881675A (zh) * 2015-05-04 2015-09-02 北京奇艺世纪科技有限公司 一种视频场景的识别方法和装置

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109871827A (zh) * 2019-03-14 2019-06-11 大连海事大学 一种结合区域置信度和压力分布方向强度的足迹表达方法
CN109871827B (zh) * 2019-03-14 2022-10-25 大连海事大学 一种结合区域置信度和压力分布方向强度的足迹表达方法
CN111325181A (zh) * 2020-03-19 2020-06-23 北京海益同展信息科技有限公司 一种状态监测方法、装置、电子设备及存储介质
CN111325181B (zh) * 2020-03-19 2023-12-05 京东科技信息技术有限公司 一种状态监测方法、装置、电子设备及存储介质
CN111479130A (zh) * 2020-04-02 2020-07-31 腾讯科技(深圳)有限公司 一种视频定位方法、装置、电子设备和存储介质
CN111479130B (zh) * 2020-04-02 2023-09-26 腾讯科技(深圳)有限公司 一种视频定位方法、装置、电子设备和存储介质
CN115544473A (zh) * 2022-09-09 2022-12-30 苏州吉弘能源科技有限公司 一种光伏发电站运维终端登录控制系统
CN115544473B (zh) * 2022-09-09 2023-11-21 苏州吉弘能源科技有限公司 一种光伏发电站运维终端登录控制系统

Also Published As

Publication number Publication date
CN105844251A (zh) 2016-08-10

Similar Documents

Publication Publication Date Title
US10896349B2 (en) Text detection method and apparatus, and storage medium
WO2017166597A1 (zh) 一种卡通视频识别方法、装置和电子设备
US11062123B2 (en) Method, terminal, and storage medium for tracking facial critical area
CN109151501B (zh) 一种视频关键帧提取方法、装置、终端设备及存储介质
US8358837B2 (en) Apparatus and methods for detecting adult videos
JP6719457B2 (ja) 画像の主要被写体を抽出する方法とシステム
CN104866616B (zh) 监控视频目标搜索方法
CN107944427B (zh) 动态人脸识别方法及计算机可读存储介质
US8630454B1 (en) Method and system for motion detection in an image
Yan et al. One extended OTSU flame image recognition method using RGBL and stripe segmentation
CN106683073B (zh) 一种车牌的检测方法及摄像机和服务器
CN104778481A (zh) 一种大规模人脸模式分析样本库的构建方法和装置
CN107038416B (zh) 一种基于二值图像改进型hog特征的行人检测方法
CN111144366A (zh) 一种基于联合人脸质量评估的陌生人脸聚类方法
Seo et al. Effective and efficient human action recognition using dynamic frame skipping and trajectory rejection
US20180189557A1 (en) Human detection in high density crowds
CN111383244A (zh) 一种目标检测跟踪方法
Zhu et al. Detecting natural scenes text via auto image partition, two-stage grouping and two-layer classification
CN111582654B (zh) 基于深度循环神经网络的服务质量评价方法及其装置
CN111160107A (zh) 一种基于特征匹配的动态区域检测方法
CN108009480A (zh) 一种基于特征识别的图像人体行为检测方法
CN102129569A (zh) 基于多尺度对比特征的对象检测设备和方法
CN107341456B (zh) 一种基于单幅户外彩色图像的天气晴阴分类方法
CN114926761B (zh) 一种基于时空平滑特征网络的动作识别方法
Cheng et al. Research on Fast Target Detection And Classification Algorithm for Passive Millimeter Wave Imaging

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16896369

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 16896369

Country of ref document: EP

Kind code of ref document: A1