CN105389558A - Method and apparatus for detecting video - Google Patents


Publication number
CN105389558A
CN105389558A (application CN201510764366.0A)
Authority
CN
China
Prior art keywords
video
subsegment
text
testing result
bad
Prior art date
Legal status
Pending
Application number
CN201510764366.0A
Other languages
Chinese (zh)
Inventor
李邵梅
黄海
于洪涛
王凯
高超
黄雅静
李印海
Current Assignee
PLA Information Engineering University
Original Assignee
PLA Information Engineering University
Priority date
Filing date
Publication date
Application filed by PLA Information Engineering University filed Critical PLA Information Engineering University
Priority application: CN201510764366.0A
Publication: CN105389558A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content
    • G06V20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/48: Matching video sequences

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The present invention provides a method and an apparatus for detecting a video. A video to be detected is segmented into a plurality of subsegment videos based on the similarity of adjacent frame images in the video; image detection, text detection and voice keyword detection are then performed on each subsegment video, so that a detection result for the video to be detected, i.e., whether it is a bad video, can be determined from the image detection result, the text detection result and the voice detection result of each subsegment video. According to the above technical solution, the method and apparatus judge whether the video is a bad video based on image, text and voice together. Compared with the purely image-based detection of the prior art, they examine the video from multiple aspects, analyze it more comprehensively, and improve the accuracy of video detection.

Description

Video detection method and apparatus
Technical field
The present invention belongs to the technical field of image recognition, and in particular relates to a video detection method and apparatus.
Background art
A bad video is a video whose manner of distribution or content is illegal or improper. Bad videos currently fall mainly into two types, pirated videos and other bad videos, where the other bad videos chiefly comprise reactionary videos, terrorism videos, fraud videos and pornographic videos. Widely distributed over public networks, these bad videos have become a major cause of social harm.
In order to purify the Internet environment, researchers have proposed a variety of methods for detecting bad videos. Detection of pirated videos is relatively mature, while the main detection method for reactionary, terrorism, fraud and pornographic videos is content-based detection, which proceeds as follows:
First, the visual objects in known bad videos are obtained, and the feature values of those visual objects are extracted as matching templates. Second, after a video to be matched is acquired, every frame image in the video is divided into sub-regions and the feature value of each sub-region is extracted region by region. Then the feature value of each sub-region is compared with the template feature values by a distance-based similarity calculation; if the distance is less than a specified threshold, the video is judged to be a bad video. However, a video is a combination of images, text and speech, and judging whether a video is a bad video purely by image detection may make the detection inaccurate.
Summary of the invention
In view of this, an object of the present invention is to provide a video detection method and apparatus for improving the accuracy of video detection.
The present invention provides a video detection method, the method comprising:
segmenting a video to be detected into a plurality of subsegment videos based on the similarity of adjacent frame images in the video to be detected;
performing image detection, text detection and voice keyword detection on each subsegment video to obtain an image detection result, a text detection result and a voice detection result for each subsegment video, wherein the image detection result indicates the result of detecting the subsegment video based on image, the text detection result indicates the result of detecting the subsegment video based on text, and the voice detection result indicates the result of detecting the subsegment video based on voice keywords;
obtaining the detection result of each subsegment video based on its image detection result, text detection result and voice detection result;
obtaining the detection result of the video to be detected based on the detection results of the subsegment videos.
Preferably, obtaining the detection result of a subsegment video based on its image detection result, text detection result and voice detection result comprises:
when any one of the image detection result, the text detection result and the voice detection result of the subsegment video indicates that a target object is detected and the grade of the target object is level one, obtaining a detection result indicating that the subsegment video is a bad video subsegment;
when at least two of the image detection result, the text detection result and the voice detection result of the subsegment video indicate that a target object is detected and the grade of the target object is level two, obtaining a detection result indicating that the subsegment video is a bad video subsegment, the importance of level two being lower than that of level one;
when any one of the image detection result, the text detection result and the voice detection result of the subsegment video indicates that a target object is detected and the grade of the target object is level two, obtaining a detection result indicating that the subsegment video is a suspected bad video subsegment.
Preferably, obtaining the detection result of the video to be detected based on the detection results of the subsegment videos comprises:
obtaining, based on the detection results, a first number of subsegment videos that are bad video subsegments and a second number of subsegment videos that are suspected bad video subsegments;
when the ratio of the first number to the total number of subsegment videos is greater than a first threshold, obtaining a detection result indicating that the video to be detected is a bad video;
when the ratio of the second number to the total number of subsegment videos is greater than a second threshold, obtaining a detection result indicating that the video to be detected is a bad video, the first threshold being less than the second threshold.
Preferably, performing image detection on a subsegment video to obtain its image detection result comprises:
extracting the visual features of the detection regions of every frame image in the subsegment video;
performing matching analysis between the extracted visual features and a pre-established image object model to obtain the bad objects in every frame image and their grades, the image detection result comprising the bad objects in every frame image and their grades.
Preferably, performing text detection on a subsegment video to obtain its text detection result comprises:
determining the text regions in every frame image of the subsegment video;
performing text recognition on the determined text regions to obtain the text they contain;
matching the obtained text against a pre-established text library to obtain the bad text in every frame image and its grade, the text detection result comprising the bad text in every frame image and its grade.
Preferably, performing voice detection on a subsegment video to obtain its voice detection result comprises:
extracting the audio data from the subsegment video and obtaining the speech feature sequence of the audio data;
comparing the obtained speech feature sequence with the speech feature sequence of each keyword in a pre-established speech library to obtain the distance between the obtained speech feature sequence and the speech feature sequence of each keyword;
when the distance between the obtained speech feature sequence and the speech feature sequence of any keyword is less than a distance threshold, determining that the subsegment video contains bad speech;
obtaining the keywords whose distance is less than the distance threshold, and determining the grade of the bad speech from the grade of those keywords, the voice detection result comprising the bad speech and its grade.
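The distance comparison in the voice detection steps above resembles classic dynamic time warping (DTW, to which Fig. 8 of the drawings refers). A minimal sketch, assuming scalar feature values standing in for the per-frame speech features (the real feature sequences would be vector-valued, and all names here are hypothetical):

```python
def dtw_distance(seq, keyword_seq):
    """DTW distance between a speech feature sequence and a keyword's
    feature sequence; smaller means more similar."""
    n, m = len(seq), len(keyword_seq)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(seq[i - 1] - keyword_seq[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # insertion
                                 d[i][j - 1],      # deletion
                                 d[i - 1][j - 1])  # match
    return d[n][m]


def matching_keywords(seq, keyword_library, dist_thresh):
    """Return the keywords whose DTW distance falls below the threshold;
    a non-empty result means the subsegment contains bad speech."""
    return [kw for kw, kw_seq in keyword_library.items()
            if dtw_distance(seq, kw_seq) < dist_thresh]
```

DTW tolerates differing speaking rates, which is why identical content stretched over more frames still yields distance zero in the sketch.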
The present invention also provides a video detection apparatus, the apparatus comprising:
a segmentation unit configured to segment a video to be detected into a plurality of subsegment videos based on the similarity of adjacent frame images in the video to be detected;
a detection unit configured to perform image detection, text detection and voice keyword detection on each subsegment video to obtain an image detection result, a text detection result and a voice detection result for each subsegment video, wherein the image detection result indicates the result of detecting the subsegment video based on image, the text detection result indicates the result of detecting the subsegment video based on text, and the voice detection result indicates the result of detecting the subsegment video based on voice keywords;
a first processing unit configured to obtain the detection result of each subsegment video based on its image detection result, text detection result and voice detection result;
a second processing unit configured to obtain the detection result of the video to be detected based on the detection results of the subsegment videos.
Preferably, the first processing unit is configured to: when any one of the image detection result, the text detection result and the voice detection result of a subsegment video indicates that a target object is detected and the grade of the target object is level one, obtain a detection result indicating that the subsegment video is a bad video subsegment; when at least two of those results indicate that a target object is detected and the grade of the target object is level two, obtain a detection result indicating that the subsegment video is a bad video subsegment, the importance of level two being lower than that of level one; and when any one of those results indicates that a target object is detected and the grade of the target object is level two, obtain a detection result indicating that the subsegment video is a suspected bad video subsegment.
Preferably, the second processing unit comprises an acquisition subunit and a processing subunit;
the acquisition subunit is configured to obtain, based on the detection results, a first number of subsegment videos that are bad video subsegments and a second number of subsegment videos that are suspected bad video subsegments;
the processing subunit is configured to obtain a detection result indicating that the video to be detected is a bad video when the ratio of the first number to the total number of subsegment videos is greater than a first threshold, and to obtain a detection result indicating that the video to be detected is a bad video when the ratio of the second number to the total number of subsegment videos is greater than a second threshold, the first threshold being less than the second threshold.
Preferably, the detection unit comprises an image detection subunit, a text detection subunit and a voice detection subunit;
the image detection subunit is configured to extract the visual features of the detection regions of every frame image in the subsegment video and perform matching analysis between the extracted visual features and a pre-established image object model, so as to obtain the bad objects in every frame image and their grades, the image detection result comprising the bad objects in every frame image and their grades;
the text detection subunit is configured to determine the text regions in every frame image of the subsegment video, perform text recognition on the determined text regions to obtain the text they contain, and match the obtained text against a pre-established text library, so as to obtain the bad text in every frame image and its grade, the text detection result comprising the bad text in every frame image and its grade;
the voice detection subunit is configured to extract the audio data from the subsegment video, obtain the speech feature sequence of the audio data, and compare the obtained speech feature sequence with the speech feature sequence of each keyword in a pre-established speech library to obtain the distance between them; when the distance between the obtained speech feature sequence and the speech feature sequence of any keyword is less than a distance threshold, to determine that the subsegment video contains bad speech; and to obtain the keywords whose distance is less than the distance threshold and determine the grade of the bad speech from the grade of those keywords, the voice detection result comprising the bad speech and its grade.
Compared with the prior art, the technical solutions provided by the present invention have the following advantages:
According to the technical solutions provided by the present invention, a video to be detected can be segmented into a plurality of subsegment videos based on the similarity of adjacent frame images, and image detection, text detection and voice keyword detection are then performed on each subsegment video, so that whether the video to be detected is a bad video can be judged from the image detection result, the text detection result and the voice detection result of each subsegment video. That is, the present invention judges whether the video to be detected is a bad video based on all three of image, text and voice keywords. Compared with the purely image-based detection of the prior art, the present invention detects the video from multiple aspects, analyzes it more comprehensively, and thereby improves the accuracy of video detection.
Brief description of the drawings
In order to explain the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of the video detection method provided by an embodiment of the present invention;
Fig. 2 shows the optical flow trajectory diagrams provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of performing image detection on a subsegment video according to an embodiment of the present invention;
Fig. 4 shows a unit detection region of a target object according to an embodiment of the present invention;
Fig. 5 is a gradient histogram provided by an embodiment of the present invention;
Fig. 6 is a schematic diagram of performing text detection on a subsegment video according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of performing voice detection on a subsegment video according to an embodiment of the present invention;
Fig. 8 is a schematic diagram of DTW-based sequence alignment provided by an embodiment of the present invention;
Fig. 9 is a schematic structural diagram of the video detection apparatus provided by an embodiment of the present invention;
Fig. 10 is a schematic structural diagram of the detection unit in the video detection apparatus provided by an embodiment of the present invention.
Embodiments
One of the core ideas of the video detection method and apparatus provided by the embodiments of the present invention is to judge whether a video to be detected is a bad video by performing image detection, text detection and voice keyword detection on each subsegment video of the video to be detected. Compared with the purely image-based detection of the prior art, the method and apparatus provided by the embodiments of the present invention detect the video from multiple aspects, analyze it more comprehensively, and improve the accuracy of video detection.
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Referring to Fig. 1, the flowchart of the video detection method provided by an embodiment of the present invention may comprise the following steps:
101: Segment the video to be detected into a plurality of subsegment videos based on the similarity of adjacent frame images in the video to be detected. Here the similarity of adjacent frame images refers to their degree of resemblance; in the embodiments of the present invention, whether adjacent frame images are similar can be determined from optical flow trajectories. The process is as follows:
First, an optical flow vector is set for each pixel in every frame image, an optical flow computation method based on the optical flow vectors is used to extract the motion trajectory of each pixel between adjacent frame images, and the motion trajectories are plotted as two-dimensional coordinate figures to obtain an optical flow trajectory diagram, as shown in the third sub-figure of Fig. 2, which is computed by comparing the corresponding pixel values of the first two sub-figures, i.e., two consecutive frame images extracted from the same video. Then the number of pixels in the optical flow trajectory diagram whose motion speed exceeds a certain motion speed threshold is counted; when the ratio of that number to the total number of pixels is greater than a preset pixel threshold, the two frame images are judged similar, this pair of frames can be used as a segmentation boundary, and the video to be detected is segmented at that boundary into subsegment videos.
For example, when the two frame images forming the boundary are the 3rd and 4th frame images, segmentation at this boundary places the 3rd frame image and the frames before it, i.e., the 1st and 2nd frame images, into one subsegment video, and the 4th frame image and the images after it into another subsegment video; if qualifying image pairs still exist among the 4th frame image and the images after it, that subsegment video can be further segmented to obtain multiple subsegment videos.
In the embodiments of the present invention, the optical flow computation method based on optical flow vectors may be any existing optical flow method, such as the LK (Lucas-Kanade) algorithm, and the motion speed threshold and preset pixel threshold can be set according to actual conditions; the embodiments of the present invention do not limit their specific values.
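The boundary test and splitting described in step 101 can be sketched as follows (a minimal pure-Python illustration, not the patent's implementation; function names and threshold values are assumptions, and the per-pixel speeds are taken as already computed from the optical flow trajectories):

```python
def is_boundary(pixel_speeds, speed_thresh, ratio_thresh):
    """Decide whether a frame pair forms a segmentation boundary.

    pixel_speeds: motion speed of each pixel between the two frames,
    derived from the optical flow trajectory diagram.
    """
    fast = sum(1 for s in pixel_speeds if s > speed_thresh)
    return fast / len(pixel_speeds) > ratio_thresh


def split_into_subsegments(num_frames, boundaries):
    """Split frame indices 1..num_frames at each boundary pair (i, i+1)."""
    segments, start = [], 1
    for i in sorted(boundaries):
        segments.append(list(range(start, i + 1)))
        start = i + 1
    segments.append(list(range(start, num_frames + 1)))
    return segments
```

For instance, with a boundary between frames 3 and 4 of an 8-frame video, `split_into_subsegments(8, [3])` yields `[[1, 2, 3], [4, 5, 6, 7, 8]]`, matching the frame-splitting example above.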
102: Perform image detection, text detection and voice keyword detection on each subsegment video to obtain the image detection result, the text detection result and the voice detection result of each subsegment video, where the image detection result indicates the result of detecting the subsegment video based on image, the text detection result indicates the result of detecting the subsegment video based on text, and the voice detection result indicates the result of detecting the subsegment video based on voice keywords.
That is, the image detection result, the text detection result and the voice detection result may be used to indicate whether the corresponding subsegment video contains a target object. Taking the image detection result as an example: if image detection of a subsegment video finds the portrait of a terrorist leader or a terrorist organization's badge, the image detection result indicates that the subsegment video contains a target object.
103: Obtain the detection result of each subsegment video based on its image detection result, text detection result and voice detection result. Because each of these three results is obtained by detecting only one aspect of the subsegment video, none of them can fully indicate whether the subsegment video is a bad video subsegment, so in the embodiments of the present invention the detection result of the subsegment video needs to be obtained from the three results together.
In the embodiments of the present invention, one way to obtain the detection result of a subsegment video is as follows: when any one of the image detection result, the text detection result and the voice detection result indicates that a target object is detected and the grade of the target object is level one, a detection result indicating that the subsegment video is a bad video subsegment is obtained;
when at least two of the three results indicate that a target object is detected and the grade of the target object is level two, a detection result indicating that the subsegment video is a bad video subsegment is obtained;
when only one of the three results indicates that a target object is detected and the grade of the target object is level two, a detection result indicating that the subsegment video is a suspected bad video subsegment is obtained;
in cases other than those above, in which the three results indicate a bad or suspected bad video subsegment, a detection result indicating that the subsegment video is a normal video subsegment is obtained.
In the embodiments of the present invention, a target object is harmful content contained in a subsegment video. In an image detection result, the target object may be, for example, the portrait of a terrorist leader, a terrorist organization's badge, or nudity in a pornographic video; in a text detection result, the target object may be, for example, "holy war" in a terrorism video, a "pornographic collection" caption in a pornographic video, or "cured once, never recurs" in a fraudulent advertisement video; in a voice detection result, the target object may be, for example, "massacre and suicide" in a terrorism video or "100% cure rate" in a fraudulent advertisement video. The grade of a target object indicates its importance: the higher the importance, the more likely the target object is harmful content, and in the embodiments of the present invention the importance of level two is lower than that of level one.
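A minimal sketch of this fusion rule (a hypothetical illustration; the function name and result encoding are assumptions, not the patent's implementation):

```python
def classify_subsegment(image, text, voice):
    """Fuse the three per-aspect results into a subsegment verdict.

    Each argument is None (no target object detected) or the grade of
    the detected target object: 1 (level one) or 2 (level two).
    """
    grades = [g for g in (image, text, voice) if g is not None]
    if 1 in grades:              # any level-one detection
        return "bad"
    if grades.count(2) >= 2:     # at least two level-two detections
        return "bad"
    if grades.count(2) == 1:     # a single level-two detection
        return "suspected"
    return "normal"              # no target object detected
```

Note that the level-one rule is checked first, so a level-one detection marks the subsegment bad regardless of the other two results.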
104: Obtain the detection result of the video to be detected based on the detection results of the subsegment videos. One feasible way is as follows: based on the detection results, obtain a first number of subsegment videos that are bad video subsegments and a second number of subsegment videos that are suspected bad video subsegments; when the ratio of the first number to the total number of subsegment videos is greater than a first threshold, obtain a detection result indicating that the video to be detected is a bad video; when the ratio of the second number to the total number of subsegment videos is greater than a second threshold, obtain a detection result indicating that the video to be detected is a bad video, the first threshold being less than the second threshold.
For example, the first threshold may be 60% and the second threshold 80%. It should be noted that 60% and 80% are only illustrative; the first and second thresholds can be set to different values in different situations.
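The video-level decision of step 104 can be sketched as follows (a hypothetical illustration; the 60%/80% defaults are the example values from the text, and the verdict strings are assumptions):

```python
def classify_video(subsegment_verdicts, t1=0.60, t2=0.80):
    """Judge the whole video from per-subsegment verdicts.

    subsegment_verdicts: list of "bad", "suspected" or "normal".
    t1: first threshold (ratio of bad subsegments); t2: second
    threshold (ratio of suspected subsegments); t1 < t2 per the method.
    """
    total = len(subsegment_verdicts)
    bad_ratio = subsegment_verdicts.count("bad") / total
    suspected_ratio = subsegment_verdicts.count("suspected") / total
    if bad_ratio > t1 or suspected_ratio > t2:
        return "bad video"
    return "normal video"
```

The lower threshold for confirmed bad subsegments reflects that a smaller proportion of definite detections is enough to condemn the whole video, while suspected subsegments must dominate before the same verdict is reached.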
As can be seen from the above technical solution, the video detection method provided by the embodiments of the present invention segments a video to be detected into a plurality of subsegment videos based on the similarity of adjacent frame images, then performs image detection, text detection and voice keyword detection on each subsegment video, and judges whether the video to be detected is a bad video from the image detection result, the text detection result and the voice detection result of each subsegment video. That is, the present invention judges whether the video to be detected is a bad video based on all three of image, text and voice keywords; compared with the purely image-based detection of the prior art, the present invention detects the video from multiple aspects, analyzes it more comprehensively, and improves the accuracy of video detection.
The feasible ways in which the embodiments of the present invention perform image detection, text detection and voice keyword detection on a subsegment video are now described in detail. As shown in Fig. 3, performing image detection on a subsegment video to obtain its image detection result may comprise the following steps:
1021: Extract the visual features of the detection regions of every frame image in the subsegment video. In the embodiments of the present invention, a detection region of a frame image is a sub-region of that frame image. An existing saliency detection method can be used to locate, in every frame image, the detection region containing an object that may be a target object, after which the visual features of that detection region are extracted; alternatively, a sliding-window method can be used to extract the visual features of each detection region over the whole frame image, region by region.
Wherein visual signature can adopt HOG (HistogramofOrientedGradient, histograms of oriented gradients) feature, its leaching process is: for each surveyed area by the horizontal direction gradient of pixel extraction pixel and vertical gradient, horizontal direction gradient is G h(x, y)=f (x+1, y)-f (x-1, y), vertical gradient is G v(x, y)=f (x, y+1)-f (x, y-1), f (x, y) are the pixel values at (x, y) place;
Then the gradient magnitude M(x, y) and gradient direction θ(x, y) of each pixel are calculated from the above horizontal and vertical gradients:

M(x, y) = √(G_h(x, y)² + G_v(x, y)²)

θ(x, y) = arctan(G_h(x, y) / G_v(x, y)), where the gradient direction is limited to (0 ~ 180°):

θ(x, y) = θ(x, y) + π if θ(x, y) < 0, otherwise θ(x, y).
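The gradient step above can be sketched as follows. This is a minimal illustration only, not part of the claimed embodiment; it assumes Python with NumPy and a grayscale image array, and the function name `gradient_features` is a placeholder.

```python
import numpy as np

def gradient_features(f):
    """Per-pixel gradient magnitude and direction for a grayscale image f,
    following the central-difference definitions above."""
    f = f.astype(np.float64)
    # Horizontal gradient G_h(x, y) = f(x+1, y) - f(x-1, y)
    gh = np.zeros_like(f)
    gh[:, 1:-1] = f[:, 2:] - f[:, :-2]
    # Vertical gradient G_v(x, y) = f(x, y+1) - f(x, y-1)
    gv = np.zeros_like(f)
    gv[1:-1, :] = f[2:, :] - f[:-2, :]
    # Magnitude M(x, y) = sqrt(G_h^2 + G_v^2)
    mag = np.sqrt(gh ** 2 + gv ** 2)
    # Direction arctan(G_h / G_v), folded into [0, 180) degrees as above
    theta = np.arctan2(gh, gv)   # radians in (-pi, pi]
    theta[theta < 0] += np.pi    # theta + pi when theta < 0
    return mag, np.degrees(theta) % 180.0
```

For a frame whose pixel values ramp horizontally, every interior pixel has magnitude 2 and direction 90°, which matches the definitions above.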
1022: Perform matching analysis on the extracted visual feature against a pre-established image object model to obtain the bad object in each frame image and the grade of the bad object, where the image detection result comprises the bad object in each frame image and the grade of the bad object.
In the embodiment of the present invention, the pre-established image object model may be an SVM (Support Vector Machine) model. After the visual feature is extracted, it can be substituted into the optimal classification function of the SVM model, g(x_t) = sgn(Σ_i a_i y_i (x_i · x_t) + b), where sgn(·) is the sign function and x_t is the extracted visual feature; if g(x_t) = 1, the detection region is judged to contain a bad object, otherwise it does not. After a bad object is obtained, it is compared with each target object in the bad-video object library to determine its grade. The bad-video object library is a library obtained from existing bad videos: it stores the multiple target objects identified as bad objects in existing bad videos and, according to the importance of the target objects, divides them into two tiers. For example, tier one consists of target objects exclusive to bad videos, such as the portraits and badges of terrorist leaders in terror videos, or nudity in pornographic videos; tier two consists of target objects that occur frequently in bad videos but may also appear in other videos, such as explosives in terror videos.
The SVM model can be built on HOG features. Its establishment process is: collect a number of labelled images containing the above target objects and extract the HOG features of the regions where the target objects lie, labelled as the positive sample set; likewise collect a number of images that do not contain the target objects and extract the HOG features of arbitrary regions, labelled as the negative sample set. The positive and negative sample sets are fed into the SVM model, and training minimises the objective function of the model, in the standard soft-margin form min (1/2)‖w‖² + C Σ_i ξ_i subject to y_i(w · x_i + b) ≥ 1 − ξ_i, where N is the total number of samples (positive plus negative), x_i is the HOG feature of each sample, and y_i is the label of the sample, +1 for a positive sample and −1 for a negative sample. Solving this minimisation yields the relevant parameters of the SVM model: w, a and b. An SVM model is trained for each target object in the bad-video object library, and together these form the bad-video object model library. Concretely, the HOG feature of each target object in the bad-video object library is extracted first; then, based on the HOG feature of each target object, an SVM model is established for each target object, each established SVM model being the optimal classification function composed of the parameters (w, a, b).
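The training and classification steps above can be sketched with scikit-learn's `SVC` standing in for the unspecified SVM implementation. This is illustrative only: the 8-dimensional random vectors stand in for real 1764-dimensional HOG features, and the data is synthetic.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Stand-ins for HOG features: positive samples (regions containing a
# target object) cluster away from negative samples. The dimension is
# reduced from 1764 to 8 purely to keep the example small.
pos = rng.normal(loc=2.0, size=(50, 8))   # label +1 (positive sample set)
neg = rng.normal(loc=-2.0, size=(50, 8))  # label -1 (negative sample set)
X = np.vstack([pos, neg])
y = np.array([1] * 50 + [-1] * 50)

model = SVC(kernel="linear", C=1.0).fit(X, y)  # soft-margin SVM training

# Classifying a new detection region: prediction +1 means the optimal
# classification function judges that a bad object is present.
x_t = rng.normal(loc=2.0, size=(1, 8))
print(model.predict(x_t)[0])
```

One such model would be trained per target object in the library, as the text describes.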
The process of extracting the HOG feature of each target object is: normalise the target-object region to 224 × 224 pixels and divide it into cells of 8 × 8 pixels, with every 4 cells forming 1 block, as shown in Figure 4; each target-object region is thus divided into 49 blocks;
For each cell, obtain the gradient magnitude M(x, y) and gradient direction θ(x, y) of each pixel, and accumulate the gradient magnitudes over the gradient directions of the pixels in the cell to form a gradient histogram, as shown in Figure 5. The gradient-histogram features of the 4 cells in each block are concatenated to form a 4 × 9 = 36-dimensional feature; finally, the gradient-histogram features of all blocks are concatenated to form a 36 × 49 = 1764-dimensional HOG feature, i.e. the HOG feature of the target object. The magnitude represented on the horizontal axis in Figure 5 is computed from pixel values, whose range is [0, 255], and is unitless.
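The cell/block pipeline above can be sketched as follows. This is a simplified reading, not the claimed extractor: it uses non-overlapping 2 × 2-cell blocks (which gives 196 blocks on a 224 × 224 region rather than the 49 counted in the text, since block tiling conventions vary) and omits the block normalisation that practical HOG extractors apply.

```python
import numpy as np

def hog_descriptor(region, cell=8, bins=9):
    """Minimal HOG sketch: 9-bin unsigned-orientation histograms over
    8x8-pixel cells, cells grouped 2x2 into blocks and concatenated."""
    f = region.astype(np.float64)
    gh = np.zeros_like(f); gh[:, 1:-1] = f[:, 2:] - f[:, :-2]
    gv = np.zeros_like(f); gv[1:-1, :] = f[2:, :] - f[:-2, :]
    mag = np.hypot(gh, gv)
    ang = np.degrees(np.arctan2(gh, gv)) % 180.0   # unsigned, [0, 180)

    h, w = f.shape
    cy, cx = h // cell, w // cell
    hists = np.zeros((cy, cx, bins))
    for i in range(cy):
        for j in range(cx):
            m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            a = ang[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            idx = np.minimum((a / (180.0 / bins)).astype(int), bins - 1)
            np.add.at(hists[i, j], idx, m)   # magnitude-weighted histogram

    # Concatenate each 2x2-cell block (4 x 9 = 36 dims) into one vector
    blocks = [hists[i:i+2, j:j+2].ravel()
              for i in range(0, cy - 1, 2) for j in range(0, cx - 1, 2)]
    return np.concatenate(blocks)

feat = hog_descriptor(np.random.default_rng(1).random((224, 224)) * 255)
print(feat.shape)  # 196 blocks x 36 dims = (7056,) with this tiling
```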
It should be noted here that the embodiment of the present invention uses the HOG feature and SVM model only as an illustration; in practical applications, other visual features and image object models may also be adopted, which the embodiment of the present invention does not enumerate one by one.
Refer to Figure 6, which illustrates the process, in the video detection method provided by the embodiment of the present invention, of performing text detection on a subsegment video to obtain the text detection result of the subsegment video. The process can comprise the following steps:
1023: Determine the text region in each frame image of the subsegment video. The text region is a region of the frame that may contain text. In the embodiment of the present invention, regions containing captions or scene text can be located by detecting the MSERs (Maximally Stable Extremal Regions) in each frame image. The MSER detection process is: binarise each frame image with multiple grey thresholds to obtain a binary image corresponding to each grey threshold; for the binary image obtained at each grey threshold, obtain its black and white regions; when the binary images corresponding to multiple consecutive grey thresholds all contain a region of similar shape, that region is regarded as a shape-stable region, and this stable region is an MSER.
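The threshold-sweep idea above can be illustrated with a toy sketch (not a full MSER implementation — real detectors such as OpenCV's track all extremal regions efficiently; here we follow a single seeded region, using SciPy's connected-component labelling, on a synthetic frame).

```python
import numpy as np
from scipy import ndimage

def stable_region_areas(gray, thresholds, seed):
    """Binarise the image at a sweep of grey thresholds and track the
    area of the connected dark component containing `seed`. A region
    whose area stays nearly constant across consecutive thresholds is
    'maximally stable' in the sense described above."""
    areas = []
    for t in thresholds:
        dark = gray < t                     # binary image at this threshold
        labels, _ = ndimage.label(dark)     # connected components
        lab = labels[seed]
        areas.append(int((labels == lab).sum()) if lab else 0)
    return areas

# Synthetic frame: a dark 10x10 "character" on a bright background.
img = np.full((40, 40), 200, dtype=np.uint8)
img[15:25, 15:25] = 30
areas = stable_region_areas(img, thresholds=range(60, 180, 20), seed=(20, 20))
print(areas)  # area stays flat at 100 across the sweep => stable region
```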
1024: Perform text recognition on the determined text region to obtain the text contained in the text region. In the embodiment of the present invention, OCR (Optical Character Recognition) technology can be used to recognise the text region; the text region may further be enhanced before recognition, so that the OCR technology can recognise the text more reliably.
1025: Match the obtained text against a pre-established text library to obtain the bad text in each frame image and the grade of the bad text, where the text detection result comprises the bad text in each frame image and the grade of the bad text.
The pre-established text library is built from the sensitive words in the typical scene text and captions contained in existing bad videos and, according to the importance of the typical scene text and sensitive words, is divided into two tiers. Tier one consists of text exclusive to bad videos, such as "crusade" in terror videos, "pornographic select-elite" in pornographic videos, or "once effect a radical cure and do not recur" in fraudulent advertising videos; tier two consists of text that occurs frequently in bad videos but may also appear in other videos, such as "Koran" in terror videos, "temptation" in pornographic videos, or "invalid reimbursement" in fraudulent advertising videos. Thus, after it is determined that an image contains bad text, comparing the bad text with the texts at each tier in the pre-established text library yields the grade of the bad text.
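The two-tier matching step can be sketched as a simple lookup. The library entries and grades below are illustrative placeholders taken from the examples in the text, and `grade_recognised_text` is a hypothetical helper name.

```python
# Hypothetical two-tier sensitive-text library in the spirit of the
# description above; entries and grades are illustrative only.
TEXT_LIBRARY = {
    "crusade": 1,             # tier one: exclusive to bad videos
    "pornographic select-elite": 1,
    "temptation": 2,          # tier two: frequent in bad videos, not exclusive
    "invalid reimbursement": 2,
}

def grade_recognised_text(text):
    """Match OCR output against the library; return (bad_text, grade)
    pairs, i.e. the per-frame text detection result."""
    hits = []
    for phrase, grade in TEXT_LIBRARY.items():
        if phrase in text.lower():
            hits.append((phrase, grade))
    return hits

print(grade_recognised_text("Banner read: CRUSADE tonight"))  # [('crusade', 1)]
```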
Refer to Figure 7, which illustrates the process, in the video detection method provided by the embodiment of the present invention, of performing speech detection on a subsegment video to obtain the speech detection result of the subsegment video. The process can comprise the following steps:
1026: Extract the audio data in the subsegment video and obtain the speech feature sequence of the audio data. The speech feature sequence may be an MFCC (Mel-Frequency Cepstral Coefficient) feature sequence, whose extraction process is: frame the audio data at a certain time interval to obtain multiple frames of speech data; perform an FFT (Fast Fourier Transform) on each frame of speech data and feed the FFT result into a pre-divided Mel filterbank to obtain the output of each filter; take the logarithm of the filter outputs and apply a DCT (Discrete Cosine Transform) to obtain the 12-dimensional MFCC feature of the speech data.
In the embodiment of the present invention, the time interval is a preset value that can be set according to the practical situation, which the embodiment of the present invention does not limit. The Mel filterbank is an array formed by the upper and lower boundaries of a series of divisions of the Mel frequency range; it consists of a given number of triangular band-pass filters, whose centre frequencies and bandwidths are uniformly distributed over the Mel-scale frequencies corresponding to the range [0, 4000] Hz. The Mel frequency is proposed based on the auditory properties of the human ear; it has a non-linear correspondence with the Hz frequency, with the conversion formula Mel(f) = 2595 log₁₀(1 + f/700).
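The Mel conversion and the construction of a triangular filterbank can be sketched as follows. The filter count, FFT size and sample rate are illustrative choices, not values from the patent.

```python
import numpy as np

def hz_to_mel(f):
    """Mel(f) = 2595 * log10(1 + f / 700), the non-linear mapping above."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_filterbank(n_filters=26, n_fft=512, sr=8000, f_max=4000.0):
    """Triangular band-pass filters whose centre frequencies are spaced
    uniformly on the Mel scale over [0, f_max] Hz, as described above."""
    mel_pts = np.linspace(0.0, hz_to_mel(f_max), n_filters + 2)
    hz_pts = 700.0 * (10.0 ** (mel_pts / 2595.0) - 1.0)  # inverse mapping
    bins = np.floor((n_fft + 1) * hz_pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):                 # rising edge of the triangle
            fb[m - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):                 # falling edge of the triangle
            fb[m - 1, k] = (r - k) / max(r - c, 1)
    return fb

print(round(float(hz_to_mel(700.0)), 1))  # 2595*log10(2) -> 781.2
```

Multiplying a frame's FFT power spectrum by this matrix, taking logs and applying a DCT yields the MFCC features described in step 1026.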
1027: Compare the obtained speech feature sequence with the speech feature sequence of each keyword in a pre-established speech library, and obtain the distance between the obtained speech feature sequence and the speech feature sequence of each keyword.
In the embodiment of the present invention, each keyword in the pre-established speech library is obtained from the representative keywords contained in existing bad videos and, according to the importance of these keywords, is divided into two tiers. Tier one consists of keywords exclusive to bad videos, such as "massacre and suicide" in terror videos or "cure rate 100%" in fraudulent advertising videos; tier two consists of keywords that occur frequently in bad videos but may also appear in other videos, such as "getting into heaven" in terror videos or "extensive clinical verification" in fraudulent advertising videos.
For the keywords in the speech library, the speech feature sequence may likewise be an MFCC feature sequence, and the extraction process can refer to the explanation in step 1026 above, which is not elaborated again in the embodiment of the present invention.
1028: When the value of the distance between the obtained speech feature sequence and the speech feature sequence of any one keyword is less than a distance threshold, determine that the subsegment video contains bad speech.
When comparing the obtained speech feature sequence with the speech feature sequence of each keyword in the pre-established speech library, a sliding-window method can be adopted and the two sequences compared by DTW (Dynamic Time Warping). DTW is a classical method for comparing sequences: it computes the distances between corresponding elements of the two sequences and accumulates them, and its output is the distance between the two sequences; the smaller the distance, the greater the similarity between the two sequences. When the value of the distance is less than the distance threshold, the subsegment video is determined to contain bad speech.
The distance between two sequences output by DTW is the minimum of the inter-sequence path distances. Figure 8 is a schematic diagram of DTW-based feature-sequence comparison, in which the horizontal and vertical axes are the two feature sequences being compared, of lengths M and N respectively; the marks {1, 2, ..., M} and {1, 2, ..., N} on the axes denote the labels of the feature values in the two sequences, and the diamond-shaped area in the figure constrains the paths over which the distance is computed. The distances of all paths within the diamond-shaped area from the start point (0, 0) to the target point (M, N) are computed, i.e. the distances between the pairs of feature points the path passes through; the point pairs along a path are marked as the curved path in the figure. For each point pair, its positions in the two feature sequences are found, the feature values at those positions are taken, and the Euclidean distance between the feature values is computed, as indicated by the first two dotted boxes in Figure 8, which show how the path distance is calculated. Finally, the accumulated distances of all paths are compared, and the minimum among them is taken as the distance between the two feature sequences.
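The minimum-path computation above is the classic DTW dynamic programme, which can be sketched as follows (a plain O(MN) recurrence without the diamond-shaped path constraint shown in Figure 8, for brevity):

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping: D[i][j] accumulates the local (Euclidean)
    distance between elements a[i], b[j] plus the cheapest of the three
    admissible predecessors, so D[M-1][N-1] is the minimum total
    alignment cost over all monotone paths from (0, 0) to (M, N)."""
    a = np.atleast_2d(np.asarray(a, dtype=float))
    b = np.atleast_2d(np.asarray(b, dtype=float))
    if a.shape[0] == 1: a = a.T   # treat 1-D input as a scalar sequence
    if b.shape[0] == 1: b = b.T
    M, N = len(a), len(b)
    D = np.full((M, N), np.inf)
    for i in range(M):
        for j in range(N):
            d = np.linalg.norm(a[i] - b[j])
            if i == 0 and j == 0:
                D[i, j] = d
            else:
                prev = min(D[i-1, j] if i else np.inf,
                           D[i, j-1] if j else np.inf,
                           D[i-1, j-1] if i and j else np.inf)
                D[i, j] = d + prev
    return D[-1, -1]

print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # -> 0.0: the sequences align exactly
```

In the scheme above, `a` would be a window of the subsegment's MFCC sequence and `b` a keyword's MFCC sequence, with a hit declared when the returned distance falls below the distance threshold.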
1029: Obtain the keyword whose distance value is less than the distance threshold, and determine the grade of the bad speech from the tier to which that keyword belongs; the speech detection result comprises the bad speech and the grade of the bad speech.
For each of the foregoing method embodiments, for simplicity of description, each is expressed as a series of combinations of actions. However, those skilled in the art should know that the present invention is not limited by the described order of actions, because according to the present invention some steps can be performed in other orders or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
Corresponding to the above method embodiments, the embodiment of the present invention also provides a video detection apparatus, whose schematic structure is shown in Figure 9. It can comprise: a segmentation unit 11, a detection unit 12, a first processing unit 13 and a second processing unit 14.
The segmentation unit 11 is configured to segment the video to be detected into multiple subsegment videos based on the similarity of adjacent frame images in the video to be detected. The similarity of adjacent frame images refers to their degree of similarity; in the embodiment of the present invention, whether adjacent frame images are similar can be determined by optical-flow trajectories, and the detailed process can refer to the related description in the above method embodiment, which is not elaborated again.
The detection unit 12 is configured to perform image detection, text detection and voice keyword detection on each subsegment video respectively, and obtain the image detection result, text detection result and speech detection result of each subsegment video, where the image detection result is used to indicate the detection result of the subsegment video obtained by image detection, the text detection result is used to indicate the detection result of the subsegment video obtained by text detection, and the speech detection result is used to indicate the detection result of the subsegment video obtained by voice keyword detection.
That is, the above image detection result, text detection result and speech detection result can be used to indicate whether the corresponding subsegment video contains a target object. Taking the image detection result as an example: if image detection of a subsegment video finds the portrait and badge of a terrorist leader, the image detection result indicates that the subsegment video contains a target object.
The first processing unit 13 is configured to obtain the detection result of the corresponding subsegment video based on the image detection result, text detection result and speech detection result of each subsegment video. Because the image detection result, text detection result and speech detection result of a subsegment video are each obtained by detecting only one aspect of the subsegment video, none of them can fully indicate whether the subsegment video is a bad video segment; therefore, in the embodiment of the present invention, the detection result of the corresponding subsegment video needs to be obtained from these three detection results.
In the embodiment of the present invention, one way for the first processing unit 13 to obtain the detection result of the corresponding subsegment video is: when any one of the image detection result, text detection result and speech detection result of the subsegment video indicates that a target object is detected and the grade of the target object is tier one, obtain a detection result indicating that the subsegment video is a bad video subsegment;

when at least two of the image detection result, text detection result and speech detection result of the subsegment video indicate that a target object is detected and the grade of the target object is tier two, obtain a detection result indicating that the subsegment video is a bad video subsegment;

when any one of the image detection result, text detection result and speech detection result of the subsegment video indicates that a target object is detected and the grade of the target object is tier two, obtain a detection result indicating that the subsegment video is a suspected bad video subsegment;

apart from the cases above in which the image detection result, text detection result and speech detection result indicate that the subsegment video is a bad video subsegment or a suspected bad video subsegment, in other cases a detection result indicating that the subsegment video is a normal video subsegment is obtained.
In the embodiment of the present invention, a target object is undesirable content contained in the subsegment video. In the image detection result, the target object can be, for example, the portrait or badge of a terrorist leader, or nudity in a pornographic video; in the text detection result, it can be "crusade" in terror videos, "pornographic select-elite" in pornographic videos, or "once effect a radical cure and do not recur" in fraudulent advertising videos; and in the speech detection result, it can be "massacre and suicide" in terror videos or "cure rate 100%" in fraudulent advertising videos. The grade of a target object indicates its importance: the higher the importance, the more likely the target object is undesirable content; in the embodiment of the present invention, the importance of tier two is lower than the importance of tier one.
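The per-subsegment decision rules above can be sketched as a small function. This is illustrative only; `classify_subsegment` is a hypothetical helper, and each argument is the grade of the target object found by one detection channel (1 = tier one, 2 = tier two, `None` = nothing found).

```python
def classify_subsegment(image_res, text_res, speech_res):
    """Combine the three per-subsegment detection results according to
    the rules above."""
    results = [r for r in (image_res, text_res, speech_res) if r is not None]
    if any(r == 1 for r in results):
        return "bad"            # any tier-one hit => bad video subsegment
    tier2 = sum(1 for r in results if r == 2)
    if tier2 >= 2:
        return "bad"            # two or more tier-two hits => bad subsegment
    if tier2 == 1:
        return "suspected"      # exactly one tier-two hit => suspected
    return "normal"             # all other cases => normal subsegment

print(classify_subsegment(1, None, None))  # -> bad
print(classify_subsegment(None, 2, 2))     # -> bad
print(classify_subsegment(None, None, 2))  # -> suspected
```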
The second processing unit 14 is configured to obtain the detection result of the video to be detected based on the detection result of each subsegment video. Specifically, the second processing unit comprises: an acquisition subunit and a processing subunit.
The acquisition subunit is configured to obtain, based on the detection results, the first subsegment video count, i.e. the number of bad video subsegments, and the second subsegment video count, i.e. the number of suspected bad video subsegments.
The processing subunit is configured to obtain a detection result indicating that the video to be detected is a bad video when the ratio of the first subsegment video count to the total number of subsegment videos is greater than a first threshold, and to obtain a detection result indicating that the video to be detected is a bad video when the ratio of the second subsegment video count to the total number of subsegment videos is greater than a second threshold, where the first threshold is less than the second threshold. For example, the first threshold is 60% and the second threshold is 80%. It should be noted here that 60% and 80% are only illustrative; the first and second thresholds can be set to different values in different situations.
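The video-level decision of the processing subunit can be sketched as follows, using the illustrative 60%/80% thresholds from the text; `classify_video` is a hypothetical helper name.

```python
def classify_video(subsegment_results, first_threshold=0.6, second_threshold=0.8):
    """Video-level decision from per-subsegment results: flag the video
    as bad when the share of bad subsegments exceeds the first threshold,
    or the share of suspected subsegments exceeds the (larger) second
    threshold."""
    assert first_threshold < second_threshold
    total = len(subsegment_results)
    bad = subsegment_results.count("bad") / total
    suspected = subsegment_results.count("suspected") / total
    if bad > first_threshold or suspected > second_threshold:
        return "bad"
    return "normal"

print(classify_video(["bad"] * 7 + ["normal"] * 3))        # 70% bad -> bad
print(classify_video(["suspected"] * 7 + ["normal"] * 3))  # 70% suspected -> normal
```

Using a lower threshold for confirmed bad subsegments than for merely suspected ones matches the intent above: weaker evidence must cover more of the video before the whole video is flagged.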
As can be seen from the above technical scheme, the video detection apparatus provided by the embodiment of the present invention can segment a video to be detected into multiple subsegment videos based on the similarity of adjacent frame images in the video, and then perform image detection, text detection and voice keyword detection on each subsegment video. The detection result of the video to be detected can then be judged from the image detection result of each subsegment video obtained by image detection, the text detection result of each subsegment video obtained by text detection, and the speech detection result obtained by voice keyword detection; that is, whether the video to be detected is a bad video can be judged. In other words, the present invention judges whether the video to be detected is a bad video on the basis of all three aspects: image, text and voice keywords. Compared with the simple image detection of the prior art, the present invention detects the video to be detected from multiple aspects, thereby analysing it more comprehensively and improving the accuracy of video detection.
In the embodiment of the present invention, the schematic structure of the above detection unit 12 is shown in Figure 10 and can comprise: an image detection subunit 121, a text detection subunit 122 and a speech detection subunit 123.
The image detection subunit 121 is configured to extract the visual feature of the detection region of each frame image in the subsegment video and perform matching analysis on the extracted visual feature against the pre-established image object model to obtain the bad object in each frame image and the grade of the bad object, where the image detection result comprises the bad object in each frame image and the grade of the bad object.
The text detection subunit 122 is configured to determine the text region in each frame image of the subsegment video, perform text recognition on the determined text region to obtain the text contained in the text region, and match the obtained text against the pre-established text library to obtain the bad text in each frame image and the grade of the bad text, where the text detection result comprises the bad text in each frame image and the grade of the bad text.
The speech detection subunit 123 is configured to extract the audio data in the subsegment video, obtain the speech feature sequence of the audio data, compare the obtained speech feature sequence with the speech feature sequence of each keyword in the pre-established speech library, and obtain the distance between the obtained speech feature sequence and the speech feature sequence of each keyword. When the value of the distance between the obtained speech feature sequence and the speech feature sequence of any one keyword is less than the distance threshold, it determines that the subsegment video contains bad speech, obtains the keyword whose distance value is less than the distance threshold, and determines the grade of the bad speech from the tier to which that keyword belongs; the speech detection result comprises the bad speech and the grade of the bad speech.
The concrete implementations of the above image detection subunit 121, text detection subunit 122 and speech detection subunit 123 can refer to the related descriptions in the above method embodiment, which are not elaborated again in the embodiment of the present invention.
It should be noted that each embodiment in this specification is described in a progressive manner, each embodiment emphasising its differences from the others; for the identical or similar parts between embodiments, refer to one another. As for the apparatus-class embodiments, since they are basically similar to the method embodiments, their description is relatively simple, and for relevant parts refer to the description of the method embodiments.
Finally, it should also be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include" or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device comprising a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the statement "comprising a ..." does not exclude the existence of other identical elements in the process, method, article or device comprising that element.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein can be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention shall not be limited to the embodiments shown herein, but shall conform to the widest scope consistent with the principles and novel features disclosed herein.
The above is only the preferred embodiment of the present invention. It should be pointed out that, for those of ordinary skill in the art, several improvements and modifications can also be made without departing from the principles of the present invention, and these improvements and modifications should also be regarded as within the protection scope of the present invention.

Claims (10)

1. A video detection method, characterised in that the method comprises:
segmenting a video to be detected into multiple subsegment videos based on the similarity of adjacent frame images in the video to be detected;
performing image detection, text detection and voice keyword detection on each subsegment video respectively, and obtaining the image detection result, text detection result and speech detection result of each subsegment video, wherein the image detection result is used to indicate the detection result of the subsegment video obtained by image detection, the text detection result is used to indicate the detection result of the subsegment video obtained by text detection, and the speech detection result is used to indicate the detection result of the subsegment video obtained by voice keyword detection;
obtaining the detection result of the corresponding subsegment video based on the image detection result, text detection result and speech detection result of each subsegment video;
obtaining the detection result of the video to be detected based on the detection result of each subsegment video.
2. The method according to claim 1, characterised in that obtaining the detection result of the corresponding subsegment video based on the image detection result, text detection result and speech detection result of each subsegment video comprises:
when any one of the image detection result, text detection result and speech detection result of the subsegment video indicates that a target object is detected and the grade of the target object is tier one, obtaining a detection result indicating that the subsegment video is a bad video subsegment;
when at least two of the image detection result, text detection result and speech detection result of the subsegment video indicate that a target object is detected and the grade of the target object is tier two, obtaining a detection result indicating that the subsegment video is a bad video subsegment, wherein the importance of the tier two is lower than the importance of the tier one;
when any one of the image detection result, text detection result and speech detection result of the subsegment video indicates that a target object is detected and the grade of the target object is tier two, obtaining a detection result indicating that the subsegment video is a suspected bad video subsegment.
3. The method according to claim 2, characterised in that obtaining the detection result of the video to be detected based on the detection result of each subsegment video comprises:
obtaining, based on the detection results, a first subsegment video count being the number of bad video subsegments and a second subsegment video count being the number of suspected bad video subsegments;
when the ratio of the first subsegment video count to the total number of subsegment videos is greater than a first threshold, obtaining a detection result indicating that the video to be detected is a bad video;
when the ratio of the second subsegment video count to the total number of subsegment videos is greater than a second threshold, obtaining a detection result indicating that the video to be detected is a bad video, wherein the first threshold is less than the second threshold.
4. The method according to claim 1, characterised in that performing image detection on a subsegment video to obtain the image detection result of the subsegment video comprises:
extracting the visual feature of the detection region of each frame image in the subsegment video;
performing matching analysis on the extracted visual feature against a pre-established image object model to obtain the bad object in each frame image and the grade of the bad object, wherein the image detection result comprises the bad object in each frame image and the grade of the bad object.
5. The method according to claim 1, characterised in that performing text detection on a subsegment video to obtain the text detection result of the subsegment video comprises:
determining the text region in each frame image of the subsegment video;
performing text recognition on the determined text region to obtain the text contained in the text region;
matching the obtained text against a pre-established text library to obtain the bad text in each frame image and the grade of the bad text, wherein the text detection result comprises the bad text in each frame image and the grade of the bad text.
6. The method according to claim 1, characterised in that performing speech detection on a subsegment video to obtain the speech detection result of the subsegment video comprises:
extracting the audio data in the subsegment video and obtaining the speech feature sequence of the audio data;
comparing the obtained speech feature sequence with the speech feature sequence of each keyword in a pre-established speech library, and obtaining the distance between the obtained speech feature sequence and the speech feature sequence of each keyword;
when the value of the distance between the obtained speech feature sequence and the speech feature sequence of any one keyword is less than a distance threshold, determining that the subsegment video contains bad speech;
obtaining the keyword whose distance value is less than the distance threshold, and determining the grade of the bad speech from the tier to which the keyword belongs, wherein the speech detection result comprises the bad speech and the grade of the bad speech.
7. A video detection apparatus, characterised in that the apparatus comprises:
a segmentation unit, configured to segment a video to be detected into multiple subsegment videos based on the similarity of adjacent frame images in the video to be detected;
a detection unit, configured to perform image detection, text detection and voice keyword detection on each subsegment video respectively, and obtain the image detection result, text detection result and speech detection result of each subsegment video, wherein the image detection result is used to indicate the detection result of the subsegment video obtained by image detection, the text detection result is used to indicate the detection result of the subsegment video obtained by text detection, and the speech detection result is used to indicate the detection result of the subsegment video obtained by voice keyword detection;
First processing unit, for the text hegemony result based on the image testing result of each subsegment video, the text detection result of each subsegment video and each subsegment video, obtains the testing result of corresponding subsegment video;
Second processing unit, for the testing result based on each subsegment video, obtains the testing result of described video to be detected.
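The segmentation unit of claim 7 cuts the video wherever the similarity of adjacent frames drops, i.e. at shot boundaries. A minimal sketch, modeling each frame as a grayscale histogram and assuming normalized histogram intersection as the similarity measure (the patent does not fix a particular measure); the threshold value is illustrative:

```python
def histogram_similarity(h1, h2):
    """Normalized histogram intersection in [0, 1]."""
    inter = sum(min(a, b) for a, b in zip(h1, h2))
    return inter / max(sum(h1), 1)

def segment_video(frame_histograms, threshold=0.7):
    """Split frame indices into subsegments at low-similarity boundaries."""
    segments, current = [], [0]
    for i in range(1, len(frame_histograms)):
        if histogram_similarity(frame_histograms[i - 1], frame_histograms[i]) < threshold:
            segments.append(current)  # adjacent frames dissimilar: start a new subsegment
            current = []
        current.append(i)
    segments.append(current)
    return segments

# Two dark frames followed by two bright frames -> two subsegments
print(segment_video([[10, 0], [10, 0], [0, 10], [0, 10]]))  # → [[0, 1], [2, 3]]
```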
8. The apparatus according to claim 7, wherein the first processing unit is configured to: when any one of the image detection result, the text detection result, and the speech detection result of a subsegment video indicates that a target object is detected and the grade of the target object is level one, obtain a detection result indicating that the subsegment video is a bad video subsegment; when at least two of the image detection result, the text detection result, and the speech detection result of the subsegment video indicate that a target object is detected and the grade of the target object is level two, obtain a detection result indicating that the subsegment video is a bad video subsegment, wherein the severity of level two is lower than the severity of level one; and when any one of the image detection result, the text detection result, and the speech detection result of the subsegment video indicates that a target object is detected and the grade of the target object is level two, obtain a detection result indicating that the subsegment video is a suspected bad video subsegment.
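The fusion rule of claim 8 can be sketched directly. Each of the three detector outputs is modeled here as `None` (nothing found) or a grade (1 or 2, with grade 1 the more severe level); the function name and return labels are illustrative:

```python
def classify_subsegment(image_grade, text_grade, speech_grade):
    """Fuse the three per-subsegment detector grades into one label."""
    grades = [g for g in (image_grade, text_grade, speech_grade) if g is not None]
    if any(g == 1 for g in grades):
        return "bad"        # any single level-1 detection suffices
    if sum(1 for g in grades if g == 2) >= 2:
        return "bad"        # at least two level-2 detections also suffice
    if any(g == 2 for g in grades):
        return "suspected"  # exactly one level-2 detection
    return "clean"
```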
9. The apparatus according to claim 8, wherein the second processing unit comprises an obtaining subunit and a processing subunit;
the obtaining subunit is configured to obtain, based on the detection results, a first number of subsegment videos that are bad video subsegments and a second number of subsegment videos that are suspected bad video subsegments; and
the processing subunit is configured to: when the ratio of the first number of subsegment videos to the total number of subsegment videos is greater than a first threshold, obtain a detection result indicating that the video to be detected is a bad video; and when the ratio of the second number of subsegment videos to the total number of subsegment videos is greater than a second threshold, obtain a detection result indicating that the video to be detected is a bad video, wherein the first threshold is less than the second threshold.
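The video-level decision of claim 9 thresholds the fractions of bad and suspected subsegments, with the bad-fraction threshold required to be the smaller of the two. A sketch with illustrative threshold values (the patent does not specify them):

```python
def classify_video(labels, first_threshold=0.2, second_threshold=0.5):
    """labels: per-subsegment labels ("bad" / "suspected" / "clean").

    first_threshold must be less than second_threshold, per claim 9.
    """
    total = len(labels)
    n_bad = sum(1 for l in labels if l == "bad")
    n_suspected = sum(1 for l in labels if l == "suspected")
    if n_bad / total > first_threshold:
        return "bad video"
    if n_suspected / total > second_threshold:
        return "bad video"
    return "not flagged"
```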
10. The apparatus according to claim 7, wherein the detection unit comprises an image detection subunit, a text detection subunit, and a speech detection subunit;
the image detection subunit is configured to extract visual features of a detection region of each frame image in the subsegment video, and to perform matching analysis between the extracted visual features and an image object model established in advance, to obtain the bad objects in each frame image and the grades of the bad objects, wherein the image detection result comprises the bad objects in each frame image and the grades of the bad objects;
the text detection subunit is configured to determine a text region in each frame image of the subsegment video, perform text recognition on the determined text region to obtain the text contained in the text region, and match the obtained text against a text library established in advance, to obtain the bad text in each frame image and the grade of the bad text, wherein the text detection result comprises the bad text in each frame image and the grade of the bad text; and
the speech detection subunit is configured to extract the audio data in the subsegment video, obtain a speech feature sequence of the audio data, and compare the obtained speech feature sequence with the speech feature sequence of each keyword in a speech library established in advance, to obtain the distance between the obtained speech feature sequence and the speech feature sequence of each keyword; when the distance between the obtained speech feature sequence and the speech feature sequence of any keyword is less than a distance threshold, determine that the subsegment video comprises bad speech; and obtain the keyword whose distance is less than the distance threshold, and determine a grade of the bad speech based on the grade of that keyword, wherein the speech detection result comprises the bad speech and the grade of the bad speech.
CN201510764366.0A 2015-11-10 2015-11-10 Method and apparatus for detecting video Pending CN105389558A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510764366.0A CN105389558A (en) 2015-11-10 2015-11-10 Method and apparatus for detecting video

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510764366.0A CN105389558A (en) 2015-11-10 2015-11-10 Method and apparatus for detecting video

Publications (1)

Publication Number Publication Date
CN105389558A true CN105389558A (en) 2016-03-09

Family

ID=55421830

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510764366.0A Pending CN105389558A (en) 2015-11-10 2015-11-10 Method and apparatus for detecting video

Country Status (1)

Country Link
CN (1) CN105389558A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101819638A (en) * 2010-04-12 2010-09-01 中国科学院计算技术研究所 Method for building a pornography detection model and pornography detection method
CN102236796A (en) * 2011-07-13 2011-11-09 Tcl集团股份有限公司 Method and system for classifying objectionable content in digital video
CN102509084A (en) * 2011-11-18 2012-06-20 中国科学院自动化研究所 Multiple-instance-learning-based method for identifying horror video scenes
CN102930553A (en) * 2011-08-10 2013-02-13 中国移动通信集团上海有限公司 Method and device for identifying objectionable video content
CN103400155A (en) * 2013-06-28 2013-11-20 西安交通大学 Pornographic video detection method based on semi-supervised learning of images
CN103473299A (en) * 2013-09-06 2013-12-25 北京锐安科技有限公司 Method and device for obtaining a website's objectionable-content likelihood

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Dong Shoubin (董守斌) et al.: "Network Information Retrieval" (《网络信息检索》), 30 April 2010 *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106250837A (en) * 2016-07-27 2016-12-21 腾讯科技(深圳)有限公司 Video recognition method, apparatus, and system
CN106250837B (en) * 2016-07-27 2019-06-18 腾讯科技(深圳)有限公司 Video recognition method, apparatus, and system
WO2018023454A1 (en) * 2016-08-02 2018-02-08 步晓芳 Automatic pornography identification method, and recognition system
WO2018023453A1 (en) * 2016-08-02 2018-02-08 步晓芳 Patent information pushing method performed during automatic pornography identification, and recognition system
WO2018023452A1 (en) * 2016-08-02 2018-02-08 步晓芳 Method for collecting usage condition of adult shot identification technique, and recognition system
CN107784521A (en) * 2017-10-24 2018-03-09 中国移动通信集团公司 Advertisement playing method, apparatus, and storage medium
CN108040262A (en) * 2018-01-25 2018-05-15 湖南机友科技有限公司 Method and apparatus for real-time pornography screening of live audio and video
US10650240B2 (en) 2018-09-19 2020-05-12 International Business Machines Corporation Movie content rating
CN109168024A (en) * 2018-09-26 2019-01-08 平安科技(深圳)有限公司 Target information recognition method and device
CN109168024B (en) * 2018-09-26 2022-05-27 平安科技(深圳)有限公司 Target information recognition method and device
CN109831665A (en) * 2019-01-16 2019-05-31 深圳壹账通智能科技有限公司 Video quality inspection method, system, and terminal device
CN109831665B (en) * 2019-01-16 2022-07-08 深圳壹账通智能科技有限公司 Video quality inspection method, system, and terminal device
CN109889882A (en) * 2019-01-24 2019-06-14 北京亿幕信息技术有限公司 Video clip synthesis method and system
CN109889882B (en) * 2019-01-24 2021-06-18 深圳亿幕信息科技有限公司 Video clip synthesis method and system
CN109934172B (en) * 2019-03-14 2021-10-15 中南大学 GPS-free full-line visual fault detection and localization method for high-speed train pantographs
CN109934172A (en) * 2019-03-14 2019-06-25 中南大学 GPS-free full-line visual fault detection and localization method for high-speed train pantographs
CN110287960B (en) * 2019-07-02 2021-12-10 中国科学院信息工程研究所 Method for detecting and recognizing curved text in natural scene images
CN110287960A (en) * 2019-07-02 2019-09-27 中国科学院信息工程研究所 Method for detecting and recognizing curved text in natural scene images
CN111126373A (en) * 2019-12-23 2020-05-08 北京中科神探科技有限公司 Internet short-video violation judgment device and method based on cross-modal recognition technology

Similar Documents

Publication Publication Date Title
CN105389558A (en) Method and apparatus for detecting video
Ahmed et al. Vision based hand gesture recognition using dynamic time warping for Indian sign language
US10679067B2 (en) Method for detecting violent incident in video based on hypergraph transition
CN111191695A (en) Website picture tampering detection method based on deep learning
US20120183212A1 (en) Identifying descriptor for person or object in an image
CN105335726B (en) Recognition of face confidence level acquisition methods and system
CN107909033A Suspect's fast-tracking method based on surveillance video
CN112016605B (en) Target detection method based on corner alignment and boundary matching of bounding box
CN107977656A Pedestrian re-identification method and system
CN103632159B (en) Method and system for training classifier and detecting text area in image
CN106530200A (en) Deep-learning-model-based steganography image detection method and system
CN104268586A (en) Multi-visual-angle action recognition method
CN110070090A Logistics label information detection method and system based on handwriting recognition
CN102254183B (en) Face detection method based on AdaBoost algorithm
CN105389562A (en) Secondary optimization method for monitoring video pedestrian re-identification result based on space-time constraint
Yang et al. Spatiotemporal trident networks: detection and localization of object removal tampering in video passive forensics
CN108537143B Face recognition method and system based on key-region feature comparison
CN105279772A (en) Trackability distinguishing method of infrared sequence image
CN110796101A (en) Face recognition method and system of embedded platform
TW202030683A (en) Method and apparatus for extracting claim settlement information, and electronic device
CN102663777A (en) Target tracking method and system based on multi-view video
CN110879985B (en) Anti-noise data face recognition model training method
Chen et al. A video-based method with strong-robustness for vehicle detection and classification based on static appearance features and motion features
Liu et al. A crack detection system of subway tunnel based on image processing
Andiani et al. Face recognition for work attendance using multitask convolutional neural network (MTCNN) and pre-trained facenet

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160309