CN108734106A - Fast comparison-based method for identifying violent-terrorist videos - Google Patents

Fast comparison-based method for identifying violent-terrorist videos

Info

Publication number
CN108734106A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810366397.4A
Other languages
Chinese (zh)
Other versions
CN108734106B (en
Inventor
李兵
胡卫明
原春锋
王博
赵永帅
刘琴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN201810366397.4A priority Critical patent/CN108734106B/en
Publication of CN108734106A publication Critical patent/CN108734106A/en
Application granted granted Critical
Publication of CN108734106B publication Critical patent/CN108734106B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items

Abstract

The present invention relates to the field of video classification and proposes a fast comparison-based method for identifying violent-terrorist videos. It aims to solve the problem that, in violent-terrorist video identification based on visual features, the limited descriptive power of feature descriptors leads to low precision and recall. The method includes: performing shot segmentation on the video to be detected in order to select its key frames; using a pre-built violent-terrorist video identification model to compute a hash code for each key frame of the video to be detected; comparing the hash code of each key frame with pre-stored hash codes of video frames of violent-terrorist videos to determine the video frames similar to each key frame; and, if the number of video frames similar to the key frames exceeds a set threshold, determining that the video to be detected is a violent-terrorist video. The invention can quickly and accurately identify violent-terrorist videos among a large number of videos.

Description

Fast comparison-based method for identifying violent-terrorist videos
Technical field
The present invention relates to the technical field of computer vision, in particular to the field of video classification, and specifically to a fast comparison-based method for identifying violent-terrorist videos.
Background
A violent-terrorist video is a video whose content advocates violent terrorism, religious extremism, ethnic separatism and the like. With the rapid development of network technology and the arrival of the mobile internet era, more and more multimedia data is presented to people, and violent-terrorist videos are also propagated and spread in large numbers. At present, violent-terrorist videos are detected mainly through manual review and labeling, which consumes a great deal of money and resources. Therefore, in the face of the growing volume of data on the internet, a new technology is needed that automatically filters terrorist video and image content and that can support monitoring and early warning in important public places.
The visual features currently applied to violent-terrorist video detection fall into two main classes: static features and dynamic features. Static features describe the content within a video frame, including color, texture and structure. These features effectively reflect information such as the background, the environment and the appearance of the main subject. MPEG-7 is a typical source of static features, with visual descriptors such as CLD, CSD, SC and EH. Dynamic features describe the relationship between video frames, including motion amplitude, direction and frequency, and effectively reflect the motion of the main subject in the video. Dynamic features mostly rely on corner detection algorithms for trajectory extraction, for example HOG, HOF and MoSIFT. The MoSIFT algorithm detects local features, but this descriptor can only extract features where there is sufficient motion. The descriptive power of the above feature descriptors is limited, so it is difficult for them to describe the content of video images comprehensively and accurately. In violent-terrorist videos in particular, detection needs to target specific objects, and as a result the precision and recall of the detection are low.
Summary of the invention
In order to solve the above problem in the prior art, and the problem that, when two videos contain multiple copied segments, copies of edited videos cannot be accurately detected and the positions of the copied clips cannot be accurately located, this application provides a fast comparison-based method for identifying violent-terrorist videos.
This application provides a fast comparison-based method for identifying violent-terrorist videos, comprising the following steps: performing shot segmentation on the video to be detected in order to select the key frames of the video to be detected; using a pre-built violent-terrorist video identification model to perform a hash operation on each key frame of the video to be detected, obtaining the hash code of each key frame, wherein the violent-terrorist video identification model is built on a hash network, its input is a video frame and its output is the hash code of the input video frame; comparing the hash code of each key frame with the pre-stored hash codes of video frames of violent-terrorist videos to determine the video frames similar to each key frame; and counting the similar frames, and if the number of video frames similar to the key frames exceeds a set threshold, determining that the video to be detected is a violent-terrorist video.
In some examples, "performing shot segmentation on the video to be detected in order to select the key frames of the video to be detected" includes: extracting the histogram of every frame of the video to be detected and comparing the histograms of adjacent frames to determine the shot boundaries of the video to be detected; and, according to the determined shot boundaries, selecting the start frame and/or end frame of each shot of the video to be detected as a key frame.
In some examples, "comparing the hash code of each key frame with the pre-stored hash codes of video frames of violent-terrorist videos to determine the video frames similar to each key frame" includes: comparing the hash code of each key frame with the hash codes of the video frames of the violent-terrorist videos in a video library; calculating the Hamming distance between the hash code of the key frame and the hash code of the video frame; and confirming as similar frames a key frame and a video frame whose Hamming distance lies within a set radius.
In some examples, the violent-terrorist video identification model is trained as follows: the preset training sample pictures are classified into positive sample data and negative sample data, wherein the positive sample data are pairs of two violent-terrorist pictures and the negative sample data are pairs of one violent-terrorist picture and one non-violent-terrorist picture; the training sample pictures are resized, and a region of a set size is randomly cropped from each resized picture and processed by subtracting the sample mean; the initial violent-terrorist video identification model is then trained on the processed pictures to obtain the violent-terrorist video identification model based on a hash network.
In some examples, the network structure of the initial violent-terrorist video identification model includes an input layer, convolutional layers and fully connected layers, wherein the first layer is the input layer, the second to sixth layers are convolutional layers, and the seventh to ninth layers are fully connected layers.
In some examples, when the violent-terrorist video identification model is trained, the input to the input layer is the training sample pictures after sample-mean processing.
In some examples, each convolutional layer receives the output of the preceding layer, applies the convolution of that layer, and outputs the result after applying that layer's activation function; each fully connected layer likewise receives the output of the preceding layer, applies the convolution of that layer, and outputs the result after applying that layer's activation function.
In some examples, the activation function of the second to eighth layers of the network structure of the initial violent-terrorist video identification model is:
ReLU(x) = max(0, x)
where ReLU(x) is the activation function and x is the output after the convolution of that layer.
In some examples, the activation function of the ninth layer of the network structure of the initial violent-terrorist video identification model is δ(x), where δ(x) is the result of taking the partial derivative with respect to b_{i,j}.
In some examples, the loss function used to train the violent-terrorist video identification model is L_r, where y_i indicates whether the sample pair is similar, i.e. y_i = 1 indicates that the two samples are similar and otherwise they are dissimilar; the distance term is the Euclidean distance between the binary codes of the two samples in the pair; $\||b_{i,1}| - 1\|_1$ and $\||b_{i,2}| - 1\|_1$ are the Manhattan distances between each sample's binary code and the all-ones vector; m (m > 0) is the margin threshold parameter; α is the scaling factor; b_{i,1} is the hash code of sample 1 and b_{i,2} is the hash code of sample 2; N is the total number of training sample pairs; and k is the dimension of the hash code.
In the fast comparison-based method for identifying violent-terrorist videos provided by this application, the video to be checked is first analyzed structurally and its key frames are extracted. Next, the violent-terrorist video identification model based on a hash network determines the hash code of each key frame of the video. Then the hash codes of the key frames of the video to be detected are matched against the pre-stored hash codes of key frames of violent-terrorist videos to determine whether the video to be detected is a violent-terrorist video. Extracting key frames through structural analysis of the video to be detected achieves a good balance between the accuracy and the speed of shot detection. Comparing the hash codes of the key frames with the pre-stored hash codes makes it possible to judge quickly whether the video to be detected is a violent-terrorist video. Because the pre-stored hash codes occupy little space and can be searched quickly, the invention can identify violent-terrorist videos quickly and accurately.
Description of the drawings
Fig. 1 is a diagram of an exemplary system architecture to which this application can be applied;
Fig. 2 is a flow diagram of one embodiment of the fast comparison-based method for identifying violent-terrorist videos of this application;
Fig. 3 is a schematic diagram of the network structure of the hash network model in an embodiment of the fast comparison-based method for identifying violent-terrorist videos of this application;
Fig. 4 is a flow diagram of an application example of the fast comparison-based method for identifying violent-terrorist videos of this application.
Detailed description
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. Those skilled in the art will understand that these embodiments serve only to explain the technical principles of the invention and are not intended to limit its scope of protection.
It should be noted that, where no conflict arises, the embodiments of this application and the features in the embodiments can be combined with one another. The application is described in detail below with reference to the drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture to which embodiments of the fast comparison-based method for identifying violent-terrorist videos of this application can be applied.
As shown in Fig. 1, the system architecture may include a terminal device 101, a network 102 and a server 103. The network 102 provides the medium for the communication link between the terminal device 101 and the server 103. The network 102 may include various connection types, such as wired or wireless communication links or fiber-optic cables.
A user can use the terminal device 101 to interact with the server 103 through the network 102 in order to receive or send messages and the like. Various communication client applications can be installed on the terminal device 101, for example web browser applications, video browsing and video upload applications, and social platform software.
The terminal device 101 can be any of various electronic devices that have a display screen and support video browsing or video upload, including but not limited to smartphones, tablet computers, laptop computers and desktop computers.
The server 103 can be a server that provides various services, for example a video processing server or application platform that performs violent-terrorist identification on the videos uploaded by the terminal device 101. The video processing server can analyze and otherwise process the video data uploaded by each terminal device connected to it over the network, and feed the processing result (for example the violent-terrorist recognition result for a video) back to the terminal device or to a third party.
It should be noted that the fast comparison-based method for identifying violent-terrorist videos provided by the embodiments of this application is generally executed by the server 103; correspondingly, a device to which the method of this application is applied is generally located in the server 103.
It should be understood that the numbers of terminal devices, networks and servers in Fig. 1 are only illustrative. Any number of terminal devices, networks and servers may be present, as the implementation requires.
With continued reference to Fig. 2, the flow of one embodiment of the fast comparison-based method for identifying violent-terrorist videos according to this application is shown. The method includes the following steps:
Step 201: perform shot segmentation on the video to be detected in order to select the key frames of the video to be detected.
In this embodiment, the electronic device (such as the server in Fig. 1) or application platform to which the method is applied obtains the video to be checked for violent-terrorist content. The electronic device or application platform performs shot segmentation on the obtained video in order to extract its key frames. As an example, the video to be detected can be obtained from a terminal device connected to the electronic device or application platform: after a user of a terminal device connected over the network to the server or application platform uploads a video, the server or application platform takes that video as the video to be detected.
Specifically, "performing shot segmentation on the video to be detected in order to select the key frames of the video to be detected" includes: extracting the histogram of every frame of the video to be detected and comparing the histograms of adjacent frames to determine the shot boundaries of the video to be detected; and, according to the determined shot boundaries, selecting the start frame and/or end frame of each shot of the video to be detected as a key frame. The histogram extracted for each frame can be a grayscale histogram or a color histogram. That is, after the video to be detected has been divided into a series of shots, the first frame or the last frame of each shot can be taken as the key frame of that shot, or both the first frame and the last frame can be taken as key frames.
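The histogram comparison described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names, the number of histogram bins, the use of an L1 distance and the value of `cut_threshold` are all assumptions, since the text only says that histograms of adjacent frames are compared.

```python
import numpy as np

def gray_histogram(frame, bins=64):
    """Normalized grayscale histogram of one frame (2-D uint8 array)."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return hist / hist.sum()

def shot_starts(frames, cut_threshold=0.5):
    """Indices of frames that start a new shot (hypothetical threshold)."""
    if len(frames) == 0:
        return []
    hists = [gray_histogram(f) for f in frames]
    starts = [0]
    for i in range(1, len(hists)):
        # L1 distance between adjacent normalized histograms lies in [0, 2].
        diff = np.abs(hists[i] - hists[i - 1]).sum()
        if diff > cut_threshold:
            starts.append(i)
    return starts

def key_frame_indices(frames, cut_threshold=0.5, use_last=False):
    """First frame of each shot, optionally also the last frame."""
    starts = shot_starts(frames, cut_threshold)
    if not starts:
        return []
    keys = set(starts)
    if use_last:
        ends = [s - 1 for s in starts[1:]] + [len(frames) - 1]
        keys.update(ends)
    return sorted(keys)
```

Choosing only the first frame keeps the number of hash operations low; adding the last frame trades some speed for coverage of shots whose content changes gradually.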
Step 202: use the pre-built violent-terrorist video identification model to perform a hash operation on each key frame of the video to be detected, obtaining the hash code of each key frame.
In this embodiment, based on the key frames of the video to be detected selected in step 201, the electronic device or application platform uses the pre-built hash network model to determine the hash code of each key frame. Here, the violent-terrorist video identification model can be a deep convolutional neural network model, for example a Siamese network model; the Siamese network model together with the designed hash loss completes the hash operation on the key frames of the video to be detected. The violent-terrorist video identification model is built on a hash network; its input is a video frame and its output is the hash code of the input video frame.
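The text does not state how the real-valued outputs of the network are turned into a binary hash code. One common choice, assumed here purely for illustration, is to threshold each output by its sign and pack the resulting bits into an integer; the function name `to_hash_code` is hypothetical.

```python
import numpy as np

def to_hash_code(outputs):
    """Map k real-valued network outputs to a k-bit integer code.

    Assumption: positive outputs become bit 1, all others bit 0.
    """
    bits = (np.asarray(outputs) > 0).astype(int)
    code = 0
    for bit in bits:
        code = (code << 1) | int(bit)
    return code
```

Storing each code as a single integer keeps the pre-stored library compact and lets the later comparison use bitwise operations.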
The violent-terrorist video identification model can determine the hash code of a key frame by evaluating the input frame picture, using the optimized deep convolutional neural network to complete the hash operation on the input key frame (picture). The model can operate on the features of the key frame. These can include static features such as color, texture and structure, which reflect information such as the background, the environment and the appearance of the main subject, and dynamic features such as motion amplitude, direction and frequency, which reflect the motion of the main subject in the video. The hash code of the key frame is determined from these features.
Step 203: compare the hash code of each key frame with the pre-stored hash codes of video frames of violent-terrorist videos to determine the video frames similar to each key frame.
In this embodiment, the electronic device or application platform compares the key frame hash codes of the video to be detected, obtained in step 202 with the violent-terrorist video identification model, with the pre-stored hash codes in order to determine whether the key frames of the video to be detected are similar to video frames of violent-terrorist videos. The pre-stored hash codes can be the hash codes of video frames of violent-terrorist videos.
Here, the pre-stored hash codes are obtained as follows. First, violent-terrorist videos are extracted from a video library. Then key video frames are extracted, offline or online, from all of the extracted violent-terrorist videos. Finally, the extracted key video frames are input to the violent-terrorist video identification model based on a hash network, the hash codes of the violent-terrorist videos are obtained, and these hash codes are stored.
The hash code comparison can consist of computing the Hamming distance between the hash code of a key frame and a pre-stored hash code, and determining from the Hamming distance whether the key frame is similar to a video frame of a violent-terrorist video.
In some optional implementations of this embodiment, "comparing the hash code of each key frame with the pre-stored hash codes of video frames of violent-terrorist videos to determine the video frames similar to each key frame" includes: comparing the hash code of each key frame with the hash codes of the video frames of the violent-terrorist videos in the video library; calculating the Hamming distance between the hash code of the key frame and the hash code of the video frame; and confirming as similar frames a key frame and a video frame whose Hamming distance lies within a set radius. Specifically, two frame pictures whose Hamming distance is within 2 can be confirmed as similar frames.
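The Hamming comparison can be sketched as below, assuming each hash code is held as a Python integer; that representation and the function names are illustrative choices, not something the text specifies. The default `radius=2` follows the example radius given above.

```python
def hamming_distance(code_a, code_b):
    """Number of bit positions in which two integer hash codes differ."""
    return bin(code_a ^ code_b).count("1")

def find_similar_frames(key_code, stored_codes, radius=2):
    """Indices of stored frames whose code is within `radius` of key_code."""
    return [
        idx
        for idx, code in enumerate(stored_codes)
        if hamming_distance(key_code, code) <= radius
    ]
```

For example, `find_similar_frames(0b1010, [0b1010, 0b0101, 0b1000])` compares distances 0, 4 and 1 against the radius and returns `[0, 2]`. A linear scan like this is the simplest option; a large library might call for an index over the codes, which is outside the scope of this sketch.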
Step 204: count the similar frames, and if the number of video frames similar to the key frames exceeds a set threshold, determine that the video to be detected is a violent-terrorist video.
In this embodiment, the key frames that are similar to video frames of the violent-terrorist videos in the violent-terrorist video database are determined in step 203. The number of key frames of the video to be detected that are similar to video frames of those violent-terrorist videos is then counted, and if that number exceeds the set threshold the video to be detected can be determined to be a violent-terrorist video. Specifically, if 3 or more key frames of the video to be detected are similar to video frames of violent-terrorist videos in the video library, the video to be detected is confirmed to be a violent-terrorist video.
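The counting rule can be sketched as follows. The function name and the integer representation of hash codes are assumptions; the defaults (a Hamming radius of 2 and at least 3 matching key frames) follow the example values in the text.

```python
def is_violent_terrorist_video(key_codes, stored_codes, radius=2, min_matches=3):
    """Flag a video when enough of its key frames match the stored library.

    A key frame counts as matched if at least one stored code lies within
    `radius` bits of it.
    """
    matches = 0
    for key_code in key_codes:
        if any(bin(key_code ^ s).count("1") <= radius for s in stored_codes):
            matches += 1
    return matches >= min_matches
```

Requiring several matching key frames rather than one makes the decision less sensitive to a single accidental hash collision.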
In some optional implementations of this embodiment, the violent-terrorist video identification model based on a hash network is trained as follows: the preset training sample pictures are classified into positive sample data and negative sample data, wherein the positive sample data are pairs of two violent-terrorist pictures and the negative sample data are pairs of one violent-terrorist picture and one non-violent-terrorist picture; the training sample pictures are resized, and a region of a set size is randomly cropped from each resized picture and processed by subtracting the sample mean; the initial violent-terrorist video identification model is then trained on the processed pictures to obtain the violent-terrorist video identification model based on a hash network. Specifically, the training data can be divided into two groups, positive sample data and negative sample data. The positive sample data can be pairs of two violent-terrorist pictures, with the label set to 1; the negative sample data can be pairs of one violent-terrorist picture and one non-violent-terrorist picture, with the label set to 0. In this way the hash codes of violent-terrorist videos are made as similar to each other as possible, and the hash codes of non-violent-terrorist videos are pushed as far as possible from those of violent-terrorist videos.
To adjust the training sample pictures, each picture is resized to 256*256, a region of 227*227 is then cropped at random, and the mean of all samples is subtracted. The resulting sample picture can be input directly to the initial hash network model for training. The sample mean is the average value of all pixels of the sample pictures. Training and testing on mean-subtracted pictures improves training speed and test accuracy.
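The cropping and mean-subtraction step can be sketched with NumPy as below. It assumes the picture has already been resized to 256*256 by some image library, and that `mean` is a precomputed value (a scalar or a per-channel array); the function name and the use of `numpy.random.default_rng` are illustrative choices.

```python
import numpy as np

def random_crop_and_center(image, mean, crop=227, rng=None):
    """Randomly crop a crop x crop region and subtract the sample mean.

    image: H x W x 3 array, already resized (for example to 256 x 256).
    mean:  scalar or length-3 array, assumed to be precomputed over the
           training set.
    """
    if rng is None:
        rng = np.random.default_rng()
    height, width = image.shape[:2]
    top = int(rng.integers(0, height - crop + 1))
    left = int(rng.integers(0, width - crop + 1))
    patch = image[top:top + crop, left:left + crop].astype(np.float32)
    return patch - mean
```

Random cropping acts as light data augmentation, since the same picture yields slightly different inputs across training passes.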
A pair of pictures from the positive sample data (a first violent-terrorist picture and a second violent-terrorist picture) or a pair of pictures from the negative sample data (one violent-terrorist picture and one non-violent-terrorist picture) is input to the initial hash network model for training.
In some optional implementations of this embodiment, the network structure of the initial violent-terrorist video identification model includes an input layer, convolutional layers and fully connected layers; Fig. 3 shows a schematic diagram of the network structure of the hash network model. The first layer is the input layer, the second to sixth layers are convolutional layers, and the seventh to ninth layers are fully connected layers. The input to the input layer is the processed training sample pictures, which are two RGB three-channel pictures. The convolutional layers of the second to sixth layers are denoted conv1-conv5 in Fig. 3, and the fully connected layers of the seventh to ninth layers are denoted fc1-fc3 in Fig. 3. The loss function (loss) in the fully connected layers has two main properties: it is "discriminative" and it is "binary-like".
Each convolutional layer receives the output of the preceding layer, applies the convolution of that layer, and outputs the result after applying that layer's activation function; each fully connected layer likewise receives the output of the preceding layer, applies the convolution of that layer, and outputs the result after applying that layer's activation function. Specifically:
The above-mentioned second layer is convolutional layer, shares 64 convolution kernels, and each convolution kernel size is 11 × 11, and convolution step-length is 4, Padding=0, connection active coating, down-sampling layer and normalization layer after the characteristic pattern of output.Active coating activation primitive uses ReLU Function.Sample level sample mode is maximum value sampling, and sampling core is 3 × 3, step-length 2.Normalize the LRN normalization that layer uses Method, core size are set as 0.00001, beta for 5, alpha and are set as 0.75.Wherein, alpha is zoom factor, and beta is to refer to It is several.The second layer obtains the output of first layer, and output is C after process of convolution1, C1It is input to down-sampling layer and obtains P1, P1It is input to Active coating obtains A1, A1It is input to normalization layer and obtains L1, finally export L1To third layer.
Third layer is convolutional layer, shares 256 convolution kernels, and each convolution kernel size is 5 × 5, and convolution step-length is 1, Padding=2, connection active coating, down-sampling layer and normalization layer after the characteristic pattern of output.Active coating activation primitive uses ReLU Function.Sample level sample mode is maximum value sampling, and sampling core is 3 × 3, step-length 2.Normalize the LRN normalization that layer uses Method, core size are set as 0.00001, beta for 5, alpha and are set as 0.75.Third layer obtains the output of the second layer, at convolution Output is C after reason2, C2It is input to down-sampling layer and obtains P2, P2It is input to active coating and obtains A2, A2Normalization layer is input to obtain L2, finally export L2To the 4th layer.
4th layer is convolutional layer, shares 256 convolution kernels, and each convolution kernel size is 3 × 3, and convolution step-length is 1, Padding=1 connects active coating after the characteristic pattern of output.Active coating activation primitive uses ReLU functions.4th layer of acquisition third The output of layer, output is C after process of convolution3, C3It is input to active coating and obtains A3, finally export A3To layer 5.
Layer 5 is convolutional layer, shares 256 convolution kernels, and each convolution kernel size is 3 × 3, and convolution step-length is 1, Padding=1 connects active coating after the characteristic pattern of output.Active coating activation primitive uses ReLU functions.Layer 5 obtains the 4th The output of layer, output is C after process of convolution4, C4It is input to active coating and obtains A4, finally export A4To layer 6.
Layer 6 is convolutional layer, shares 256 convolution kernels, and each convolution kernel size is 3 × 3, and convolution step-length is 1, Padding=1, connection active coating, down-sampling layer after the characteristic pattern of output.Active coating activation primitive uses ReLU functions.Sampling Layer sample mode is maximum value sampling, and sampling core is 3 × 3, step-length 2.Layer 6 obtains the output of layer 5, after process of convolution Output is C5, C5It is input to down-sampling layer and obtains P5, P5It is input to active coating and obtains A5, finally export A5To layer 7.
Layer 7 is full articulamentum, and it is 1 × 1 to have 4096 convolution kernels, each convolution kernel size, step-length 1, the spy of output Active coating is connected after sign figure.Active coating activation primitive uses ReLU functions.Layer 7 obtains the output of layer 6, after process of convolution Output is C6, C6It is input to active coating and obtains A6, finally export A6To the 8th layer.
The eighth layer is a fully connected layer with 4096 convolution kernels, each of size 1 × 1, with a stride of 1. The output feature map is followed by an activation layer that uses the ReLU function. The eighth layer receives the output of the seventh layer; the output after convolution is $C_7$. $C_7$ is input to the activation layer to obtain $A_7$, which is finally output to the last layer.
The ninth layer is a fully connected layer. The number of convolution kernels is determined by the required hash code length; each kernel is of size 1 × 1 with a stride of 1. The output feature map is followed by a hash loss layer, which uses a hash function. The ninth layer receives the output of the eighth layer; the output after convolution is $C_8$. $C_8$ is input to the hash loss layer, which outputs the binary hash codes $(b_{i,1}, b_{i,2})$ of the sample pair.
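The kernel, stride, and padding values given for the third through sixth layers determine how the feature map shrinks as it passes through the network. The sketch below traces those sizes with the standard output-size formula. It is only an illustration: the 27 × 27 input with 96 channels is an assumption (an AlexNet-style value, not stated in this section), as is the absence of padding in the max-pooling layers, and the names `out_size`, `LAYERS_3_TO_6`, and `trace_shapes` are hypothetical.

```python
def out_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding, output channels; None means pooling)
# Kernel, stride, and padding of the convolutions follow the text above.
# ASSUMPTION: the max-pooling layers use no padding.
LAYERS_3_TO_6 = [
    ("layer 3 conv", 5, 1, 2, 256),
    ("layer 3 max-pool", 3, 2, 0, None),
    ("layer 4 conv", 3, 1, 1, 256),
    ("layer 5 conv", 3, 1, 1, 256),
    ("layer 6 conv", 3, 1, 1, 256),
    ("layer 6 max-pool", 3, 2, 0, None),
]


def trace_shapes(size: int, channels: int):
    """Return (name, channels, height, width) after each stage."""
    shapes = []
    for name, kernel, stride, padding, out_channels in LAYERS_3_TO_6:
        size = out_size(size, kernel, stride, padding)
        if out_channels is not None:  # pooling keeps the channel count
            channels = out_channels
        shapes.append((name, channels, size, size))
    return shapes


# ASSUMPTION: the second layer hands over a 96 x 27 x 27 feature map.
for stage in trace_shapes(size=27, channels=96):
    print(stage)
```

Under these assumptions the 5 × 5 and 3 × 3 convolutions preserve the spatial size (their padding offsets the kernel), and only the two max-pooling stages reduce it.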
All of the above layers include an activation function. The activation function of the second through eighth layers is:
$$\mathrm{ReLU}(x) = \max(0, x)$$
where ReLU(x) is the activation function and $x$ is the output of the layer's convolution.
The activation function of the ninth layer of the network structure of the above initial riot and terrorist video identification model is:
where $\delta(x)$ is the result of taking the partial derivative with respect to $b_{i,j}$.
The loss function used to train the above riot and terrorist video identification model is:
where $y_i$ indicates whether a sample pair is similar: $y_i = 1$ means the two samples are similar, otherwise they are dissimilar; the distance between $b_{i,1}$ and $b_{i,2}$ that appears in the loss is the Euclidean distance between the binary codes of the two samples in the pair; $\| |b_{i,1} - 1| \|_1$ and $\| |b_{i,2} - 1| \|_1$ are the Manhattan distances between the sample binary codes and the unit matrix; $L_r$ is the loss function; $m$ ($m > 0$) is the margin threshold parameter; $\alpha$ is the scaling factor; $b_{i,1}$ and $b_{i,2}$ are the hash codes of sample 1 and sample 2; $N$ is the total number of training sample pairs; and $k$ is the dimension of the hash codes.
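Training uses a Euclidean distance between codes, while identification compares codes by Hamming distance. For codes constrained to $\{-1,+1\}^k$ (the constraint stated in claim 10), the two are directly related: every position where the codes differ contributes $(\pm 2)^2 = 4$ to the squared Euclidean distance, so the squared Euclidean distance equals four times the Hamming distance. The short check below illustrates this; the function names and the example codes are hypothetical.

```python
def hamming(b1, b2):
    """Number of positions at which two codes differ."""
    return sum(1 for u, v in zip(b1, b2) if u != v)


def squared_euclidean(b1, b2):
    """Squared Euclidean distance between two codes."""
    return sum((u - v) ** 2 for u, v in zip(b1, b2))


# Two illustrative 8-bit codes with entries in {-1, +1}.
code_a = [1, -1, 1, 1, -1, -1, 1, -1]
code_b = [1, 1, 1, -1, -1, -1, 1, -1]

# The codes differ in two positions, each contributing (+-2)^2 = 4.
print(hamming(code_a, code_b), squared_euclidean(code_a, code_b))
```

Reducing the Euclidean distance between the codes of a similar pair during training therefore also reduces the Hamming distance used for comparison.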
As an example, refer to Fig. 4, which is a schematic diagram of the comparison-based rapid riot and terrorist video identification. As shown in Fig. 4, on the one hand, key frames of riot and terrorist videos are extracted from a video database in advance, and the riot and terrorist video identification model generates a hash code for each key frame. On the other hand, key frames of the video to be detected are extracted, and the hash network model generates a hash code for each of them. The hash codes of the key frames of the video to be detected are then compared with the hash codes of the riot and terrorist video key frames by Hamming distance. Two frames whose Hamming distance is within a radius of 2 are confirmed as similar frames. Finally, if 3 or more key frames of the video to be detected are similar to riot and terrorist video key frames in the video library, the video is considered a riot and terrorist video.
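The comparison step of this example can be sketched as follows. The sketch assumes that each hash code is stored as a Python integer whose bits are the code, that "within 2" means a Hamming distance of at most 2, and that the library is scanned linearly; the function names and the 8-bit example codes are hypothetical.

```python
HAMMING_RADIUS = 2          # frames within this radius count as similar
MIN_SIMILAR_KEYFRAMES = 3   # 3 or more similar key frames flag the video


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two integer-encoded hash codes."""
    return bin(a ^ b).count("1")


def is_similar(code: int, library_codes, radius: int = HAMMING_RADIUS) -> bool:
    """True if any library code lies within the Hamming radius of `code`."""
    return any(hamming_distance(code, ref) <= radius for ref in library_codes)


def is_flagged(keyframe_codes, library_codes,
               radius: int = HAMMING_RADIUS,
               min_similar: int = MIN_SIMILAR_KEYFRAMES) -> bool:
    """True if enough key frames of the video match the library."""
    similar = sum(1 for code in keyframe_codes
                  if is_similar(code, library_codes, radius))
    return similar >= min_similar


# Hypothetical 8-bit codes, for illustration only.
library = [0b10110010, 0b01001101]
video_a = [0b10110011, 0b10100000, 0b01001101, 0b11111111]
video_b = [0b10110011, 0b11111111, 0b00000000]

print(is_flagged(video_a, library))  # three key frames match the library
print(is_flagged(video_b, library))  # only one key frame matches
```

A linear scan keeps the sketch short; for a large library, an index over the binary codes would avoid comparing every pair.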
The method provided by the above embodiment of the present application matches the hash codes of the key frames of the video to be detected against the hash codes of the riot and terrorist video key frames to confirm the similar frames of those key frames, and determines whether the video to be detected is a riot and terrorist video according to the number of its key frames that are similar to key frames in the video database. Extracting video key frames by shot segmentation achieves a good balance between the accuracy and the speed of shot detection. Comparing the key frame hash codes with pre-stored hash codes makes it possible to judge quickly whether the video to be detected is a riot and terrorist video; the pre-stored hash codes also occupy little space and are fast to retrieve. The hash network model yields the hash codes of the key frames accurately and quickly. The method provided by the present invention can therefore identify riot and terrorist videos quickly and accurately.
So far, the technical solution of the present invention has been described with reference to the preferred embodiments shown in the drawings. However, those skilled in the art will readily understand that the protection scope of the present invention is obviously not limited to these specific embodiments. Without departing from the principle of the present invention, those skilled in the art may make equivalent changes or substitutions to the relevant technical features, and the technical solutions after such changes or substitutions will fall within the protection scope of the present invention.

Claims (10)

1. A rapid riot and terrorist video identification method based on comparison, characterized in that the method comprises:
performing shot segmentation on a video to be detected that is to undergo riot and terrorist identification, so as to select key frames of the video to be detected;
performing a hash code operation on each key frame of the video to be detected using a pre-built riot and terrorist video identification model to obtain the hash code of each key frame, wherein the riot and terrorist video identification model is built on a hash network, its input is a video frame, and its output is the hash code of the input video frame;
comparing the hash code of each key frame with pre-stored hash codes of video frames of riot and terrorist videos to determine the video frames similar to each key frame;
counting the number of similar frames, and if the number of video frames similar to the key frames is greater than a set threshold, determining that the video to be detected is a riot and terrorist video.
2. The rapid riot and terrorist video identification method based on comparison according to claim 1, characterized in that "performing shot segmentation on a video to be detected that is to undergo riot and terrorist identification, so as to select key frames of the video to be detected" comprises:
extracting the histogram of every video frame of the video to be detected and comparing the differences between the histograms of adjacent video frames to determine the shot boundaries of the video to be detected;
selecting, according to the determined shot boundaries, the start frame and/or the end frame of each shot of the video to be detected as key frames.
3. The rapid riot and terrorist video identification method based on comparison according to claim 1, characterized in that "comparing the hash code of each key frame with pre-stored hash codes of video frames of riot and terrorist videos to determine the video frames similar to each key frame" comprises:
comparing the hash code of each key frame with the hash codes of the video frames of the riot and terrorist videos in the video library;
calculating the Hamming distance between the hash code of the key frame and the hash code of the video frame;
confirming a key frame and a video frame whose Hamming distance radius is within a set value range as similar frames.
4. The rapid riot and terrorist video identification method based on comparison according to claim 3, characterized in that the training method of the riot and terrorist video identification model is:
classifying preset training sample pictures into positive sample data and negative sample data, wherein the positive sample data are riot and terrorist pictures paired with riot and terrorist pictures, and the negative sample data are riot and terrorist pictures paired with non-riot-and-terrorist pictures;
adjusting the size of the training sample pictures, randomly cropping a region of a set size from the adjusted training sample pictures, and performing sample mean processing;
training an initial riot and terrorist video identification model on the processed pictures to obtain the riot and terrorist video identification model based on the hash network.
5. The rapid riot and terrorist video identification method based on comparison according to claim 4, characterized in that the network structure of the initial riot and terrorist video identification model comprises an input layer, convolutional layers, and fully connected layers, wherein the first layer is the input layer, the second through sixth layers are convolutional layers, and the seventh through ninth layers are fully connected layers.
6. The rapid riot and terrorist video identification method based on comparison according to claim 5, characterized in that, when training the riot and terrorist video identification model, the input to the input layer is the training sample pictures after sample mean processing.
7. The rapid riot and terrorist video identification method based on comparison according to claim 5, characterized in that each convolutional layer receives the output of the preceding layer, applies convolution, and outputs the result after activation by the layer's activation function; and each fully connected layer receives the output of the preceding layer, applies convolution, and outputs the result after activation by the layer's activation function.
8. The rapid riot and terrorist video identification method based on comparison according to claim 7, characterized in that the activation function of the second through eighth layers of the network structure of the initial riot and terrorist video identification model is:
$$\mathrm{ReLU}(x) = \max(0, x)$$
where ReLU(x) is the activation function and $x$ is the output of the layer's convolution.
9. The rapid riot and terrorist video identification method based on comparison according to claim 7, characterized in that the activation function of the ninth layer of the network structure of the initial riot and terrorist video identification model is:
where $\delta(x)$ is the result of taking the partial derivative with respect to $b_{i,j}$.
10. The rapid riot and terrorist video identification method based on comparison according to any one of claims 4 to 9, characterized in that the loss function for training the riot and terrorist video identification model is:
s.t. $b_{i,j} \in \{-1,+1\}^k,\ i \in \{1,\dots,N\},\ j \in \{1,2\}$
where $y_i$ indicates whether a sample pair is similar: $y_i = 1$ means the two samples are similar, otherwise they are dissimilar; the distance between $b_{i,1}$ and $b_{i,2}$ that appears in the loss is the Euclidean distance between the binary codes of the two samples in the pair; $\| |b_{i,1} - 1| \|_1$ and $\| |b_{i,2} - 1| \|_1$ are the Manhattan distances between the sample binary codes and the unit matrix; $L_r$ is the loss function; $m$ ($m > 0$) is the margin threshold parameter; $\alpha$ is the scaling factor; $b_{i,1}$ and $b_{i,2}$ are the hash codes of sample 1 and sample 2; $N$ is the total number of training sample pairs; and $k$ is the dimension of the hash codes.
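The shot segmentation of claim 2 (compare the histograms of adjacent frames, then take the start and end frame of each shot as key frames) can be sketched as follows. This is only an illustration under stated assumptions: frames are flat lists of grayscale intensities in 0..255, histograms use 16 bins, the difference measure is the L1 distance between normalized histograms, and a boundary is declared when that distance exceeds 0.5. None of these values come from the patent, and all function names are hypothetical.

```python
def gray_histogram(frame, bins: int = 16):
    """Normalized intensity histogram of a frame (flat list of 0..255 ints)."""
    hist = [0] * bins
    for pixel in frame:
        hist[pixel * bins // 256] += 1
    return [count / len(frame) for count in hist]


def histogram_difference(h1, h2) -> float:
    """L1 distance between two normalized histograms (range 0..2)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def shot_boundaries(frames, threshold: float = 0.5):
    """Indices i at which frame i starts a new shot."""
    hists = [gray_histogram(f) for f in frames]
    return [i for i in range(1, len(hists))
            if histogram_difference(hists[i - 1], hists[i]) > threshold]


def key_frames(frames, threshold: float = 0.5):
    """Indices of the start and end frame of every shot."""
    if not frames:
        return []
    starts = [0] + shot_boundaries(frames, threshold)
    ends = [s - 1 for s in starts[1:]] + [len(frames) - 1]
    keys = []
    for start, end in zip(starts, ends):
        keys.append(start)
        if end != start:  # a one-frame shot contributes a single key frame
            keys.append(end)
    return keys


# Hypothetical clip: three dark frames followed by two bright frames.
dark, bright = [10] * 64, [240] * 64
clip = [dark, dark, dark, bright, bright]
print(shot_boundaries(clip))  # the cut falls where the bright frames begin
print(key_frames(clip))
```

Choosing both the start and the end frame corresponds to the "and" branch of "start frame and/or end frame"; keeping only `start` in the loop would give the other variant.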
CN201810366397.4A 2018-04-23 2018-04-23 Rapid riot and terrorist video identification method based on comparison Active CN108734106B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810366397.4A CN108734106B (en) 2018-04-23 2018-04-23 Rapid riot and terrorist video identification method based on comparison

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810366397.4A CN108734106B (en) 2018-04-23 2018-04-23 Rapid riot and terrorist video identification method based on comparison

Publications (2)

Publication Number Publication Date
CN108734106A (en) 2018-11-02
CN108734106B (en) 2021-01-05

Family

ID=63939718

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810366397.4A Active CN108734106B (en) 2018-04-23 2018-04-23 Rapid riot and terrorist video identification method based on comparison

Country Status (1)

Country Link
CN (1) CN108734106B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109785214A (en) * 2019-03-01 2019-05-21 宝能汽车有限公司 Safety alarming method and device based on car networking
CN109918537A (en) * 2019-01-18 2019-06-21 杭州电子科技大学 A kind of method for quickly retrieving of the ship monitor video content based on HBase
CN110796182A (en) * 2019-10-15 2020-02-14 西安网算数据科技有限公司 Bill classification method and system for small amount of samples
CN111078941A (en) * 2019-12-18 2020-04-28 福州大学 Similar video retrieval system based on frame correlation coefficient and perceptual hash
CN112395457A (en) * 2020-12-11 2021-02-23 中国搜索信息科技股份有限公司 Video to-be-retrieved positioning method applied to video copyright protection
CN112861976A (en) * 2021-02-11 2021-05-28 温州大学 Sensitive image identification method based on twin graph convolution hash network
CN114724074A (en) * 2022-06-01 2022-07-08 共道网络科技有限公司 Method and device for detecting risk video

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103744973A (en) * 2014-01-11 2014-04-23 西安电子科技大学 Video copy detection method based on multi-feature Hash
CN105718861A (en) * 2016-01-15 2016-06-29 北京市博汇科技股份有限公司 Method and device for identifying video streaming data category
WO2018017566A1 (en) * 2016-07-18 2018-01-25 The Regents Of The University Of Michigan Hash-chain based sender identification scheme

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
JING ZHANG et al.: "Image Copy Detection Based on Convolutional Neural Networks", 《CCPR 2016: PATTERN RECOGNITION》 *
LI LI et al.: 《2010 20th International Conference on Pattern Recognition》, 7 October 2010 *
XIANGLIN ZENG et al.: 《2008 IEEE International Conference on Multimedia and Expo》, 26 August 2008 *
YANG DU et al.: 《2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)》, 9 November 2017 *
张惠凡 et al.: "基于卷积神经网络的鸟类视频图像检索研究", 《科研信息化技术与应用》 *
彭天强 et al.: "基于深度卷积神经网络和二进制哈希学习的图像检索方法", 《电子与信息学报》 *
王媛媛 et al.: "有害音视频一致性检测方法的研究与实现", 《中国人民公安大学学报(自然科学版)》 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109918537A (en) * 2019-01-18 2019-06-21 杭州电子科技大学 A kind of method for quickly retrieving of the ship monitor video content based on HBase
CN109918537B (en) * 2019-01-18 2021-05-11 杭州电子科技大学 HBase-based rapid retrieval method for ship monitoring video content
CN109785214A (en) * 2019-03-01 2019-05-21 宝能汽车有限公司 Safety alarming method and device based on car networking
CN110796182A (en) * 2019-10-15 2020-02-14 西安网算数据科技有限公司 Bill classification method and system for small amount of samples
CN111078941A (en) * 2019-12-18 2020-04-28 福州大学 Similar video retrieval system based on frame correlation coefficient and perceptual hash
CN112395457A (en) * 2020-12-11 2021-02-23 中国搜索信息科技股份有限公司 Video to-be-retrieved positioning method applied to video copyright protection
CN112395457B (en) * 2020-12-11 2021-06-22 中国搜索信息科技股份有限公司 Video to-be-retrieved positioning method applied to video copyright protection
CN112861976A (en) * 2021-02-11 2021-05-28 温州大学 Sensitive image identification method based on twin graph convolution hash network
CN112861976B (en) * 2021-02-11 2024-01-12 温州大学 Sensitive image identification method based on twin graph convolution hash network
CN114724074A (en) * 2022-06-01 2022-07-08 共道网络科技有限公司 Method and device for detecting risk video

Also Published As

Publication number Publication date
CN108734106B (en) 2021-01-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant