CN108734106A - Fast comparison-based method for identifying violent-terrorism videos - Google Patents
- Publication number
- CN108734106A (application CN201810366397.4A)
- Authority
- CN
- China
- Prior art keywords
- video
- violent-terrorism
- layer
- frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
Abstract
The present invention relates to the field of video classification and proposes a fast comparison-based method for identifying violent-terrorism videos. It aims to solve the low precision and recall of visual-feature-based violent-terrorism video identification, which result from the limited descriptive power of feature descriptors. The method includes: performing shot segmentation on a video to be detected in order to select its key frames; computing a hash code for each key frame using a pre-built violent-terrorism video identification model; comparing the hash code of each key frame with pre-stored hash codes of frames from violent-terrorism videos to determine the stored frames similar to each key frame; and, if the number of similar frames exceeds a set threshold, determining that the video to be detected is a violent-terrorism video. The invention can identify violent-terrorism videos quickly and accurately among a large number of videos.
Description
Technical field
The present invention relates to the technical field of computer vision, in particular to the field of video classification, and specifically to a fast comparison-based method for identifying violent-terrorism videos.
Background art
A violent-terrorism video is a video whose content advocates violence and terrorism, religious extremism, ethnic separatism, and the like. With the rapid development of network technology and the arrival of the mobile internet era, more and more multimedia data is placed before users, and violent-terrorism videos have likewise been able to spread widely. At present, violent-terrorism videos are detected mainly through manual review and labeling, which consumes substantial money and resources. Faced with the ever-growing volume of data on the internet, a new technology is needed that automatically filters terrorism-related video and image content and that can be deployed for monitoring and early warning in important public places.
The visual features currently applied to violent-terrorism video detection fall into two classes: static features and dynamic features. Static features describe the content within a single video frame, including color, texture, and structure. They effectively reflect information such as the background, the environment, and the appearance of the main subject. The MPEG-7 descriptors are typical static features and include visual descriptors such as CLD, CSD, SC, and EH. Dynamic features describe characteristics between video frames, including motion amplitude, direction, and frequency, and effectively reflect how the main subject in the video moves. Dynamic features are mostly obtained by tracking points found with a corner detection algorithm, as in HOG, HOF, and MoSIFT. MoSIFT detects local features, but this descriptor can only extract features where there is sufficient motion. The descriptive power of all of these feature descriptors is limited, and it is difficult for them to describe the content of video images comprehensively and accurately. This is especially true for violent-terrorism videos, where detection must target specific objects, so the precision and recall of detection are low.
Summary of the invention
To solve the above problem in the prior art, namely that when two videos contain multiple copied segments it is not possible to reliably detect copies of edited video or to accurately locate the copied segments, the present application provides a fast comparison-based method for identifying violent-terrorism videos.
The present application provides a fast comparison-based method for identifying violent-terrorism videos, comprising the following steps: performing shot segmentation on a video to be detected in order to select key frames of the video; computing a hash code for each key frame of the video using a pre-built violent-terrorism video identification model, where the model is built on a hash network, takes a video frame as input, and outputs the hash code of that frame; comparing the hash code of each key frame with pre-stored hash codes of frames from violent-terrorism videos to determine the stored frames similar to each key frame; and counting the similar frames and, if the number of frames similar to the key frames exceeds a set threshold, determining that the video to be detected is a violent-terrorism video.
In some examples, "performing shot segmentation on a video to be detected in order to select key frames of the video" includes: extracting a histogram for every frame of the video to be detected and comparing the histograms of adjacent frames to determine the shot boundaries of the video; and, according to the determined shot boundaries, selecting the start frame and/or the end frame of each shot as a key frame.
In some examples, "comparing the hash code of each key frame with pre-stored hash codes of frames from violent-terrorism videos to determine the stored frames similar to each key frame" includes: comparing the hash code of each key frame with the hash codes of the frames of the violent-terrorism videos in a video library; computing the Hamming distance between the hash code of the key frame and the hash code of the video frame; and treating a key frame and a video frame as similar frames when their Hamming distance falls within a set radius.
In some examples, the violent-terrorism video identification model is trained as follows: preset training sample pictures are classified into positive sample data and negative sample data, where a positive sample is a pair of two violent-terrorism pictures and a negative sample is a pair of one violent-terrorism picture and one non-violent-terrorism picture; the training sample pictures are resized, a region of a set size is randomly cropped from each resized picture, and the sample mean is subtracted; and an initial violent-terrorism video identification model is trained on the processed pictures to obtain the violent-terrorism video identification model based on the hash network.
In some examples, the network structure of the initial violent-terrorism video identification model includes an input layer, convolutional layers, and fully connected layers: the first layer is the input layer, the second to sixth layers are convolutional layers, and the seventh to ninth layers are fully connected layers.
In some examples, when the violent-terrorism video identification model is trained, the input layer receives the training sample pictures after mean subtraction.
In some examples, each convolutional layer receives the output of the preceding layer, applies its convolution, and outputs the result after applying its activation function; each fully connected layer likewise receives the output of the preceding layer, applies its convolution, and outputs the result after applying its activation function.
In some examples, the activation function of the second to eighth layers of the network structure of the initial violent-terrorism video identification model is:
$$\mathrm{ReLU}(x) = \max(0, x)$$
where ReLU(x) is the activation function and x is the output of the layer's convolution.
In some examples, the activation function of the ninth layer of the network structure of the initial violent-terrorism video identification model is $\delta(x)$, where $\delta(x)$ is the result of taking the partial derivative with respect to $b_{i,j}$.
In some examples, the loss function used to train the violent-terrorism video identification model is $L_r$, in which: $y_i$ indicates whether a sample pair is similar ($y_i = 1$ means the two samples are similar; otherwise they are dissimilar); $\lVert b_{i,1} - b_{i,2} \rVert_2$ is the Euclidean distance between the binary codes of the two samples in the pair; $\lVert\, |b_{i,1}| - \mathbf{1} \rVert_1$ and $\lVert\, |b_{i,2}| - \mathbf{1} \rVert_1$ are the Manhattan distances between the binary codes (taken element-wise in absolute value) and the all-ones vector $\mathbf{1}$; $m$ ($m > 0$) is the margin threshold parameter; $\alpha$ is the scaling factor; $b_{i,1}$ is the hash code of sample 1 and $b_{i,2}$ is the hash code of sample 2; $N$ is the total number of training sample pairs; and $k$ is the dimension of the hash codes.
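The definitions above are consistent with a contrastive pairwise hashing loss. The following is only a sketch of one plausible form, not the formula of the original filing: the factors of 1/2, the squared distance, and the placement of $y_i$ and $(1 - y_i)$ are assumptions chosen to match the stated convention that $y_i = 1$ marks a similar pair.

```latex
L_r = \sum_{i=1}^{N} \Big\{
    \tfrac{1}{2}\, y_i \,\lVert b_{i,1} - b_{i,2} \rVert_2^2
  + \tfrac{1}{2}\,(1 - y_i)\, \max\!\big(m - \lVert b_{i,1} - b_{i,2} \rVert_2^2,\; 0\big)
  + \alpha \big( \lVert\, |b_{i,1}| - \mathbf{1} \rVert_1
               + \lVert\, |b_{i,2}| - \mathbf{1} \rVert_1 \big)
\Big\},
\qquad b_{i,j} \in \mathbb{R}^{k}
```

Under this form, the first term pulls the codes of similar pairs together, the second pushes the codes of dissimilar pairs at least $m$ apart, and the third drives every code component toward $\pm 1$ so that the outputs are nearly binary.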
The fast comparison-based method for identifying violent-terrorism videos provided by the present application first performs structural analysis on the video to be checked and extracts its key frames; next, it uses the hash-network-based violent-terrorism video identification model to determine the hash code of each key frame of the video; then, it matches the hash codes of the key frames of the video to be detected against the pre-stored hash codes of key frames of violent-terrorism videos to determine whether the video to be detected is a violent-terrorism video. Structural analysis and key-frame extraction strike a good balance between the accuracy and the speed of shot detection. Comparing key-frame hash codes with pre-stored hash codes makes it possible to judge quickly whether the video to be detected contains violent-terrorism content. The pre-stored hash codes also take up little storage and are fast to search. The invention can therefore identify violent-terrorism videos quickly and accurately.
Brief description of the drawings
Fig. 1 is a diagram of an exemplary system architecture to which the present application can be applied;
Fig. 2 is a flow diagram of one embodiment of the fast comparison-based violent-terrorism video identification method of the present application;
Fig. 3 is a schematic diagram of the network structure of the hash network model in an embodiment of the fast comparison-based violent-terrorism video identification method of the present application;
Fig. 4 is a flow diagram of an application example of the fast comparison-based violent-terrorism video identification method of the present application.
Detailed description of the embodiments
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. Those skilled in the art will understand that these embodiments are used only to explain the technical principles of the invention and are not intended to limit its scope of protection.
It should be noted that, where no conflict arises, the embodiments of the present application and the features in them may be combined with one another. The application is described in detail below with reference to the drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture to which an embodiment of the fast comparison-based violent-terrorism video identification method of the present application can be applied.
As shown in Fig. 1, the system architecture may include a terminal device 101, a network 102, and a server 103. The network 102 is the medium that provides a communication link between the terminal device 101 and the server 103. The network 102 may include various connection types, such as wired links, wireless communication links, or fiber-optic cables.
A user may use the terminal device 101 to interact with the server 103 through the network 102, for example to receive or send messages. Various client applications may be installed on the terminal device 101, such as web browsers, video browsing and video upload applications, and social platform software.
The terminal device 101 may be any electronic device that has a display and supports video browsing or video uploading, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers.
The server 103 may be a server that provides various services, for example a video processing server or application platform that performs violent-terrorism identification on videos uploaded by the terminal device 101. The video processing server may analyze and otherwise process the video data uploaded by each terminal device connected to it over the network, and feed the processing result (for example, the violent-terrorism identification result for a video) back to the terminal device or provide it to a third party.
It should be noted that the fast comparison-based violent-terrorism video identification method provided by the embodiments of the present application is generally executed by the server 103; correspondingly, a device to which the method of the present application is applied is generally located in the server 103.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are only illustrative. There may be any number of terminal devices, networks, and servers, as the implementation requires.
With continued reference to Fig. 2, the flow of one embodiment of the fast comparison-based violent-terrorism video identification method according to the present application is shown. The method includes the following steps:
Step 201: perform shot segmentation on the video to be detected in order to select key frames of the video.
In this embodiment, the electronic device (such as the server in Fig. 1) or application platform to which the method is applied obtains the video that is to undergo violent-terrorism detection. The electronic device or application platform performs shot segmentation on the obtained video to extract its key frames. As an example, the video to be detected may be obtained from a terminal device connected to the electronic device or application platform: after a user of a terminal device connected over the network to the server or application platform uploads a video, the server or application platform takes that video as the video to be detected.
Specifically, "performing shot segmentation on the video to be detected in order to select key frames of the video" includes: extracting a histogram for every frame of the video to be detected and comparing the histograms of adjacent frames to determine the shot boundaries of the video; and, according to the determined shot boundaries, selecting the start frame and/or the end frame of each shot as a key frame. The histogram extracted for each frame may be a grayscale histogram or a color histogram. That is, after the video to be detected has been divided into a series of shots, the first frame or the last frame of each shot may be taken as the key frame of the shot, or both the first and the last frame may be taken as key frames.
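The histogram comparison in step 201 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names, the 64-bin grayscale histogram, the L1 difference measure, and the `diff_threshold` value are all assumptions, and only the first frame of each shot is kept as its key frame.

```python
import numpy as np

def gray_histogram(frame, bins=64):
    """Normalized grayscale histogram of a frame given as a 2-D uint8 array."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return hist / max(hist.sum(), 1)

def select_key_frames(frames, diff_threshold=0.5):
    """Return the index of the first frame of every detected shot.

    A shot boundary is declared wherever the L1 distance between the
    histograms of two adjacent frames exceeds diff_threshold
    (a hypothetical value chosen only for this example).
    """
    if len(frames) == 0:
        return []
    key_frames = [0]  # the first frame always starts a shot
    prev = gray_histogram(frames[0])
    for idx in range(1, len(frames)):
        cur = gray_histogram(frames[idx])
        if np.abs(cur - prev).sum() > diff_threshold:
            key_frames.append(idx)  # start frame of a new shot
        prev = cur
    return key_frames

# Toy video: three dark frames followed by two bright frames.
dark = np.full((8, 8), 10, dtype=np.uint8)
bright = np.full((8, 8), 240, dtype=np.uint8)
video = [dark, dark, dark, bright, bright]
key_frame_indices = select_key_frames(video)
```

Taking the end frame of each shot as well would only require also recording `idx - 1` at each boundary and the last index of the video.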
Step 202: compute a hash code for each key frame of the video to be detected using the pre-built violent-terrorism video identification model, obtaining the hash code of each key frame.
In this embodiment, based on the key frames selected in step 201, the electronic device or application platform runs the pre-built hash network model to determine the hash code of each key frame. Here, the violent-terrorism video identification model may be a deep convolutional neural network model, for example a Siamese network model; the Siamese network model together with a purpose-designed hash loss performs the hash computation for the key frames of the video to be detected. The model is built on a hash network, takes a video frame as input, and outputs the hash code of that frame.
To determine the hash code of a key frame, the model evaluates the input frame picture and, through the optimized deep convolutional neural network, completes the hash computation for the input key frame (picture). The model may compute on the features of the key frame. These features may include static features such as color, texture, and structure, which reflect information such as the background, the environment, and the appearance of the main subject, as well as dynamic features such as motion amplitude, direction, and frequency, which reflect how the main subject in the video moves. The hash code of the key frame is determined from these features.
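The text does not say how the real-valued output of the network becomes a binary code. One common choice, shown here purely as an assumption, is to threshold each output component at zero and pack the resulting bits into an integer so that codes can be compared cheaply later. The function name `to_hash_code` is hypothetical.

```python
import numpy as np

def to_hash_code(outputs):
    """Binarize a real-valued output vector by sign and pack the bits into an int.

    Assumption: components > 0 map to bit 1, all others to bit 0.
    """
    bits = (np.asarray(outputs) > 0).astype(int)
    code = 0
    for bit in bits:
        code = (code << 1) | int(bit)  # most significant bit first
    return code

# A 4-dimensional nearly binary output such as a hash layer might produce.
code = to_hash_code([0.9, -1.1, 0.8, 1.0])  # bits 1, 0, 1, 1
```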
Step 203: compare the hash code of each key frame with the pre-stored hash codes of frames from violent-terrorism videos, and determine the stored frames similar to each key frame.
In this embodiment, based on the key-frame hash codes computed by the violent-terrorism video identification model in step 202, the electronic device or application platform compares them with the pre-stored hash codes to determine whether the key frames of the video to be detected are similar to frames of violent-terrorism videos. The pre-stored hash codes may be the hash codes of frames of violent-terrorism videos.
Here, the pre-stored hash codes are obtained as follows: first, violent-terrorism videos are extracted from a video library; then, key video frames are extracted, offline or online, from all of the extracted videos; finally, the extracted key frames are fed into the hash-network-based violent-terrorism video identification model to obtain the hash codes of the violent-terrorism videos, and the resulting hash codes are stored.
The hash-code comparison may compute the Hamming distance between the hash code of a key frame and a pre-stored hash code and determine from that distance whether the key frame is similar to a frame of a violent-terrorism video.
In some optional implementations of this embodiment, "comparing the hash code of each key frame with the pre-stored hash codes of frames from violent-terrorism videos, and determining the stored frames similar to each key frame" includes: comparing the hash code of each key frame with the hash codes of the frames of the violent-terrorism videos in the video library; computing the Hamming distance between the hash code of the key frame and the hash code of the video frame; and treating a key frame and a video frame as similar frames when their Hamming distance falls within a set radius. Specifically, two frames whose Hamming distance is within 2 may be confirmed as similar frames.
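The Hamming comparison of step 203 can be illustrated with a short sketch. It assumes that hash codes are stored as Python integers and uses the radius of 2 given in the text; the function names and the linear scan over the stored codes are illustrative choices, not part of the source.

```python
def hamming_distance(code_a: int, code_b: int) -> int:
    """Number of bit positions in which two integer-packed hash codes differ."""
    return bin(code_a ^ code_b).count("1")

def find_similar_frames(key_code, stored_codes, radius=2):
    """Indices of the stored codes within the given Hamming radius of key_code."""
    return [
        index
        for index, stored in enumerate(stored_codes)
        if hamming_distance(key_code, stored) <= radius
    ]

# Three stored 8-bit codes and one key-frame code to look up.
stored = [0b10110010, 0b10110011, 0b01001101]
matches = find_similar_frames(0b10110110, stored)
```

Here the key-frame code differs from the first two stored codes in one and two bits respectively, and from the third in seven bits, so only the first two are reported as similar.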
Step 204: count the similar frames; if the number of frames similar to the key frames exceeds the set threshold, determine that the video to be detected is a violent-terrorism video.
In this embodiment, step 203 determines which key frames are similar to frames of the violent-terrorism videos in the violent-terrorism video database. The number of key frames in the video to be detected that are similar to frames of those videos is counted, and if this number exceeds the set threshold, the video to be detected can be determined to be a violent-terrorism video. Specifically, if 3 or more key frames of the video to be detected are similar to frames of violent-terrorism videos in the library, the video to be detected is confirmed to be a violent-terrorism video.
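The decision rule of step 204 can be sketched as a count followed by a threshold test. The example uses the values given in the text (Hamming radius 2, at least 3 similar key frames); the function name `is_flagged` and the integer representation of the codes are assumptions.

```python
def is_flagged(key_codes, stored_codes, radius=2, min_similar=3):
    """Return True when at least min_similar key frames match a stored frame.

    A key frame counts as similar if its hash code lies within the given
    Hamming radius of any stored code.
    """
    similar = 0
    for key_code in key_codes:
        if any(bin(key_code ^ stored).count("1") <= radius for stored in stored_codes):
            similar += 1
    return similar >= min_similar

stored_codes = [0x00, 0xFF]
# Three of these four key frames lie within distance 2 of a stored code.
flagged = is_flagged([0x01, 0xFE, 0x03, 0x0F], stored_codes)
# Only one of these three key frames does.
not_flagged = is_flagged([0x01, 0x0F, 0xF0], stored_codes)
```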
In some optional implementations of this embodiment, the hash-network-based violent-terrorism video identification model is trained as follows: preset training sample pictures are classified into positive sample data and negative sample data, where a positive sample is a pair of two violent-terrorism pictures and a negative sample is a pair of one violent-terrorism picture and one non-violent-terrorism picture; the training sample pictures are resized, a region of a set size is randomly cropped from each resized picture, and the sample mean is subtracted; and the initial violent-terrorism video identification model is trained on the processed pictures to obtain the hash-network-based violent-terrorism video identification model. Specifically, the training data may be divided into two groups, positive sample data and negative sample data. A positive sample may be a pair of two violent-terrorism pictures, with its label set to 1; a negative sample may be a pair of one violent-terrorism picture and one non-violent-terrorism picture, with its label set to 0. In this way the hash codes of violent-terrorism videos are made as similar to one another as possible, while the hash codes of non-violent-terrorism videos are kept as far as possible from those of violent-terrorism videos.
To adjust the training sample pictures, each picture is resized to 256×256, a 227×227 region is then cropped at random, and the sample mean is subtracted; the result is the processed sample picture, which can be fed directly into the initial hash network model for training. The sample mean is the average of all pixels of the sample picture. Training and testing on mean-subtracted pictures improves training speed and test accuracy.
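The preprocessing described above can be sketched with NumPy alone. Several details are assumptions made for the example: the nearest-neighbor resize (the text does not name an interpolation method), the reading of "sample mean" as the mean over all pixels of the cropped patch, and the function names.

```python
import numpy as np

def resize_nearest(image, size=256):
    """Nearest-neighbor resize of an H x W x C image to size x size."""
    height, width = image.shape[:2]
    rows = np.arange(size) * height // size
    cols = np.arange(size) * width // size
    return image[rows][:, cols]

def preprocess(image, rng, resized=256, crop=227):
    """Resize to 256x256, take a random 227x227 crop, subtract the pixel mean."""
    img = resize_nearest(image, resized).astype(np.float32)
    top = rng.integers(0, resized - crop + 1)
    left = rng.integers(0, resized - crop + 1)
    patch = img[top:top + crop, left:left + crop]
    # Assumption: the mean is taken over all pixels of this patch.
    return patch - patch.mean()

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(300, 400, 3), dtype=np.uint8)
sample = preprocess(image, rng)
```

If the mean were instead computed over the whole training set, `patch.mean()` would be replaced by a precomputed constant or mean image.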
A pair of pictures from the positive sample data (a first violent-terrorism picture and a second violent-terrorism picture) or a pair of pictures from the negative sample data (one violent-terrorism picture and one non-violent-terrorism picture) is input to the initial hash network model for training.
In some optional implementations of this embodiment, the network structure of the initial violent-terrorism video identification model includes an input layer, convolutional layers, and fully connected layers; Fig. 3 shows a schematic diagram of the network structure of the hash network model. The first layer is the input layer, the second to sixth layers are convolutional layers, and the seventh to ninth layers are fully connected layers. The input layer receives the processed training sample pictures, which are a pair of RGB three-channel pictures. The convolutional layers (second to sixth) are denoted conv1 to conv5 in Fig. 3, and the fully connected layers (seventh to ninth) are denoted fc1 to fc3. The loss function (loss) attached to the fully connected layers has two main properties: it is discriminative, and it yields nearly binary (binary-like) codes.
Each convolutional layer receives the output of the preceding layer, applies its convolution, and outputs the result after applying its activation function; each fully connected layer likewise receives the output of the preceding layer, applies its convolution, and outputs the result after applying its activation function. Specifically:
The above-mentioned second layer is convolutional layer, shares 64 convolution kernels, and each convolution kernel size is 11 × 11, and convolution step-length is 4,
Padding=0, connection active coating, down-sampling layer and normalization layer after the characteristic pattern of output.Active coating activation primitive uses ReLU
Function.Sample level sample mode is maximum value sampling, and sampling core is 3 × 3, step-length 2.Normalize the LRN normalization that layer uses
Method, core size are set as 0.00001, beta for 5, alpha and are set as 0.75.Wherein, alpha is zoom factor, and beta is to refer to
It is several.The second layer obtains the output of first layer, and output is C after process of convolution1, C1It is input to down-sampling layer and obtains P1, P1It is input to
Active coating obtains A1, A1It is input to normalization layer and obtains L1, finally export L1To third layer.
Third layer is convolutional layer, shares 256 convolution kernels, and each convolution kernel size is 5 × 5, and convolution step-length is 1,
Padding=2, connection active coating, down-sampling layer and normalization layer after the characteristic pattern of output.Active coating activation primitive uses ReLU
Function.Sample level sample mode is maximum value sampling, and sampling core is 3 × 3, step-length 2.Normalize the LRN normalization that layer uses
Method, core size are set as 0.00001, beta for 5, alpha and are set as 0.75.Third layer obtains the output of the second layer, at convolution
Output is C after reason2, C2It is input to down-sampling layer and obtains P2, P2It is input to active coating and obtains A2, A2Normalization layer is input to obtain
L2, finally export L2To the 4th layer.
4th layer is convolutional layer, shares 256 convolution kernels, and each convolution kernel size is 3 × 3, and convolution step-length is 1,
Padding=1 connects active coating after the characteristic pattern of output.Active coating activation primitive uses ReLU functions.4th layer of acquisition third
The output of layer, output is C after process of convolution3, C3It is input to active coating and obtains A3, finally export A3To layer 5.
Layer 5 is convolutional layer, shares 256 convolution kernels, and each convolution kernel size is 3 × 3, and convolution step-length is 1,
Padding=1 connects active coating after the characteristic pattern of output.Active coating activation primitive uses ReLU functions.Layer 5 obtains the 4th
The output of layer, output is C after process of convolution4, C4It is input to active coating and obtains A4, finally export A4To layer 6.
The sixth layer is a convolutional layer with 256 convolution kernels, each of size 3 × 3, with a convolution stride of 1 and padding = 1. The output feature map is followed by an activation layer and a down-sampling layer. The activation layer uses the ReLU function. The down-sampling layer uses maximum-value sampling with a 3 × 3 sampling kernel and a stride of 2. The sixth layer receives the output of the fifth layer, and the output after convolution is C5. C5 is input to the down-sampling layer to obtain P5, P5 is input to the activation layer to obtain A5, and A5 is finally output to the seventh layer.
The seventh layer is a fully connected layer with 4096 convolution kernels, each of size 1 × 1, with a stride of 1. The output feature map is followed by an activation layer, which uses the ReLU function. The seventh layer receives the output of the sixth layer, and the output after convolution is C6. C6 is input to the activation layer to obtain A6, and A6 is finally output to the eighth layer.
The eighth layer is a fully connected layer with 4096 convolution kernels, each of size 1 × 1, with a stride of 1. The output feature map is followed by an activation layer, which uses the ReLU function. The eighth layer receives the output of the seventh layer, and the output after convolution is C7. C7 is input to the activation layer to obtain A7, and A7 is finally output to the last layer.
The ninth layer is a fully connected layer. Its number of convolution kernels is determined by the required hash code length; each convolution kernel is of size 1 × 1, with a stride of 1. The output feature map is followed by a hash loss layer, which uses a hash function. The ninth layer receives the output of the eighth layer, and the output after convolution is C8. C8 is input to the hash loss layer, which outputs the binary hash codes (b_{i,1}, b_{i,2}) of the sample pair.
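As an illustration of how real-valued ninth-layer outputs could become binary hash codes, the sketch below thresholds each output by its sign and packs the result into an integer for later bitwise comparison. Sign thresholding and the helper names `binarize` and `pack_bits` are assumptions made for this example; the text above does not specify the hash function.

```python
def binarize(outputs):
    """Map real-valued outputs to a code in {-1, +1}^k by sign (0 maps to +1)."""
    return [1 if value >= 0 else -1 for value in outputs]


def pack_bits(code):
    """Pack a {-1, +1} code into an integer: +1 becomes bit 1, -1 becomes bit 0."""
    packed = 0
    for bit in code:
        packed = (packed << 1) | (1 if bit > 0 else 0)
    return packed


# Hypothetical 4-dimensional layer output; real codes would have length k.
code = binarize([0.8, -1.2, 0.1, -0.3])
print(code, bin(pack_bits(code)))
```

Packing the code into an integer is a storage choice for the example only; it makes the Hamming distance between two codes a single XOR followed by a bit count.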
Each of the above layers includes an activation function. The activation function of the second to eighth layers is:
$$\mathrm{ReLU}(x) = \max(0, x)$$
where ReLU(x) is the activation function and x is the output after the convolution of the layer.
The activation function of the ninth layer of the network structure of the above initial riot and terrorist video identification model is:
where δ(x) is the result of taking the partial derivative with respect to b_{i,j}.
The loss function for training the above riot and terrorist video identification model is:
where $y_i$ indicates whether the sample pair is similar, i.e. $y_i = 1$ means the two samples are similar and otherwise they are dissimilar; the distance term is the Euclidean distance between the binary codes of the two samples in the pair; $\||b_{i,1}-1|\|_1$ and $\||b_{i,2}-1|\|_1$ are the Manhattan distances between each sample's binary code and the unit matrix; $L_r$ is the loss function; $m$ ($m > 0$) is the margin threshold parameter; $\alpha$ is the scaling factor; $b_{i,1}$ and $b_{i,2}$ are the hash codes of sample 1 and sample 2; $N$ is the total number of training sample pairs; and $k$ is the dimension of the hash codes.
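A loss built from these quantities can be written in a contrastive form. The expression below is only a sketch consistent with the symbols defined above: the factors of 1/2, the use of the squared Euclidean distance, and the assignment of the margin term to dissimilar pairs ($y_i = 0$) are assumptions, not details confirmed by the text.

```latex
L_r = \sum_{i=1}^{N} \Big\{ \tfrac{1}{2}\, y_i \,\|b_{i,1}-b_{i,2}\|_2^2
      + \tfrac{1}{2}\,(1-y_i)\,\max\!\big(m-\|b_{i,1}-b_{i,2}\|_2^2,\,0\big)
      + \alpha\big(\||b_{i,1}-1|\|_1+\||b_{i,2}-1|\|_1\big) \Big\}
```

Under this form, similar pairs are penalized for being far apart, dissimilar pairs are penalized only while their distance is below the margin $m$, and the last term, weighted by $\alpha$, is the regularizer on the codes themselves.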
As an example, refer to Fig. 4, which shows a schematic diagram of the comparison-based rapid riot and terrorist video identification. As shown in Fig. 4, on the one hand, the key frames of riot and terrorist videos are extracted from the video database in advance, and the riot and terrorist video identification model is used to generate the hash code of each key frame. On the other hand, the key frames of the video to be detected are extracted, and the hash network model is used to generate the hash code of each key frame. The Hamming distance between the hash codes of the key frames of the video to be detected and the hash codes of the riot and terrorist video key frames is then compared. Two frames whose Hamming distance is within a radius of 2 are confirmed as similar frames. Finally, if 3 or more key frames of the video to be detected are similar to riot and terrorist video key frames in the video library, the video is considered a riot and terrorist video.
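The comparison step described for Fig. 4 can be sketched as follows. This is a minimal illustration, assuming hash codes are stored as integers, that "within 2" means a Hamming distance of at most 2, and that a linear scan over the library is acceptable; the function names are hypothetical.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two hash codes packed as integers."""
    return bin(a ^ b).count("1")


def is_riot_terror_video(query_codes, library_codes, radius=2, min_similar=3):
    """True if at least `min_similar` query key frames have a library key frame
    within Hamming distance `radius`."""
    similar = sum(
        1
        for q in query_codes
        if any(hamming_distance(q, c) <= radius for c in library_codes)
    )
    return similar >= min_similar


# Hypothetical 8-bit codes; real codes would use the trained hash length.
library = [0b10110010, 0b01011100, 0b11100001]
query = [0b10110011, 0b01011000, 0b11100001, 0b00000000]
print(is_riot_terror_video(query, library))
```

Three of the four query key frames lie within radius 2 of a library key frame, so this example video meets the 3-frame threshold.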
The method provided by the above embodiments of the present application matches the hash codes of the key frames of the video to be detected against the hash codes of riot and terrorist video key frames, confirms the similar frames of the key frames of the video to be detected, and determines whether the video to be detected is a riot and terrorist video according to the number of its key frames that are similar to key frames in the video database. Extracting video key frames by shot segmentation achieves a good balance between the accuracy and the speed of shot detection. Comparing the key-frame hash codes with pre-stored hash codes makes it possible to judge quickly whether the video to be detected is a riot and terrorist video, and the pre-stored hash codes occupy little space and are fast to retrieve. The hash network model obtains the hash codes of key frames accurately and quickly. The method provided by the present invention can therefore identify riot and terrorist videos quickly and accurately.
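The shot segmentation step, which compares the histograms of adjacent frames and takes the start frame of each shot as a key frame, can be sketched as below. The 16-bin gray histogram, the L1 difference measure, the threshold of 0.5, and the flat-list frame representation are all assumptions for illustration.

```python
def gray_histogram(frame, bins=16):
    """Normalized histogram of a frame given as a flat list of 0-255 gray values."""
    hist = [0] * bins
    for pixel in frame:
        hist[pixel * bins // 256] += 1
    return [count / len(frame) for count in hist]


def shot_boundaries(frames, threshold=0.5):
    """Indices i where the histogram difference between frame i-1 and frame i
    exceeds the threshold, i.e. frame i starts a new shot."""
    hists = [gray_histogram(f) for f in frames]
    boundaries = []
    for i in range(1, len(hists)):
        diff = sum(abs(a - b) for a, b in zip(hists[i - 1], hists[i]))
        if diff > threshold:
            boundaries.append(i)
    return boundaries


def key_frames(frames, threshold=0.5):
    """Start-frame index of every shot (empty input gives no key frames)."""
    if not frames:
        return []
    return [0] + shot_boundaries(frames, threshold)


dark, bright = [10] * 64, [240] * 64  # two synthetic 64-pixel frames
frames = [dark, dark, bright, bright, dark]
print(key_frames(frames))
```

Only the key frames selected this way need to be hashed and compared, which is where the speed advantage over frame-by-frame comparison comes from.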
So far, the technical solution of the present invention has been described with reference to the preferred embodiments shown in the drawings. However, those skilled in the art will readily understand that the protection scope of the present invention is obviously not limited to these specific embodiments. Without departing from the principle of the present invention, those skilled in the art may make equivalent changes or replacements to the relevant technical features, and the technical solutions after such changes or replacements fall within the protection scope of the present invention.
Claims (10)
1. A rapid riot and terrorist video identification method based on comparison, characterized in that the method comprises:
performing shot segmentation on a video to be detected, on which riot and terrorist identification is to be carried out, to select the key frames of the video to be detected;
performing a hash code operation on each key frame of the video to be detected using a pre-built riot and terrorist video identification model to obtain the hash code of each key frame, wherein the riot and terrorist video identification model is built on a hash network, its input is a video frame, and its output is the hash code of the input video frame;
comparing the hash code of each key frame with the hash codes of the video frames of pre-stored riot and terrorist videos to determine the video frames similar to each key frame;
counting the number of similar frames, and determining that the video to be detected is a riot and terrorist video if the number of video frames similar to the key frames is greater than a set threshold.
2. The rapid riot and terrorist video identification method based on comparison according to claim 1, characterized in that "performing shot segmentation on a video to be detected, on which riot and terrorist identification is to be carried out, to select the key frames of the video to be detected" comprises:
extracting the histogram of every video frame of the video to be detected, and comparing the differences between the histograms of adjacent video frames to determine the shot boundaries of the video to be detected;
according to the determined shot boundaries, selecting the start frame and/or the end frame of each shot of the video to be detected as a key frame.
3. The rapid riot and terrorist video identification method based on comparison according to claim 1, characterized in that "comparing the hash code of each key frame with the hash codes of the video frames of pre-stored riot and terrorist videos to determine the video frames similar to each key frame" comprises:
comparing the hash code of each key frame with the hash codes of the video frames of the riot and terrorist videos in the video library;
calculating the Hamming distance between the hash code of the key frame and the hash code of the video frame;
confirming a key frame and a video frame whose Hamming distance radius is within a set range as similar frames.
4. The rapid riot and terrorist video identification method based on comparison according to claim 3, characterized in that the training method of the riot and terrorist video identification model comprises:
classifying preset training sample pictures into positive sample data and negative sample data, wherein the positive sample data are pairs of a riot and terrorist picture with a riot and terrorist picture, and the negative sample data are pairs of a riot and terrorist picture with a non-riot-and-terrorist picture;
adjusting the size of the training sample pictures, randomly cropping a region of a set size from the adjusted training sample pictures, and performing sample mean processing;
training the initial riot and terrorist video identification model with the processed pictures to obtain the riot and terrorist video identification model based on the hash network.
5. The rapid riot and terrorist video identification method based on comparison according to claim 4, characterized in that the network structure of the initial riot and terrorist video identification model comprises an input layer, convolutional layers, and fully connected layers, wherein the first layer is the input layer, the second to sixth layers are convolutional layers, and the seventh to ninth layers are fully connected layers.
6. The rapid riot and terrorist video identification method based on comparison according to claim 5, characterized in that, when training the riot and terrorist video identification model, the input to the input layer is the training sample pictures after sample mean processing.
7. The rapid riot and terrorist video identification method based on comparison according to claim 5, characterized in that the convolutional layer receives the output of the preceding layer and, after convolution in this layer, outputs the result activated by this layer's activation function; the fully connected layer receives the output of the preceding layer and, after convolution in this layer, outputs the result activated by this layer's activation function.
8. The rapid riot and terrorist video identification method based on comparison according to claim 7, characterized in that the activation function of the second to eighth layers of the network structure of the initial riot and terrorist video identification model is:
$$\mathrm{ReLU}(x) = \max(0, x)$$
where ReLU(x) is the activation function and x is the output after the convolution of the layer.
9. The rapid riot and terrorist video identification method based on comparison according to claim 7, characterized in that the activation function of the ninth layer of the network structure of the initial riot and terrorist video identification model is:
where δ(x) is the result of taking the partial derivative with respect to b_{i,j}.
10. The rapid riot and terrorist video identification method based on comparison according to any one of claims 4 to 9, characterized in that the loss function for training the riot and terrorist video identification model is:
s.t. $b_{i,j} \in \{-1,+1\}^k,\ i \in \{1,\dots,N\},\ j \in \{1,2\}$
where $y_i$ indicates whether the sample pair is similar, i.e. $y_i = 1$ means the two samples are similar and otherwise they are dissimilar; the distance term is the Euclidean distance between the binary codes of the two samples in the pair; $\||b_{i,1}-1|\|_1$ and $\||b_{i,2}-1|\|_1$ are the Manhattan distances between each sample's binary code and the unit matrix; $L_r$ is the loss function; $m$ ($m > 0$) is the margin threshold parameter; $\alpha$ is the scaling factor; $b_{i,1}$ and $b_{i,2}$ are the hash codes of sample 1 and sample 2; $N$ is the total number of training sample pairs; and $k$ is the dimension of the hash codes.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810366397.4A CN108734106B (en) | 2018-04-23 | 2018-04-23 | Rapid riot and terrorist video identification method based on comparison |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810366397.4A CN108734106B (en) | 2018-04-23 | 2018-04-23 | Rapid riot and terrorist video identification method based on comparison |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108734106A (en) | 2018-11-02 |
CN108734106B (en) | 2021-01-05 |
Family
ID=63939718
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810366397.4A Active CN108734106B (en) | 2018-04-23 | 2018-04-23 | Rapid riot and terrorist video identification method based on comparison |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108734106B (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103744973A (en) * | 2014-01-11 | 2014-04-23 | 西安电子科技大学 | Video copy detection method based on multi-feature Hash |
CN105718861A (en) * | 2016-01-15 | 2016-06-29 | 北京市博汇科技股份有限公司 | Method and device for identifying video streaming data category |
WO2018017566A1 (en) * | 2016-07-18 | 2018-01-25 | The Regents Of The University Of Michigan | Hash-chain based sender identification scheme |
Non-Patent Citations (7)
Title |
---|
JING ZHANG et al.: "Image Copy Detection Based on Convolutional Neural Networks", 《CCPR 2016: PATTERN RECOGNITION》 * |
LI LI et al.: 《2010 20th International Conference on Pattern Recognition》, 7 October 2010 * |
XIANGLIN ZENG et al.: 《2008 IEEE International Conference on Multimedia and Expo》, 26 August 2008 * |
YANG DU et al.: 《2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)》, 9 November 2017 * |
张惠凡 et al.: "基于卷积神经网络的鸟类视频图像检索研究", 《科研信息化技术与应用》 * |
彭天强 et al.: "基于深度卷积神经网络和二进制哈希学习的图像检索方法", 《电子与信息学报》 * |
王媛媛 et al.: "有害音视频一致性检测方法的研究与实现", 《中国人民公安大学学报(自然科学版)》 * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109918537A (en) * | 2019-01-18 | 2019-06-21 | 杭州电子科技大学 | A kind of method for quickly retrieving of the ship monitor video content based on HBase |
CN109918537B (en) * | 2019-01-18 | 2021-05-11 | 杭州电子科技大学 | HBase-based rapid retrieval method for ship monitoring video content |
CN109785214A (en) * | 2019-03-01 | 2019-05-21 | 宝能汽车有限公司 | Safety alarming method and device based on car networking |
CN110796182A (en) * | 2019-10-15 | 2020-02-14 | 西安网算数据科技有限公司 | Bill classification method and system for small amount of samples |
CN111078941A (en) * | 2019-12-18 | 2020-04-28 | 福州大学 | Similar video retrieval system based on frame correlation coefficient and perceptual hash |
CN112395457A (en) * | 2020-12-11 | 2021-02-23 | 中国搜索信息科技股份有限公司 | Video to-be-retrieved positioning method applied to video copyright protection |
CN112395457B (en) * | 2020-12-11 | 2021-06-22 | 中国搜索信息科技股份有限公司 | Video to-be-retrieved positioning method applied to video copyright protection |
CN112861976A (en) * | 2021-02-11 | 2021-05-28 | 温州大学 | Sensitive image identification method based on twin graph convolution hash network |
CN112861976B (en) * | 2021-02-11 | 2024-01-12 | 温州大学 | Sensitive image identification method based on twin graph convolution hash network |
CN114724074A (en) * | 2022-06-01 | 2022-07-08 | 共道网络科技有限公司 | Method and device for detecting risk video |
Also Published As
Publication number | Publication date |
---|---|
CN108734106B (en) | 2021-01-05 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||