CN111291223A - Quadruplet convolutional neural network video fingerprint algorithm - Google Patents

Quadruplet convolutional neural network video fingerprint algorithm

Info

Publication number
CN111291223A
Authority
CN
China
Prior art keywords
video
quadruple
neural network
convolutional neural
algorithm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010072025.8A
Other languages
Chinese (zh)
Other versions
CN111291223B (en)
Inventor
李新伟
郭辰
杨艺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan University of Technology
Original Assignee
Henan University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Henan University of Technology filed Critical Henan University of Technology
Priority to CN202010072025.8A priority Critical patent/CN111291223B/en
Publication of CN111291223A publication Critical patent/CN111291223A/en
Application granted granted Critical
Publication of CN111291223B publication Critical patent/CN111291223B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 - Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 - Retrieval using metadata automatically derived from the content
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Library & Information Science (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a quadruplet convolutional neural network video fingerprint algorithm. The method establishes a projection excitation network, builds a quadruplet convolutional neural network video fingerprint algorithm on top of it, constructs quadruplet video sequences from selected video data, inputs them into the quadruplet convolutional neural network, and carries out training and performance testing of the network. The method maps raw video data to discrete binary codes end to end, which simplifies algorithm complexity. During training, the network parameters are optimized jointly by a quadruplet loss and a quantization error loss: on the one hand, the quadruplet loss reduces intra-class variance and increases inter-class variance; on the other hand, the quantization error loss reduces the loss of semantically similar information when real-valued features are binarized. Precision and recall in video copy detection are significantly improved, and the resulting video fingerprints remain compact while retaining strong robustness and uniqueness.

Description

Quadruplet convolutional neural network video fingerprint algorithm
Technical Field
The invention relates to the technical field of multimedia information security, and in particular to a quadruplet convolutional neural network video fingerprint algorithm.
Background
With the popularity of the Internet, computing has undergone a networking revolution, and multimedia information technologies have sprung up in its wake, making multimedia data fast and convenient to use and spread. While massive video data enriches daily life and spreads knowledge, illegal content carried within it directly harms the interests of copyright owners during transmission and seriously hinders healthy social development. To strengthen the regulation of digital media, the state has in recent years issued relevant laws and regulations to effectively protect video copyright and monitor video content. In addition, the introduction and development of video copy detection technology likewise helps manage and limit the distribution of illegal videos.
Compared with images, video data has higher dimensionality and more complex information content. To reduce the computer memory occupied by the data and to accelerate retrieval, video fingerprinting has gradually become an important part of the video copy detection field. Video fingerprints, also called video hashes, are binary sequences obtained by extracting features from raw video data and encoding and quantizing them into a compact representation, so that a very small amount of data can represent a large amount of raw data.
In recent years, deep learning, as an emerging machine learning method, has proven able to model raw data in computer vision tasks such as image classification and face recognition by virtue of its powerful feature extraction capability, achieving success beyond conventional methods. The key problems in video fingerprinting are how to extract robust and unique video features and how to efficiently encode the extracted real-valued features. Researchers have therefore tried to learn deep semantic features with good generalization capability directly from video data using various neural networks, such as CNN, LSTM and RNN, triggering a new wave of deep learning research in the field of video copy detection. The document Wang L., Bao Y. and Li H., "Compact CNN Based Video Representation for Efficient Video Copy Detection", International Conference on Multimedia Modeling, Springer International Publishing, pp. 576-587, 2017, first extracts features from densely sampled video frames with a VGGNet and then reduces the feature dimension by principal component analysis (PCA) and sparse coding, further improving retrieval. The document Yue N.L. and Xue P.C., "Robust and compact video descriptor by deep neural network", IEEE International Conference on Acoustics, Speech and Signal Processing, 2017, trains a condition-generating model and a nonlinear encoder respectively, finally obtaining a robust video description. These methods are based on two-dimensional convolutional networks, which can learn only the static spatial features of frames while ignoring the temporal correlation between consecutive frames. To learn video spatio-temporal features jointly, the document Li J., Zhang H. and Wan W., "Two-class 3D-CNN classifiers combination for video copy detection", Multimedia Tools and Applications, 2018, pp. 1-13, proposes a parallel three-dimensional convolutional neural network to model video spatio-temporal information; compared with extracting features with a two-dimensional convolutional network, three-dimensional convolution can capture motion information along the temporal dimension of the video, so the overall performance is better. Because the volume of video data grows rapidly, representing videos directly with the learned high-dimensional spatio-temporal features consumes a great deal of computer memory, so researchers have proposed combining deep learning with hashing and quantizing the high-dimensional real-valued features learned by a neural network into low-dimensional discrete fingerprint codes. One approach extracts video spatial and temporal features with a convolutional neural network and a long short-term memory network respectively, merges frame-level features into video-level features over the time sequence, and finally obtains a binary sequence through quantization with a traditional hash algorithm. Another integrates deep feature extraction and quantization coding into a unified framework, proposing a binary LSTM unit for the encoder RNN to generate the binary code of the video while the decoder RNN reconstructs the original video frames in forward and reverse order.
Overall, the prior art suffers from the following problems. First, most deep-learning-based video copy detection techniques use a neural network only for feature learning, while the quantization of the real-valued feature sequence still follows traditional methods, so the overall complexity of the algorithm is high and it scales poorly to large data sets. Second, although some existing end-to-end deep video fingerprint algorithms integrate quantization coding into the training of the feature extraction network, whether in a supervised mode with single samples and labels or in an unsupervised mode built from pairs or triplets of samples, the model cannot fully learn video features that are both robust and unique from a limited number of samples, which directly affects the quality of the final fingerprint codes. Third, for feature extraction with a three-dimensional convolutional neural network, a feedforward network with few layers and a simple structure is insufficient to mine semantically similar information from complex video structures.
Therefore, there is a need to provide an improved solution to the above-mentioned deficiencies of the prior art.
Disclosure of Invention
The invention provides an end-to-end deep neural network video fingerprint algorithm that reduces the information loss in the quantization of real-valued features. Its precision and recall in video copy detection are significantly improved, and the resulting video fingerprints remain compact while retaining strong robustness and uniqueness.
In order to achieve the above purpose, the invention provides the following technical scheme:
The invention provides a quadruplet convolutional neural network video fingerprint algorithm, which comprises the following steps:
S1, establishing a projection excitation network, and building the quadruplet convolutional neural network video fingerprint algorithm on it, which specifically comprises the following steps:
S11, inputting a feature map of size D×H×W×C into the projection excitation network;
S12, performing global average pooling along the three dimensions D, H and W of each feature channel respectively to obtain three projection vectors;
S13, performing an excitation operation consisting of two 1×1 convolution layers, the first followed by a ReLU function and the second by a Sigmoid function for activation, so that a weight is generated for each feature channel;
S14, weighting the feature map of S11 channel by channel with the weights generated by the excitation operation to select features (see the sketch following this list);
S15, fusing the projection excitation network into a 50-layer three-dimensional residual network to construct the quadruplet convolutional neural network, specifically:
S151, the first network layer comprises a convolution layer with kernel size 7×7×7 and a maximum pooling layer with kernel size 3×3×3;
S152, the second to fifth network layers comprise convolution layers formed by stacking 3, 4, 6 and 3 bottleneck structures respectively, the projection excitation network being nested after the second 1×1 convolution in each bottleneck structure so as to capture the correlation information between feature channels within each residual unit, and the sixth network layer comprising a 1×4 average pooling layer;
S2, selecting video data and preprocessing it;
S3, constructing quadruplet video sequences;
S4, inputting the quadruplet video sequences into the quadruplet convolutional neural network and training it;
and S5, performing a performance test on the trained quadruplet convolutional neural network.
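For illustration, the following is a minimal PyTorch sketch of the projection-excitation operation of S11-S14; the channels-first (N, C, D, H, W) tensor layout, the class name and the channel-reduction ratio of 2 in the excitation step are assumptions, since the patent does not specify these implementation details.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProjectExcite(nn.Module):
    """Projection-excitation block (sketch). Input: (N, C, D, H, W)."""
    def __init__(self, channels: int, reduction: int = 2):
        super().__init__()
        self.conv1 = nn.Conv3d(channels, channels // reduction, kernel_size=1)
        self.conv2 = nn.Conv3d(channels // reduction, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, d, h, w = x.shape
        # Projection: global average pooling along each feature-channel axis,
        # keeping one spatial dimension at a time (three projection vectors).
        p_d = F.adaptive_avg_pool3d(x, (d, 1, 1))   # (N, C, D, 1, 1)
        p_h = F.adaptive_avg_pool3d(x, (1, h, 1))   # (N, C, 1, H, 1)
        p_w = F.adaptive_avg_pool3d(x, (1, 1, w))   # (N, C, 1, 1, W)
        # Fuse the three projections by broadcast addition (the "add" step).
        z = p_d + p_h + p_w                          # (N, C, D, H, W)
        # Excitation: two 1x1 convolutions, ReLU then Sigmoid.
        z = F.relu(self.conv1(z))
        z = torch.sigmoid(self.conv2(z))
        # Recalibrate the input feature map channel by channel.
        return x * z
```

Applied to a feature map, the block returns a recalibrated map of the same shape, which is what S15 nests inside each residual unit.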
According to the above quadruplet convolutional neural network video fingerprint algorithm, preferably, S2 specifically comprises:
S201, selecting video sequences from behavior recognition data sets and a video content recognition data set;
S202, clipping the video sequences of S201 and then normalizing them;
and S203, dividing the normalized videos of S202 into a training set and a test set.
According to the above quadruplet convolutional neural network video fingerprint algorithm, preferably, S3 specifically comprises arbitrarily selecting three video sequences with different content from the training set of S203, applying a distortion transformation to one of them to obtain a copy video pair, and forming a quadruplet video sequence together with the other two videos.
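As a concrete illustration, a small Python sketch of this quadruplet construction follows; the `training_set` and `distortions` objects are hypothetical placeholders for the data and transformations described in S2 and S3.

```python
import random

def build_quadruplet(training_set, distortions):
    """Build one quadruplet (v_a, v_p, v_n1, v_n2):
    v_a and v_p form a copy pair; v_n1 and v_n2 are unrelated videos."""
    v_a, v_n1, v_n2 = random.sample(training_set, 3)  # three different contents
    distort = random.choice(distortions)  # e.g. frame loss, rotation, scaling
    v_p = distort(v_a)                    # simulated copy of the anchor
    return v_a, v_p, v_n1, v_n2
```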
According to the above quadruplet convolutional neural network video fingerprint algorithm, preferably, S4 specifically comprises:
S401, uniformly sampling each quadruplet video sequence at equal intervals to obtain input frame images, then applying horizontal random flipping for data enhancement;
S402, setting the values of the relevant parameters in the objective function;
S403, initializing the parameters of the quadruplet convolutional neural network;
S404, training the quadruplet convolutional neural network with a stochastic gradient descent method;
and S405, training the quadruplet convolutional neural network and recording the training process.
According to the above quadruplet convolutional neural network video fingerprint algorithm, preferably, the performance test of the trained quadruplet convolutional neural network specifically comprises the following steps:
S501, applying simulated distortion transformations to all video sequences of the test set of S203 to generate copy video sequences;
S502, extracting the fingerprint codes of all video sequences in the test set and of the corresponding copy video sequences with the trained quadruplet convolutional neural network, and cross-computing the Hamming distance between the fingerprint code of each video sequence in the test set and that of each copy video sequence;
S503, setting a threshold and comparing the Hamming distances of S502 with it, two video sequences being judged to form a copy relationship when their Hamming distance is smaller than the set threshold, and a non-copy relationship otherwise;
S504, evaluating the performance of the algorithm with the true positive rate and the false positive rate, computing a set of true positive rate and false positive rate data under each threshold, and drawing the operating characteristic curve;
and S505, carrying out comparison experiments with different parameters of the same distortion transformation and comparison experiments between different algorithms respectively to verify the performance of the quadruplet convolutional neural network video fingerprint algorithm.
According to the above quadruplet convolutional neural network video fingerprint algorithm, preferably, the quadruplet video sequences are denoted v_a, v_p, v_n1 and v_n2, where v_a and v_p have a copy relationship, v_a and v_n1 and v_a and v_n2 have no copy relationship, and the content of v_n1 and v_n2 is completely different.
According to the above quadruplet convolutional neural network video fingerprint algorithm, preferably, the quadruplet video sequence is input into the quadruplet convolutional neural network to extract spatio-temporal features, high-dimensional features are obtained through convolution and pooling operations, and a k-dimensional real-valued sequence is obtained through a fully connected mapping, the k-dimensional real-valued sequence being:
(f(v_a; Θ), f(v_p; Θ), f(v_n1; Θ), f(v_n2; Θ));
where f(v; Θ) ∈ R^(k×1) and Θ denotes the parameters of the quadruplet convolutional neural network;
the k-dimensional real-valued sequence is normalized to obtain (f_e(v_a), f_e(v_p), f_e(v_n1), f_e(v_n2));
where f_e(v) = f(v; Θ) / ||f(v; Θ)||_2.
According to the above quadruplet convolutional neural network video fingerprint algorithm, preferably, the normalized k-dimensional real-valued feature sequences are quantized with the sign function sgn(·) to generate discrete fingerprint codes, the discrete fingerprint codes being:
(H(v_a), H(v_p), H(v_n1), H(v_n2));
where H(v) = [h_1, h_2, …, h_k]^T ∈ {-1, 1}^k and h_i = sgn(f_e(v)_i), that is, h_i = 1 if f_e(v)_i ≥ 0 and h_i = -1 if f_e(v)_i < 0.
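In code, the normalization and sign quantization above reduce to a few lines; a PyTorch sketch follows (the L2 form of the normalization matches the formula above, and the helper names are assumptions):

```python
import torch

def normalize(features: torch.Tensor) -> torch.Tensor:
    """f_e(v): L2-normalize a batch of k-dimensional features, shape (N, k)."""
    return features / features.norm(dim=1, keepdim=True).clamp_min(1e-12)

def quantize(f_e: torch.Tensor) -> torch.Tensor:
    """H(v): map normalized features to ±1 codes; h_i = 1 iff f_e(v)_i >= 0."""
    return torch.where(f_e >= 0, torch.ones_like(f_e), -torch.ones_like(f_e))
```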
According to the above quadruplet convolutional neural network video fingerprint algorithm, preferably, the objective function is:
J = Σ_(i=1..N) [ ln(1 + exp(||f_e(v_a^i) - f_e(v_p^i)||_2^2 - ||f_e(v_a^i) - f_e(v_n1^i)||_2^2 + α_1)) + ln(1 + exp(||f_e(v_a^i) - f_e(v_p^i)||_2^2 - ||f_e(v_n1^i) - f_e(v_n2^i)||_2^2 + α_2)) ] + λ Σ_(i=1..N) Σ_(v ∈ {v_a^i, v_p^i, v_n1^i, v_n2^i}) ||H(v) - f_e(v)||_2^2 + μ ||Θ||_1;
where the adaptive thresholds are
α_1 = (ω_1 / N) Σ_(i=1..N) ( ||f_e(v_a^i) - f_e(v_n1^i)||_2^2 - ||f_e(v_a^i) - f_e(v_p^i)||_2^2 );
α_2 = (ω_2 / N) Σ_(i=1..N) ( ||f_e(v_n1^i) - f_e(v_n2^i)||_2^2 - ||f_e(v_a^i) - f_e(v_p^i)||_2^2 );
N is the training batch size, ω_1 and ω_2 are the threshold coefficients, ||Θ||_1 is the sum of the absolute values of all model parameters, and λ and μ are the hyperparameters balancing the quantization error loss and the L1 regularization term respectively.
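The following PyTorch sketch implements a loss of this shape, reusing `quantize` from the sketch above. Because the exact expressions appear only as equation images in the original filing, the adaptive thresholds and the term weighting below follow the standard quadruplet-loss formulation and should be read as assumptions.

```python
import torch
import torch.nn.functional as F

def fingerprint_loss(fa, fp, fn1, fn2, params,
                     omega1=1.0, omega2=0.5, lam=0.01, mu=0.001):
    """Quadruplet loss + quantization error + L1 regularization (sketch).
    fa..fn2: normalized features of shape (N, k); params: model parameters."""
    d_ap  = (fa - fp).pow(2).sum(dim=1)    # ||f_e(v_a) - f_e(v_p)||^2
    d_an1 = (fa - fn1).pow(2).sum(dim=1)   # ||f_e(v_a) - f_e(v_n1)||^2
    d_nn  = (fn1 - fn2).pow(2).sum(dim=1)  # ||f_e(v_n1) - f_e(v_n2)||^2
    # Adaptive thresholds: batch means scaled by the threshold coefficients.
    alpha1 = omega1 * (d_an1 - d_ap).mean().detach()
    alpha2 = omega2 * (d_nn - d_ap).mean().detach()
    # ln(1 + exp(.)) as the smooth replacement for max(0, .).
    quad = (F.softplus(d_ap - d_an1 + alpha1) +
            F.softplus(d_ap - d_nn + alpha2)).sum()
    # Quantization error between real-valued features and their ±1 codes.
    quant = sum((f - quantize(f)).pow(2).sum() for f in (fa, fp, fn1, fn2))
    l1 = sum(p.abs().sum() for p in params)  # L1 regularization term
    return quad + lam * quant + mu * l1
```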
Preferably, in the above quadruplet convolutional neural network video fingerprint algorithm, the loss of the quadruplet video sequence (f_e(v_a), f_e(v_p), f_e(v_n1), f_e(v_n2)) is computed through the objective function; the loss computation comprises the gradient of the quadruplet loss with respect to the normalized features and the gradient of the quantization error loss;
the gradients of the quadruplet loss with respect to the normalized features are:
∂L_quad/∂f_e(v_a) = 2s_1(f_e(v_n1) - f_e(v_p)) + 2s_2(f_e(v_a) - f_e(v_p));
∂L_quad/∂f_e(v_p) = -2(s_1 + s_2)(f_e(v_a) - f_e(v_p));
∂L_quad/∂f_e(v_n1) = 2s_1(f_e(v_a) - f_e(v_n1)) - 2s_2(f_e(v_n1) - f_e(v_n2));
∂L_quad/∂f_e(v_n2) = 2s_2(f_e(v_n1) - f_e(v_n2));
where
s_1 = exp(z_1)/(1 + exp(z_1)), with z_1 = ||f_e(v_a) - f_e(v_p)||_2^2 - ||f_e(v_a) - f_e(v_n1)||_2^2 + α_1;
s_2 = exp(z_2)/(1 + exp(z_2)), with z_2 = ||f_e(v_a) - f_e(v_p)||_2^2 - ||f_e(v_n1) - f_e(v_n2)||_2^2 + α_2;
the gradients of the quantization error loss are:
∂L_quant/∂f_e(v_a) = 2(f_e(v_a) - H(v_a));
∂L_quant/∂f_e(v_p) = 2(f_e(v_p) - H(v_p));
∂L_quant/∂f_e(v_n1) = 2(f_e(v_n1) - H(v_n1));
∂L_quant/∂f_e(v_n2) = 2(f_e(v_n2) - H(v_n2)).
Compared with the closest prior art, the technical scheme provided by the invention has the following beneficial effects:
The invention provides a quadruplet convolutional neural network video fingerprint algorithm. The quadruplet model structure consists of four parameter-sharing sub-networks, each a three-dimensional residual projection excitation network: the three-dimensional residual network learns deep spatio-temporal features of the video, while the projection excitation module generates weights for the feature channels to learn the dependencies among them. Using the three-dimensional projection excitation residual network as the feature extractor for the video copy detection task and obtaining the real-valued fingerprint sequence through a fully connected layer, the mapping from raw video data to discrete binary codes is realized end to end, which simplifies algorithm complexity. During training, the quadruplet loss and the quantization error loss jointly optimize the network parameters: on the one hand, the quadruplet loss reduces intra-class variance and increases inter-class variance, so that robust and unique video features are learned simultaneously; on the other hand, the quantization error loss reduces the loss of semantically similar information when real-valued features are binarized, ensuring the quality of the video fingerprints. The invention combines deep learning with hash coding, retrieves video copies quickly and effectively, and can be used for copyright protection and illegal content monitoring of Internet digital video; precision and recall in video copy detection are significantly improved, and the resulting video fingerprints remain compact while retaining strong robustness and uniqueness.
Drawings
FIG. 1 is a schematic flow chart of a video fingerprint algorithm of a quadruplet convolutional neural network according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart illustrating a principle of a projection excitation network in an embodiment of the present invention;
FIG. 3 is a schematic flow chart of a quadruplet convolutional neural network according to an embodiment of the present invention;
FIG. 4 is a graph of the training loss variation in an embodiment of the present invention;
FIG. 5 is a graph of the training accuracy variation in an embodiment of the present invention;
FIG. 6 is a frame loss comparison graph under the same distortion transformation in an embodiment of the present invention;
FIG. 7 is a frame rate reduction comparison graph under the same distortion transformation in an embodiment of the present invention;
FIG. 8 is a rotation comparison graph under the same distortion transformation in an embodiment of the present invention;
FIG. 9 is a translation plus frame rate reduction comparison graph under the same distortion transformation in an embodiment of the present invention;
FIG. 10 is a logo insertion comparison graph under the same distortion transformation in an embodiment of the present invention;
FIG. 11 is a median filtering plus frame loss comparison graph under the same distortion transformation in an embodiment of the present invention;
FIG. 12 is a salt-and-pepper noise plus frame loss comparison graph under the same distortion transformation in an embodiment of the present invention;
FIG. 13 is a scaling comparison graph under the same distortion transformation in an embodiment of the present invention;
FIG. 14 is a frame loss comparison graph under different distortion transformations in an embodiment of the present invention;
FIG. 15 is a frame rate reduction comparison graph under different distortion transformations in an embodiment of the present invention;
FIG. 16 is a rotation comparison graph under different distortion transformations in an embodiment of the present invention;
FIG. 17 is a translation plus frame rate reduction comparison graph under different distortion transformations in an embodiment of the present invention;
FIG. 18 is a logo insertion comparison graph under different distortion transformations in an embodiment of the present invention;
FIG. 19 is a median filtering plus frame loss comparison graph under different distortion transformations in an embodiment of the present invention;
FIG. 20 is a salt-and-pepper noise plus frame loss comparison graph under different distortion transformations in an embodiment of the present invention;
FIG. 21 is a scaling comparison graph under different distortion transformations in an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments that can be derived by one of ordinary skill in the art from the embodiments given herein are intended to be within the scope of the present invention.
The present invention will be described in detail below with reference to the embodiments with reference to the attached drawings. It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.
As shown in FIG. 1, the present invention provides a quadruplet convolutional neural network video fingerprint algorithm whose overall framework consists of four parts: an input end, the quadruplet convolutional neural network, a fingerprint code generation layer and a loss layer. The input end receives a quadruplet video sequence, in which anchor denotes the source video, positive a copy video, negative1 a first non-copy video and negative2 a second non-copy video. The quadruplet convolutional neural network comprises four weight-sharing sub-networks, each a novel three-dimensional residual network with nested projection excitation blocks. The fingerprint code generation layer first applies an average pooling operation to the feedforward network output, maps the resulting 2048-dimensional feature vector to a low-bit real-valued sequence, and then performs binary coding to obtain the discrete fingerprint code. In the loss layer, the objective function optimized by the model contains an improved quadruplet loss and a quantization error loss, with an L1 regularization term added.
The invention is an end-to-end structure: a convolutional network acquires the spatio-temporal information of the raw input video data, the acquired high-dimensional features are reduced to a real-valued sequence of the target length through full connection, and the quantized video fingerprint is finally obtained. Model optimization is based on the relations of the three positive-negative sample pairs formed by each quadruplet: the quadruplet loss reduces intra-class variance and increases inter-class variance, so that the robustness features of copy video pairs and the uniqueness features of non-copy video pairs are learned jointly; the quantization error term effectively reduces the loss of similarity information when the real-valued sequence is binarized; and L1 regularization helps avoid overfitting during training. In the optimization process, the quadruplet convolutional neural network generates compact fingerprint codes satisfying robustness and uniqueness by minimizing the objective function, while the generated video fingerprints in turn guide the learning of the network; the two promote and feed back to each other.
As shown in fig. 2-3, the specific construction steps of the quadruple convolutional neural network in the quadruple convolutional neural network video fingerprint algorithm in the present invention are as follows:
S1, establishing a projection excitation network by drawing on the projection excitation module from the image segmentation field, and constructing the quadruplet convolutional neural network from it, specifically as follows:
S11, a feature map of size D × H × W × C is input.
S12, after the feature map is input, a projection operation is performed: global average pooling is carried out along the three dimension directions D, H and W of each feature channel to obtain three projection vectors, where ⊕ denotes the addition operation that fuses the spatial information of the different dimension directions.
S13, after the projection operation, an excitation operation is performed, consisting of two 1 × 1 convolution layers; the first convolution layer is followed by a ReLU activation and the second by a Sigmoid activation, and the activated result generates a weight for each feature channel so as to learn the interdependence among the channels.
S14, the weights output by the excitation operation are applied to the initial feature map channel by channel, selecting features for the current task by promoting useful features and suppressing useless ones, where ⊙ denotes the dot product operation.
S15, fusing the projection excitation network into a 50-layer three-dimensional residual error network, and constructing a quadruple convolution neural network by taking the 50-layer three-dimensional residual error network as a basic structure through the three-dimensional projection excitation residual error network model, wherein the quadruple convolution neural network specifically comprises the following steps:
s151, the first network layer Conv1 is input with a convolution layer with a core size of 7 × 7 × 7 and a maximum pooling layer with a core size of 3 × 3 × 3.
S152, a second network layer Conv2_ x, a third network layer Conv3_ x, a fourth network layer Conv4_ x and a fifth network layer Conv5_ x respectively represent convolutional layers formed by overlapping 3, 4, 6 and 3 bottleneck structures, a projection excitation block is nested after convolution of the second 1 x 1 in each bottleneck structure, the purpose of capturing correlation information between feature channels in each residual error unit is achieved, and the sixth layer is an average pooling layer with the kernel size of 1 x 4.
And S153, outputting the video fingerprint length obtained by 16 bit numbers, so that the high-dimensional features extracted by the three-dimensional feature projection excitation residual error network are mapped into a 16-dimensional real value sequence, namely, neurons of a full connection layer in the fingerprint code generation layer comprise 16 nodes.
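A hedged PyTorch sketch of one such bottleneck with the nested projection excitation block (reusing the `ProjectExcite` sketch above) and of the 16-node fingerprint head follows; the strides, channel widths and batch-norm placement are conventional 3D ResNet-50 choices rather than details disclosed by the patent.

```python
import torch
import torch.nn as nn

class PEBottleneck(nn.Module):
    """3D bottleneck (1x1x1 -> 3x3x3 -> 1x1x1) with a projection-excitation
    block nested after the second 1x1x1 convolution (sketch)."""
    def __init__(self, in_ch: int, mid_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        self.conv1 = nn.Conv3d(in_ch, mid_ch, 1, bias=False)
        self.bn1 = nn.BatchNorm3d(mid_ch)
        self.conv2 = nn.Conv3d(mid_ch, mid_ch, 3, stride=stride, padding=1,
                               bias=False)
        self.bn2 = nn.BatchNorm3d(mid_ch)
        self.conv3 = nn.Conv3d(mid_ch, out_ch, 1, bias=False)  # second 1x1x1
        self.bn3 = nn.BatchNorm3d(out_ch)
        self.pe = ProjectExcite(out_ch)   # nested projection-excitation block
        self.shortcut = (nn.Sequential(
            nn.Conv3d(in_ch, out_ch, 1, stride=stride, bias=False),
            nn.BatchNorm3d(out_ch))
            if stride != 1 or in_ch != out_ch else nn.Identity())
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.relu(self.bn2(self.conv2(out)))
        out = self.pe(self.bn3(self.conv3(out)))  # recalibrate the channels
        return self.relu(out + self.shortcut(x))

# Fingerprint head (S153): global average pooling of the 2048-d feature map,
# followed by a fully connected layer with 16 nodes.
fingerprint_head = nn.Sequential(
    nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(2048, 16))
```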
The quadruplet convolutional neural network video fingerprint algorithm flow specifically comprises the following steps:
1) Constructing a quadruplet video sequence, denoted v_a, v_p, v_n1 and v_n2, where v_a and v_p have a copy relationship, v_a and v_n1 and v_a and v_n2 have no copy relationship, and the content of v_n1 and v_n2 is completely different.
2) Inputting v_a, v_p, v_n1 and v_n2 into the quadruplet convolutional neural network to extract spatio-temporal features; high-dimensional features are obtained through a series of convolution and pooling operations and are mapped through full connection to a k-dimensional real-valued sequence (f(v_a; Θ), f(v_p; Θ), f(v_n1; Θ), f(v_n2; Θ)), that is, a lower-dimensional real-valued feature obtained by dimension-reduction compression of the high-dimensional features, where f(v; Θ) ∈ R^(k×1) and Θ denotes the model parameters.
3) Normalizing the k-dimensional real-valued sequence to obtain (f_e(v_a), f_e(v_p), f_e(v_n1), f_e(v_n2)), where f_e(v) = f(v; Θ) / ||f(v; Θ)||_2.
4) Quantizing with the sign function sgn(·) to generate the discrete fingerprint codes (H(v_a), H(v_p), H(v_n1), H(v_n2)), where H(v) = [h_1, h_2, …, h_k]^T ∈ {-1, 1}^k and h_i = sgn(f_e(v)_i), that is, h_i = 1 if f_e(v)_i ≥ 0 and h_i = -1 if f_e(v)_i < 0.
5) Designing the optimization objective function of the quadruplet convolutional neural network. The objective function is the optimization target of the network, and the network parameters are continuously adjusted toward the optimal state according to it. The designed objective function is:
J = Σ_(i=1..N) [ ln(1 + exp(||f_e(v_a^i) - f_e(v_p^i)||_2^2 - ||f_e(v_a^i) - f_e(v_n1^i)||_2^2 + α_1)) + ln(1 + exp(||f_e(v_a^i) - f_e(v_p^i)||_2^2 - ||f_e(v_n1^i) - f_e(v_n2^i)||_2^2 + α_2)) ] + λ Σ_(i=1..N) Σ_(v ∈ {v_a^i, v_p^i, v_n1^i, v_n2^i}) ||H(v) - f_e(v)||_2^2 + μ ||Θ||_1;
where the adaptive thresholds are
α_1 = (ω_1 / N) Σ_(i=1..N) ( ||f_e(v_a^i) - f_e(v_n1^i)||_2^2 - ||f_e(v_a^i) - f_e(v_p^i)||_2^2 );
α_2 = (ω_2 / N) Σ_(i=1..N) ( ||f_e(v_n1^i) - f_e(v_n2^i)||_2^2 - ||f_e(v_a^i) - f_e(v_p^i)||_2^2 );
N is the training batch size, ω_1 and ω_2 are the threshold coefficients, ||Θ||_1 is the sum of the absolute values of all model parameters, and λ and μ are the hyperparameters balancing the quantization error loss and the L1 regularization term respectively; the loss is calculated according to the objective function and each parameter of the quadruplet convolutional neural network is updated through back propagation.
6) Computing the gradient of the novel quadruplet loss in the objective function. The novel quadruplet loss proposed in the embodiment of the invention replaces the non-differentiable max(0, ·) function in the original quadruplet loss function (QuadrupletLoss) with the infinitely differentiable smooth continuous function ln(1 + exp(·)), which makes it convenient to update the parameters with a gradient descent method. The gradients of the novel quadruplet loss with respect to f_e(v_a), f_e(v_p), f_e(v_n1) and f_e(v_n2) are:
∂L_quad/∂f_e(v_a) = 2s_1(f_e(v_n1) - f_e(v_p)) + 2s_2(f_e(v_a) - f_e(v_p));
∂L_quad/∂f_e(v_p) = -2(s_1 + s_2)(f_e(v_a) - f_e(v_p));
∂L_quad/∂f_e(v_n1) = 2s_1(f_e(v_a) - f_e(v_n1)) - 2s_2(f_e(v_n1) - f_e(v_n2));
∂L_quad/∂f_e(v_n2) = 2s_2(f_e(v_n1) - f_e(v_n2));
where
s_1 = exp(z_1)/(1 + exp(z_1)), with z_1 = ||f_e(v_a) - f_e(v_p)||_2^2 - ||f_e(v_a) - f_e(v_n1)||_2^2 + α_1;
s_2 = exp(z_2)/(1 + exp(z_2)), with z_2 = ||f_e(v_a) - f_e(v_p)||_2^2 - ||f_e(v_n1) - f_e(v_n2)||_2^2 + α_2.
7) Computing the gradient of the quantization error loss in the objective function; the objective function of the whole quadruplet convolutional neural network comprises the quadruplet loss, the quantization error loss and the regularization term. The gradients of the quantization error loss with respect to f_e(v_a), f_e(v_p), f_e(v_n1) and f_e(v_n2) are:
∂L_quant/∂f_e(v_a) = 2(f_e(v_a) - H(v_a));
∂L_quant/∂f_e(v_p) = 2(f_e(v_p) - H(v_p));
∂L_quant/∂f_e(v_n1) = 2(f_e(v_n1) - H(v_n1));
∂L_quant/∂f_e(v_n2) = 2(f_e(v_n2) - H(v_n2)).
the video distortion copy can be effectively detected through the depth video fingerprint algorithm, the generated video fingerprint obviously reduces the fingerprint coding length while ensuring the robustness and uniqueness, and the matching efficiency is fundamentally improved.
In order that the invention may be better understood, it will now be further illustrated by the following examples.
S2, selecting experimental data and preprocessing the experimental data, wherein the preprocessing comprises the following steps:
s201, selecting 4986 video sequences which meet the requirements that the resolution is 320 multiplied by 240, the total number of frames is not less than 100 and the difference of video contents is large from the behavior identification data set UCF-101, the HMDB-51 and the video content identification data set FCVID.
S202, intercepting the first 100 frames of the selected 4986 video sequences to obtain the normalized video with the size of 100 multiplied by 320 multiplied by 240.
S203, dividing all the normalized video sequences into 3986 training sets and 1000 testing sets, wherein the videos in the training sets and the videos in the testing sets are not overlapped with each other.
S3, randomly selecting three video sequences with different contents from the training set, carrying out distortion change processing on one of the video sequences to obtain a group of analog copy video pairs, and forming a quadruple video sequence with the other two video sequences.
S4, training the quadruple convolutional neural network, specifically comprising:
S401, 16 frames are uniformly sampled at equal intervals from each video sequence; the four corners and the center position of the video frames are cropped at five spatial scales and resized to obtain color input clips of size 16 × 112 × 112; in addition, horizontal random flipping is applied for data enhancement.
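A simplified PyTorch sketch of this preprocessing follows; for brevity it performs only the center crop at full scale (the embodiment also crops the four corners at several spatial scales), and the helper name and uint8 input layout are assumptions.

```python
import random
import torch
import torch.nn.functional as F

def preprocess_clip(video: torch.Tensor, train: bool = True) -> torch.Tensor:
    """video: (T, H, W, 3) uint8 tensor -> (3, 16, 112, 112) float clip."""
    t = video.shape[0]
    idx = torch.linspace(0, t - 1, steps=16).long()  # equal-interval sampling
    clip = video[idx].float() / 255.0                # (16, H, W, 3)
    h, w = clip.shape[1], clip.shape[2]
    s = min(h, w)                                    # center crop, one of the
    top, left = (h - s) // 2, (w - s) // 2           # five crop positions
    clip = clip[:, top:top + s, left:left + s, :]
    clip = clip.permute(3, 0, 1, 2).unsqueeze(0)     # (1, 3, 16, s, s)
    clip = F.interpolate(clip, size=(16, 112, 112), mode='trilinear',
                         align_corners=False).squeeze(0)
    if train and random.random() < 0.5:              # horizontal random flip
        clip = torch.flip(clip, dims=[3])
    return clip
```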
S402, setting the value of relevant parameters omega in the objective function1And ω 21 and 0.5, respectively, and λ and μ are 0.01 and 0.001, respectively.
S403, initializing the model parameters before training: the parameters of a 3D ResNet-50 network pre-trained on the Kinetics data set are transferred into the quadruplet convolutional neural network and the remaining parameters are randomly initialized. The Kinetics data set is a public video data set for human behavior recognition and a recognized benchmark data set; its videos come from YouTube and are divided into 600 categories, with at least 600 videos per category, each lasting about 10 seconds.
S404, training the model by adopting a random gradient descent method, setting the initial learning rate to be 0.01, setting the momentum to be 0.9, setting the weight attenuation parameter to be 0.001, selecting 10000 quadruple video sequences for each epoch, setting the batch size to be 10 according to the size of a computer video memory, and setting the learning rate adjustment strategy to be 0.1 time of the original learning rate reduction of 20000 times per iteration.
S405, respectively training the quadruple video fingerprint algorithm of the nested projection excitation network and the video fingerprint algorithm based on the non-local 3D residual error network under the same data set according to the same training strategy and parameter setting, recording loss and accuracy change conditions in the training process, drawing curves, and drawing results as shown in fig. 4 and 5.
S5, verifying the trained quadruplet convolutional neural network, wherein the verifying step specifically comprises the following steps:
S501, applying 8 kinds of simulated distortion transformation to all video sequences in the test set to generate 34 copy videos per sequence: 3 frame-loss videos (randomly discarding 30, 40 and 50 frames), 5 frame-rate-reduction videos (FPS reduced by 2, 4, 6, 8 and 10 per second), 4 rotation videos (rotating 5° and 10° clockwise and 5° and 10° counterclockwise), 3 frame translation plus frame-rate-reduction videos (translating along the coordinate axes by (-40, 40), (-60, 60) and (-80, 80), each with FPS reduced by 10 per second), 5 logo insertion videos (insertion position coordinates (40, 60), (60, 80), (80, 100), (100, 120 and 120)), 5 median filtering plus frame-loss videos (templates 9 × 9, 11 × 11, 13 × 13, 15 × 15 and 17 × 17, each with 50 frames randomly discarded), 5 salt-and-pepper noise plus frame-loss videos (densities 0.02, 0.04, 0.06, 0.08 and 0.10, each with 50 frames randomly discarded), and 4 scaled videos (factors 0.5, 0.75, 1.5 and 2.0).
S502, extracting fingerprint codes of all video sequences in the test set and fingerprint codes of copy video sequences corresponding to the fingerprint codes according to the trained quadruplet convolutional neural network, and calculating the Hamming distance between the fingerprint code of each video sequence in the test set and the fingerprint code of each copy video sequence in a crossed manner.
S503, setting a threshold value, wherein the threshold value is changed in all Hamming distance ranges, when the Hamming distance between two fingerprint codes is smaller than the set threshold value, judging that two video sequences form a copy relation, otherwise, judging that two videos form a non-copy relation.
S504, evaluating the performance of the video fingerprint algorithm with the true positive rate and the false positive rate.
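Sweeping the threshold over all possible Hamming distances yields the false positive rate and true positive rate pairs used to draw the operating characteristic curve; a plain-Python sketch with hypothetical inputs:

```python
def roc_points(copy_dists, noncopy_dists, k=16):
    """copy_dists / noncopy_dists: Hamming distances of genuine copy pairs and
    of non-copy pairs; returns (FPR, TPR) points for thresholds 0..k+1."""
    points = []
    for t in range(k + 2):
        tpr = sum(d < t for d in copy_dists) / len(copy_dists)
        fpr = sum(d < t for d in noncopy_dists) / len(noncopy_dists)
        points.append((fpr, tpr))
    return points
```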
S505, carrying out two groups of experiments respectively to verify the performance of the video fingerprint algorithm provided by the invention.
The first group of experiments compares different parameters of the same distortion transformation; the results are shown in FIG. 6 to FIG. 13. The curve in FIG. 6 shows that the performance of the algorithm trends downward as the number of lost frames increases. The curves in FIG. 7 indicate that changes in frame rate have little effect on performance. The curves in FIG. 8 indicate that the larger the rotation angle, whether clockwise or counterclockwise, the worse the performance, and that the robustness to counterclockwise rotation is better than to clockwise rotation. The curve in FIG. 9 shows that performance initially trends downward as the frame translation distance grows, then rises in a step-like manner after the translation reaches a certain distance. The curves in FIG. 10 show that the closer the inserted logo is to the center of the video frame, the worse the anti-interference ability of the algorithm, while the closer the logo is to the edge, the stronger it is. The curves in FIG. 11 show that robustness trends downward as the median filtering strength increases. The curves in FIG. 12 show that robustness trends downward as the salt-and-pepper noise intensity increases. The curves in FIG. 13 show that performance trends downward as the video scaling factor increases, with enlarging affecting robustness significantly more than shrinking.
The second group of experiments compares different algorithms. Four traditional video fingerprint algorithms, the structural graph model (SGM), temporally informative representative images (TIRI), centroid of gradient orientations (CGO) and radial hash (RASH), together with a video fingerprint algorithm based on a non-local 3D residual network (NL_Triplet), are selected for performance comparison with the proposed algorithm (PE_Quadruplet); all fingerprint code lengths are 16 bits. The NL_Triplet algorithm and the PE_Quadruplet algorithm are trained and tested on the same data set with consistent parameter settings. The experimental results are shown in FIG. 14 to FIG. 21. Under the frame-loss attack in FIG. 14, the proposed algorithm is clearly superior to the TIRI, CGO and RASH algorithms and performs well overall compared with the SGM algorithm and the deep learning algorithm NL_Triplet. FIG. 15 shows that under the frame-rate-reduction attack, the proposed algorithm is slightly inferior to the SGM and NL_Triplet algorithms, but its overall performance remains at a high level. FIG. 16 shows that under the rotation attack, the proposed algorithm is the best of all compared algorithms, reflecting its strong robustness to rotation distortion. FIG. 17 shows that under the combined attack of frame translation and frame rate reduction, the proposed algorithm performs significantly better than the compared algorithms, demonstrating strong anti-interference ability against combined geometric and temporal distortion. FIG. 18 shows that under the logo insertion attack, the proposed algorithm again performs best of all algorithms, indicating good robustness to local distortion. FIG. 19 shows that under the combined attack of median filtering and frame loss, the proposed algorithm outperforms the traditional comparison algorithms and is comparable to the deep algorithm NL_Triplet, indicating excellent robustness to signal-processing spatial distortion. FIG. 20 shows that under the combined attack of salt-and-pepper noise and frame loss, the proposed algorithm again shows clear superiority over the traditional algorithms, once more reflecting its strong robustness to signal-processing spatial distortion. FIG. 21 shows that under the scaling attack, the proposed algorithm performs best among all six algorithms, fully demonstrating its strong robustness to geometric distortion.
In conclusion, the invention provides a quadruplet convolutional neural network video fingerprint algorithm. The quadruplet model structure consists of four parameter-sharing sub-networks, each a three-dimensional residual projection excitation network: the three-dimensional residual network learns deep spatio-temporal features of the video, while the projection excitation module generates weights for the feature channels to learn the dependencies among them. Using the three-dimensional projection excitation residual network as the feature extractor for the video copy detection task and obtaining the real-valued fingerprint sequence through a fully connected layer, the mapping from raw video data to discrete binary codes is realized end to end, which simplifies algorithm complexity. During training, the quadruplet loss and the quantization error loss jointly optimize the network parameters: on the one hand, the quadruplet loss reduces intra-class variance and increases inter-class variance, so that robust and unique video features are learned simultaneously; on the other hand, the quantization error loss reduces the loss of semantically similar information when real-valued features are binarized, ensuring the quality of the video fingerprints. The method combines deep learning with hash coding, retrieves video copies quickly and effectively, and can be used for copyright protection and illegal content monitoring of Internet digital video; precision and recall in video copy detection are significantly improved, and the resulting video fingerprints remain compact while retaining strong robustness and uniqueness.
The above description is only exemplary of the invention and should not be taken as limiting the invention, as any modification, equivalent replacement, or improvement made within the spirit and principle of the invention is intended to be covered by the appended claims.

Claims (10)

1. A quadruplet convolutional neural network video fingerprint algorithm, comprising the following steps:
S1, establishing a projection excitation network, and building the quadruplet convolutional neural network video fingerprint algorithm on it, which specifically comprises the following steps:
S11, inputting a feature map of size D×H×W×C into the projection excitation network;
S12, performing global average pooling along the three dimensions D, H and W of each feature channel respectively to obtain three projection vectors;
S13, performing an excitation operation consisting of two 1×1 convolution layers, the first followed by a ReLU function and the second by a Sigmoid function for activation, so that a weight is generated for each feature channel;
S14, weighting the feature map of S11 channel by channel with the weights generated by the excitation operation to select features;
S15, fusing the projection excitation network into a 50-layer three-dimensional residual network to construct the quadruplet convolutional neural network, specifically:
S151, the first network layer comprises a convolution layer with kernel size 7×7×7 and a maximum pooling layer with kernel size 3×3×3;
S152, the second to fifth network layers comprise convolution layers formed by stacking 3, 4, 6 and 3 bottleneck structures respectively, the projection excitation network being nested after the second 1×1 convolution in each bottleneck structure so as to capture the correlation information between feature channels within each residual unit, and the sixth network layer comprising a 1×4 average pooling layer;
S2, selecting video data and preprocessing it;
S3, constructing quadruplet video sequences;
S4, inputting the quadruplet video sequences into the quadruplet convolutional neural network and training it;
and S5, performing a performance test on the trained quadruplet convolutional neural network.
2. The quadruplet convolutional neural network video fingerprint algorithm according to claim 1, wherein S2 specifically comprises:
S201, selecting video sequences from behavior recognition data sets and a video content recognition data set;
S202, clipping the video sequences of S201 and then normalizing them;
and S203, dividing the normalized videos of S202 into a training set and a test set.
3. The quadruplet convolutional neural network video fingerprint algorithm according to claim 2, wherein S3 specifically comprises arbitrarily selecting three video sequences with different content from the training set of S203, performing a distortion transformation on one of them to obtain a copy video pair, and forming a quadruplet video sequence together with the other two videos.
4. The quadruplet convolutional neural network video fingerprint algorithm according to claim 3, wherein S4 specifically comprises:
S401, uniformly sampling each quadruplet video sequence at equal intervals to obtain input frame images, then applying horizontal random flipping for data enhancement;
S402, setting the values of the relevant parameters in the objective function;
S403, initializing the parameters of the quadruplet convolutional neural network;
S404, training the quadruplet convolutional neural network with a stochastic gradient descent method;
and S405, training the quadruplet convolutional neural network and recording the training process.
5. The quadruplet convolutional neural network video fingerprint algorithm according to claim 4, wherein the performance test of the trained quadruplet convolutional neural network specifically comprises:
S501, applying simulated distortion transformations to all video sequences of the test set of S203 to generate copy video sequences;
S502, extracting the fingerprint codes of all video sequences in the test set and of the corresponding copy video sequences with the trained quadruplet convolutional neural network, and cross-computing the Hamming distance between the fingerprint code of each video sequence in the test set and that of each copy video sequence;
S503, setting a threshold and comparing the Hamming distances of S502 with it, two video sequences being judged to form a copy relationship when their Hamming distance is smaller than the set threshold, and a non-copy relationship otherwise;
S504, evaluating the performance of the algorithm with the true positive rate and the false positive rate, computing a set of true positive rate and false positive rate data under each threshold, and drawing the operating characteristic curve;
and S505, carrying out comparison experiments with different parameters of the same distortion transformation and comparison experiments between different algorithms respectively to verify the performance of the quadruplet convolutional neural network video fingerprint algorithm.
6. The quadruplet convolutional neural network video fingerprint algorithm according to claim 3, wherein the quadruplet video sequences are denoted v_a, v_p, v_n1 and v_n2, where v_a and v_p have a copy relationship, v_a and v_n1 and v_a and v_n2 have no copy relationship, and the content of v_n1 and v_n2 is completely different.
7. The quadruplet convolutional neural network video fingerprint algorithm according to claim 6, wherein the quadruplet video sequence is input into the quadruplet convolutional neural network to extract spatio-temporal features, high-dimensional features are obtained through convolution and pooling operations, and a k-dimensional real-valued sequence is obtained through a fully connected mapping, the k-dimensional real-valued sequence being:
(f(v_a; Θ), f(v_p; Θ), f(v_n1; Θ), f(v_n2; Θ));
where f(v; Θ) ∈ R^(k×1) and Θ denotes the parameters of the quadruplet convolutional neural network;
the k-dimensional real-valued sequence is normalized to obtain (f_e(v_a), f_e(v_p), f_e(v_n1), f_e(v_n2));
where f_e(v) = f(v; Θ) / ||f(v; Θ)||_2.
8. The quadruplet convolutional neural network video fingerprint algorithm according to claim 7, wherein the normalized k-dimensional real-valued feature sequences are quantized with the sign function sgn(·) to generate the discrete fingerprint codes:
(H(v_a), H(v_p), H(v_n1), H(v_n2));
where H(v) = [h_1, h_2, …, h_k]^T ∈ {-1, 1}^k and h_i = sgn(f_e(v)_i), that is, h_i = 1 if f_e(v)_i ≥ 0 and h_i = -1 if f_e(v)_i < 0.
9. The quadruplet convolutional neural network video fingerprint algorithm according to claim 4, wherein the objective function is:
J = Σ_(i=1..N) [ ln(1 + exp(||f_e(v_a^i) - f_e(v_p^i)||_2^2 - ||f_e(v_a^i) - f_e(v_n1^i)||_2^2 + α_1)) + ln(1 + exp(||f_e(v_a^i) - f_e(v_p^i)||_2^2 - ||f_e(v_n1^i) - f_e(v_n2^i)||_2^2 + α_2)) ] + λ Σ_(i=1..N) Σ_(v ∈ {v_a^i, v_p^i, v_n1^i, v_n2^i}) ||H(v) - f_e(v)||_2^2 + μ ||Θ||_1;
where the adaptive thresholds are
α_1 = (ω_1 / N) Σ_(i=1..N) ( ||f_e(v_a^i) - f_e(v_n1^i)||_2^2 - ||f_e(v_a^i) - f_e(v_p^i)||_2^2 );
α_2 = (ω_2 / N) Σ_(i=1..N) ( ||f_e(v_n1^i) - f_e(v_n2^i)||_2^2 - ||f_e(v_a^i) - f_e(v_p^i)||_2^2 );
N is the training batch size, ω_1 and ω_2 are the threshold coefficients, ||Θ||_1 is the sum of the absolute values of all model parameters, and λ and μ are the hyperparameters balancing the quantization error loss and the L1 regularization term respectively.
10. The quadruplet convolutional neural network video fingerprint algorithm according to claim 9, wherein the loss of the quadruplet video sequence (f_e(v_a), f_e(v_p), f_e(v_n1), f_e(v_n2)) is computed through the objective function; the loss computation comprises the gradient of the quadruplet loss with respect to the normalized features and the gradient of the quantization error loss;
the gradients of the quadruplet loss with respect to the normalized features are:
∂L_quad/∂f_e(v_a) = 2s_1(f_e(v_n1) - f_e(v_p)) + 2s_2(f_e(v_a) - f_e(v_p));
∂L_quad/∂f_e(v_p) = -2(s_1 + s_2)(f_e(v_a) - f_e(v_p));
∂L_quad/∂f_e(v_n1) = 2s_1(f_e(v_a) - f_e(v_n1)) - 2s_2(f_e(v_n1) - f_e(v_n2));
∂L_quad/∂f_e(v_n2) = 2s_2(f_e(v_n1) - f_e(v_n2));
where
s_1 = exp(z_1)/(1 + exp(z_1)), with z_1 = ||f_e(v_a) - f_e(v_p)||_2^2 - ||f_e(v_a) - f_e(v_n1)||_2^2 + α_1;
s_2 = exp(z_2)/(1 + exp(z_2)), with z_2 = ||f_e(v_a) - f_e(v_p)||_2^2 - ||f_e(v_n1) - f_e(v_n2)||_2^2 + α_2;
the gradients of the quantization error loss are:
∂L_quant/∂f_e(v_a) = 2(f_e(v_a) - H(v_a));
∂L_quant/∂f_e(v_p) = 2(f_e(v_p) - H(v_p));
∂L_quant/∂f_e(v_n1) = 2(f_e(v_n1) - H(v_n1));
∂L_quant/∂f_e(v_n2) = 2(f_e(v_n2) - H(v_n2)).
CN202010072025.8A 2020-01-21 2020-01-21 Quadruplet convolutional neural network video fingerprint method Active CN111291223B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010072025.8A CN111291223B (en) 2020-01-21 2020-01-21 Quadruplet convolutional neural network video fingerprint method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010072025.8A CN111291223B (en) 2020-01-21 2020-01-21 Quadruplet convolutional neural network video fingerprint method

Publications (2)

Publication Number Publication Date
CN111291223A true CN111291223A (en) 2020-06-16
CN111291223B CN111291223B (en) 2023-01-24

Family

ID=71026653

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010072025.8A Active CN111291223B (en) 2020-01-21 2020-01-21 Four-embryo convolution neural network video fingerprint method

Country Status (1)

Country Link
CN (1) CN111291223B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108985165A (en) * 2018-06-12 2018-12-11 东南大学 A kind of video copy detection system and method based on convolution and Recognition with Recurrent Neural Network
CN110070041A (en) * 2019-04-23 2019-07-30 江西理工大学 A kind of video actions recognition methods of time-space compression excitation residual error multiplication network
GB201908574D0 (en) * 2019-06-14 2019-07-31 Vision Semantics Ltd Optimised machine learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
SHIYANG XU ET AL.: "Content-based video fingerprinting method for fast key generation and retrieval", IEEE *
WANG Dongdong et al.: "Video fingerprint algorithm based on spatio-temporal deep neural network", Laser & Optoelectronics Progress *
GUO Chen et al.: "Video fingerprint algorithm based on non-local 3D residual network", Computer Engineering and Applications *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111813996A (en) * 2020-07-22 2020-10-23 四川长虹电器股份有限公司 Video searching method based on sampling parallelism of single frame and continuous multi-frame

Also Published As

Publication number Publication date
CN111291223B (en) 2023-01-24

Similar Documents

Publication Publication Date Title
CN111581405B (en) Cross-modal generalization zero sample retrieval method for generating confrontation network based on dual learning
Zhang et al. Depth-wise separable convolutions and multi-level pooling for an efficient spatial CNN-based steganalysis
Cui et al. Identifying materials of photographic images and photorealistic computer generated graphics based on deep CNNs.
CN111723220B (en) Image retrieval method and device based on attention mechanism and Hash and storage medium
Seow et al. A comprehensive overview of Deepfake: Generation, detection, datasets, and opportunities
CN112434553B (en) Video identification method and system based on deep dictionary learning
Zhao et al. Disentangled representation learning and residual GAN for age-invariant face verification
Sungheetha et al. A novel CapsNet based image reconstruction and regression analysis
CN111079514A (en) Face recognition method based on CLBP and convolutional neural network
Ding et al. Noise-resistant network: a deep-learning method for face recognition under noise
Tabares-Soto et al. Digital media steganalysis
Chaudhuri Deep learning models for face recognition: A comparative analysis
Zhong et al. Complementing representation deficiency in few-shot image classification: A meta-learning approach
Xia et al. Domain fingerprints for no-reference image quality assessment
Qin et al. Label enhancement-based multiscale transformer for palm-vein recognition
CN110083734B (en) Semi-supervised image retrieval method based on self-coding network and robust kernel hash
Wang et al. Marginalized denoising dictionary learning with locality constraint
CN111291223B (en) Quadruplet convolutional neural network video fingerprint method
CN116383470B (en) Image searching method with privacy protection function
CN113160032A (en) Unsupervised multi-mode image conversion method based on generation countermeasure network
Ma et al. Enhancing the security of image steganography via multiple adversarial networks and channel attention modules
Xie et al. Evading generated-image detectors: A deep dithering approach
Yu et al. ICD-Face: Intra-class Compactness Distillation for Face Recognition
Du et al. Robust image hashing based on multi-view dimension reduction
CN113239917B (en) Robust face recognition method based on singular value decomposition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant