Summary of the Invention
To address the shortcomings of existing video image fingerprint extraction techniques, the present invention provides a video fingerprinting algorithm based on deep learning. Pictures are first subjected to transformation-based enhancement operations such as translation, scaling, cropping, adding black borders, adding subtitles and adding logos, and are then input into an offline training model. Within the training model, the images are processed by multiple convolution, pooling and fully connected layers, and the processed data are evaluated with a hash loss function; the evaluation is repeated until the result converges, at which point the model parameters are obtained. A GPU then reads these parameters and initializes an online video image feature extraction model, online feature extraction is performed on the sampled video images, and the extracted features are finally binary-coded to generate a 128-bit image fingerprint. Training the feature extraction on enhanced pictures and adjusting the parameters yields the best extraction effect, and these parameters are then used for efficient video fingerprint extraction. Enhancing the images improves the robustness of the fingerprint feature algorithm, the multilayer convolution and pooling improve the generalization ability of image feature extraction, and the loss function controls and verifies the precision of feature extraction. As a result, compared with conventional fingerprint algorithms, the present invention has stronger recognition capability and anti-interference capability, faster extraction speed and higher precision, thereby effectively improving the efficiency of the video fingerprinting algorithm.
The technical solution adopted by the present invention to solve the technical problem comprises the following steps:
An image enhancement step: images of different types are labelled and enhanced, and the processed pictures are input into the training model.
Preferably, the enhancement processing applies various transformations to the images, such as scaling, translation, cropping, adding black borders, adding subtitles and adding logos.
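The transformations listed above can be illustrated with a minimal sketch. It assumes a grayscale image stored as a list of pixel rows, and every helper name (`translate`, `scale_nearest`, `crop`, `add_black_border`, `paste`, `augment`) is hypothetical and chosen only for illustration; the patent does not specify an implementation. Subtitles and logos are both modelled as pasting a small patch onto the image.

```python
# Illustrative augmentations on a grayscale image stored as a list of rows.
# All helper names are hypothetical; they are not taken from the patent.

def translate(img, dx, dy, fill=0):
    """Shift the image by (dx, dy) pixels; exposed pixels are set to `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = img[sy][sx]
    return out

def scale_nearest(img, new_h, new_w):
    """Resize using nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    return [[img[y * h // new_h][x * w // new_w] for x in range(new_w)]
            for y in range(new_h)]

def crop(img, top, left, height, width):
    """Keep only the given rectangular region."""
    return [row[left:left + width] for row in img[top:top + height]]

def add_black_border(img, border):
    """Surround the image with `border` rows and columns of zeros."""
    w = len(img[0]) + 2 * border
    top = [[0] * w for _ in range(border)]
    bottom = [[0] * w for _ in range(border)]
    body = [[0] * border + list(row) + [0] * border for row in img]
    return top + body + bottom

def paste(img, patch, top, left):
    """Overlay a patch (e.g. a rendered subtitle or logo) on a copy of img."""
    out = [row[:] for row in img]
    for dy, patch_row in enumerate(patch):
        for dx, value in enumerate(patch_row):
            y, x = top + dy, left + dx
            if 0 <= y < len(out) and 0 <= x < len(out[0]):
                out[y][x] = value
    return out

def augment(img, logo):
    """Return one enhanced variant per transformation type."""
    h, w = len(img), len(img[0])
    return {
        "translated": translate(img, 1, 0),
        "scaled": scale_nearest(img, h * 2, w * 2),
        "cropped": crop(img, 0, 0, h - 1, w - 1),
        "bordered": add_black_border(img, 1),
        "logo": paste(img, logo, 0, 0),
    }
```

In a real pipeline an image library would normally perform these operations; the sketch only shows what each transformation does to the pixel grid.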
An offline training step: the input enhanced pictures are trained offline using the offline training model to obtain training parameters.
Preferably, this step includes: image scaling, multiple rounds of convolution and pooling, two fully connected operations, and evaluation by the loss function.
Preferably, the image scaling uniformly resizes the input pictures to a size of 227*227.
Preferably, the multiple convolution and pooling processing includes: two rounds of convolution plus pooling, followed by two rounds of convolution, followed by one round of convolution plus pooling.
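To show how the 227*227 input shrinks through this layer sequence, the following sketch traces the spatial size after each layer using the standard output-size formula for convolution and pooling. The kernel sizes, strides and paddings in `LAYERS` are assumptions borrowed from a typical AlexNet-style network; the patent states only the order of the layers, not their hyperparameters.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

# (name, kernel, stride, padding). These values are ASSUMED (AlexNet-style);
# only the conv/pool ordering comes from the text above.
LAYERS = [
    ("conv1", 11, 4, 0), ("pool1", 3, 2, 0),   # first convolution plus pooling
    ("conv2", 5, 1, 2), ("pool2", 3, 2, 0),    # second convolution plus pooling
    ("conv3", 3, 1, 1), ("conv4", 3, 1, 1),    # two convolutions
    ("conv5", 3, 1, 1), ("pool5", 3, 2, 0),    # one convolution plus pooling
]

def trace_sizes(input_size=227):
    """Return [(layer_name, spatial_size)] for the assumed layer stack."""
    sizes = []
    size = input_size
    for name, kernel, stride, padding in LAYERS:
        size = out_size(size, kernel, stride, padding)
        sizes.append((name, size))
    return sizes

if __name__ == "__main__":
    for name, size in trace_sizes():
        print(f"{name}: {size}x{size}")
```

Under these assumed hyperparameters the feature map is reduced from 227 to 55, 27, 27, 13, 13, 13, 13 and finally 6 pixels per side before the fully connected layers.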
Preferably, the loss function is a hash loss function, specifically as follows:
Preferably, in the loss function, b is the output of the network, y=0 denotes similar, y=1 denotes dissimilar, m=256, and a=0.01.
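The formula itself is not reproduced in this text, so the following is only an illustrative sketch and not the patent's own loss. It assumes a contrastive-style pairwise hashing loss of a kind commonly used in deep hashing, which happens to use the same symbols: two network outputs `b1` and `b2`, a label `y` (0 for similar, 1 for dissimilar), a margin `m` and a regularization weight `a`. The function name `pairwise_hash_loss` and the exact form of each term are assumptions.

```python
def pairwise_hash_loss(b1, b2, y, m=256.0, a=0.01):
    """ASSUMED contrastive-style hashing loss for one pair of network outputs.

    b1, b2: real-valued network outputs for the two images of the pair
    y:      0 if the pair is similar, 1 if it is dissimilar
    m:      margin applied to dissimilar pairs
    a:      weight of the term pushing each output towards +1 or -1
    """
    # Squared Euclidean distance between the two outputs.
    dist = sum((p - q) ** 2 for p, q in zip(b1, b2))
    # Similar pairs are penalized for being far apart.
    similar_term = 0.5 * (1 - y) * dist
    # Dissimilar pairs are penalized for being closer than the margin.
    dissimilar_term = 0.5 * y * max(m - dist, 0.0)
    # Encourage near-binary outputs so that thresholding loses little.
    regularizer = a * (sum(abs(abs(p) - 1.0) for p in b1)
                       + sum(abs(abs(q) - 1.0) for q in b2))
    return similar_term + dissimilar_term + regularizer
```

With this form, identical outputs of a similar pair incur no distance penalty, while a dissimilar pair is penalized only while its squared distance stays below the margin.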
Preferably, the offline training step terminates when the result calculated by the loss function converges.
An online model initialization step: the online video image fingerprint extraction model is initialized with the training parameters obtained from offline training.
An online video image feature extraction step: video images are read online and image features are extracted in real time using a GPU.
Preferably, this step includes: reading the initialized image feature extraction model into the GPU cache, sampling the video images at a specific frame rate, and inputting the sampled images into the model for feature extraction.
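Sampling at a specific frame rate can be sketched as choosing which frame indices of the source video to keep. The function name `sample_frame_indices` and its parameters are hypothetical, and the patent does not state the sampling rate; the values in the usage note are placeholders.

```python
def sample_frame_indices(total_frames, source_fps, sample_fps):
    """Return the indices of the frames kept when sampling at sample_fps.

    total_frames: number of frames in the source video
    source_fps:   frame rate of the source video
    sample_fps:   desired sampling rate (assumed not to exceed source_fps)
    """
    if source_fps <= 0 or sample_fps <= 0:
        raise ValueError("frame rates must be positive")
    # Distance between kept frames, never less than one frame.
    step = max(source_fps / sample_fps, 1.0)
    indices = []
    i = 0
    while True:
        index = int(i * step)
        if index >= total_frames:
            break
        indices.append(index)
        i += 1
    return indices
```

For example, a 10-frame clip recorded at 25 frames per second and sampled at 5 frames per second keeps frames 0 and 5; each kept frame would then be passed to the feature extraction model.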
An image fingerprint generation step: the image features extracted online are binary-coded to generate the image fingerprint.
Preferably, the image features are 128 floating-point numbers. The binary coding compares each floating-point number with a specific value, recording 1 if the number is greater than that value and 0 otherwise. The image fingerprint is a 128-bit binary-coded data set.
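The binary coding described above amounts to a single thresholding pass over the 128 feature values. In the sketch below the function name `binarize` is hypothetical, and the default threshold of 0.0 is an assumption, since the text says only that a "specific value" is used.

```python
def binarize(features, threshold=0.0):
    """Map each feature to 1 if it is greater than threshold, otherwise 0.

    The default threshold of 0.0 is an assumed placeholder.
    """
    return [1 if value > threshold else 0 for value in features]

if __name__ == "__main__":
    # A made-up 128-value feature vector, for illustration only.
    features = [0.7, -0.2, 0.0, 1.3] * 32
    bits = binarize(features)
    print(len(bits), bits[:4])
```

Note that a value exactly equal to the threshold is recorded as 0, matching the "greater than" wording of the text.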
By adopting the above technical solution, the present invention has the following advantages:
The present invention relates to a video fingerprinting algorithm based on deep learning. Pictures are subjected to transformation-based enhancement operations such as translation, scaling, cropping, adding black borders, adding subtitles and adding logos, and are then input into an offline training model. Within the training model, the images are processed by multiple convolution, pooling and fully connected layers, and the processed data are evaluated with a hash loss function; the evaluation is repeated until the result converges, at which point the model parameters are obtained. A GPU then reads these parameters and initializes an online video image feature extraction model, online feature extraction is performed on the sampled video images, and the extracted features are finally binary-coded to generate a 128-bit image fingerprint. Training the feature extraction on enhanced pictures and adjusting the parameters yields the best extraction effect, and these parameters are then used for efficient video fingerprint extraction. Enhancing the images improves the robustness of the fingerprint feature algorithm, the multilayer convolution and pooling improve the generalization ability of image feature extraction, and the loss function controls and verifies the precision of feature extraction. As a result, compared with conventional fingerprint algorithms, the present invention has stronger image recognition capability and anti-interference capability, faster extraction speed and higher precision, thereby effectively improving the efficiency of the video fingerprinting algorithm.
Detailed Description of the Embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiment is only one embodiment of the present invention, not all of the embodiments. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
An embodiment of the present invention discloses a video fingerprinting algorithm based on deep learning. As shown in Fig. 1, the method includes the following steps:
Step S1: Image enhancement step.
Preferably, images of different types are labelled and enhanced, and the processed pictures are input into the training model.
Step S2: Offline training step.
Preferably, the input enhanced pictures are trained offline using the offline training model to obtain training parameters.
Step S3: Online model initialization step.
Preferably, the online video image fingerprint extraction model is initialized with the training parameters obtained from offline training.
Step S4: Online video image feature extraction step.
Preferably, video images are read online and image features are extracted in real time using a GPU.
Step S5: Image fingerprint generation step.
Preferably, the image features extracted online are binary-coded to generate the image fingerprint.
In this embodiment of the present invention, pictures are subjected to transformation-based enhancement operations such as translation, scaling, cropping, adding black borders, adding subtitles and adding logos, and are then input into the offline training model. Within the training model, the images are processed by multiple convolution, pooling and fully connected layers, and the processed data are evaluated with a hash loss function; the evaluation is repeated until the result converges, at which point the model parameters are obtained. A GPU then reads these parameters and initializes the online video image feature extraction model, online feature extraction is performed on the sampled video images, and the extracted features are finally binary-coded to generate a 128-bit image fingerprint.
It can be seen that, in this embodiment of the present invention, training the feature extraction on enhanced pictures and adjusting the parameters yields the best extraction effect, and these parameters are then used for efficient video fingerprint extraction. Enhancing the images improves the robustness of the fingerprint feature algorithm, the multilayer convolution and pooling improve the generalization ability of image feature extraction, and the loss function controls and verifies the precision of feature extraction. As a result, compared with conventional fingerprint algorithms, the present invention has stronger recognition capability and anti-interference capability, faster extraction speed and higher precision, thereby effectively improving the efficiency of the video fingerprinting algorithm.
An embodiment of the present invention discloses a video fingerprinting algorithm based on deep learning. Referring to Fig. 2, relative to the previous embodiment, this embodiment further describes and optimizes the technical solution. Specifically, the method of this embodiment comprises the following steps:
Step S1: Image enhancement step. Images of different types are labelled and enhanced, and the processed pictures are input into the training model.
Preferably, step S11 (training image enhancement) scales, translates and crops all pictures and adds black borders, subtitles, logos and the like to them; the processed result is passed to step S12 (labelling the different images).
Preferably, step S12 attaches a type ID to each input training picture according to its type, and the processed picture information is passed to step S2 for offline feature extraction and training.
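One way the type IDs from step S12 could feed the similar/dissimilar label y of the loss function is by forming picture pairs: two pictures with the same ID give y=0, and two pictures with different IDs give y=1. The patent does not say that training proceeds pairwise, so this pairing scheme, the function name `make_pairs` and the sample names are all assumptions made for illustration.

```python
from itertools import combinations

def make_pairs(samples):
    """Build (name_a, name_b, y) training pairs from labelled pictures.

    samples: list of (picture_name, type_id) tuples
    y is 0 when both pictures share a type ID (similar), otherwise 1.
    This pairing scheme is an ASSUMPTION, not stated in the patent.
    """
    return [(name_a, name_b, 0 if id_a == id_b else 1)
            for (name_a, id_a), (name_b, id_b) in combinations(samples, 2)]

if __name__ == "__main__":
    # Hypothetical pictures: an original, one enhanced variant, and another image.
    labelled = [("a", 1), ("a_cropped", 1), ("b", 2)]
    for pair in make_pairs(labelled):
        print(pair)
```

Under this scheme an original picture and its enhanced variants share an ID, so the network is trained to give them nearby outputs despite the distortion.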
Step S2: Offline training step. The input enhanced pictures are trained offline using the offline training model to obtain training parameters.
Preferably, step S21 (adjusting picture size) is performed on the data input from step S12, resizing the pictures to 227 × 227; the resized pictures are then passed to step S22, which performs two rounds of convolution plus pooling.
Preferably, step S22 applies two rounds of convolution plus pooling to the pictures. The convolution operations improve robustness and the pooling operations improve the generalization ability of the processing. The result is passed to step S23: two convolution operations.
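The convolution and pooling operations used in steps S22 to S24 can be illustrated on a tiny single-channel image. The sketch assumes an unpadded ("valid") convolution computed as a sliding dot product, as is usual in deep learning frameworks, and max pooling; the patent does not state the padding mode or the pooling type, and the function names are hypothetical.

```python
def conv2d_valid(img, kernel):
    """Slide the kernel over the image without padding (sliding dot product)."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(img), len(img[0])
    return [[sum(img[y + i][x + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for x in range(w - kw + 1)]
            for y in range(h - kh + 1)]

def max_pool(img, size=2, stride=2):
    """Keep the largest value in each size x size window (ASSUMED pooling type)."""
    h, w = len(img), len(img[0])
    return [[max(img[y + i][x + j] for i in range(size) for j in range(size))
             for x in range(0, w - size + 1, stride)]
            for y in range(0, h - size + 1, stride)]

if __name__ == "__main__":
    image = [[1, 2, 3, 4],
             [5, 6, 7, 8],
             [9, 10, 11, 12],
             [13, 14, 15, 16]]
    kernel = [[1, 0],
              [0, 1]]
    features = conv2d_valid(image, kernel)
    print(features)
    print(max_pool(image))
```

Pooling keeps only the strongest response in each window, which is why small shifts of the input tend to leave the pooled output unchanged.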
Preferably, step S23 applies two rounds of convolution to the input picture data, further improving the robustness of the picture processing, and passes the result to step S24: one convolution plus pooling operation.
Preferably, step S24 applies one round of convolution plus pooling to the input pictures, further improving the robustness and generalization ability of the picture processing, and passes the result to step S25: two fully connected operations.
Preferably, step S25 applies two fully connected operations to the input pictures. A fully connected operation is a global processing operation: specifically, each node of the current layer is connected to all nodes of the previous layer. The result is passed to step S26: evaluation by the hash loss function.
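The statement that each node of the current layer is connected to all nodes of the previous layer corresponds to a weighted sum over the entire previous layer for every output node. The sketch below shows this for a small layer; the function name `fully_connected`, the weights and the biases are hypothetical, and any activation function is omitted because the text does not mention one.

```python
def fully_connected(inputs, weights, biases):
    """Compute one fully connected layer.

    inputs:  values of all nodes in the previous layer
    weights: one row per output node, one weight per input node
    biases:  one bias per output node
    """
    return [sum(w * x for w, x in zip(row, inputs)) + bias
            for row, bias in zip(weights, biases)]

if __name__ == "__main__":
    previous_layer = [1.0, 2.0]
    weights = [[1.0, 0.0],   # output node 0 reads every input node
               [0.0, 1.0],   # output node 1 reads every input node
               [1.0, 1.0]]   # output node 2 reads every input node
    biases = [0.0, 0.0, 1.0]
    print(fully_connected(previous_layer, weights, biases))
```

Applying such a layer twice, with the second layer having 128 output nodes, would give a 128-value feature description like the one that is later binary-coded.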
Preferably, step S26 performs the loss evaluation on the result produced by steps S21 to S25, and step S27 is performed according to the evaluation result.
Preferably, step S27 proceeds as follows: when the evaluation result has converged, the current model configuration parameter set is extracted and passed to step S3 (initializing the online feature extraction model); when the evaluation result has not converged, step S11 is performed to trigger a new round of offline training, and this is repeated until the result converges.
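The control flow of step S27 can be sketched as a loop that keeps starting new training rounds until the loss stops changing. The text does not define convergence, so the criterion used here (the change in loss between consecutive rounds falling below `tolerance`) is an assumption, as are the names `train_until_converged`, `run_round` and `max_rounds`.

```python
def train_until_converged(run_round, tolerance=1e-3, max_rounds=100):
    """Repeat training rounds until the loss converges.

    run_round:  hypothetical callable that performs one round of offline
                training (steps S11 to S26) and returns the loss value
    tolerance:  ASSUMED convergence criterion on the change in loss
    max_rounds: safety limit so the loop always terminates
    Returns (number_of_rounds, final_loss).
    """
    previous = None
    for round_index in range(1, max_rounds + 1):
        loss = run_round()
        if previous is not None and abs(previous - loss) < tolerance:
            # Converged: the current parameters would be passed on to step S3.
            return round_index, loss
        previous = loss
    raise RuntimeError("loss did not converge within max_rounds")

if __name__ == "__main__":
    # Made-up loss values standing in for real training rounds.
    fake_losses = iter([1.0, 0.5, 0.25, 0.2495])
    print(train_until_converged(lambda: next(fake_losses)))
```

A bounded number of rounds is included only as a practical safeguard; the text itself describes repeating until the result converges.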
Step S3: Online model initialization step. The online video image fingerprint extraction model is initialized with the training parameters obtained from offline training.
Preferably, the online feature extraction model differs from the offline training model in that the output of the online extraction model is the output data of the two fully connected operations, with no need to evaluate the result using the hash loss function.
Step S4: Online video image feature extraction step. Video images are read online and image features are extracted in real time using a GPU.
Preferably, after step S3 has been performed, step S41 samples the input video images at a specific frame rate, and the sampling results are passed as input to step S42: feature extraction using the GPU.
Preferably, step S42 reads the initialized image feature extraction model output by step S3 and uses this model to extract image features from each of the pictures input from step S41.
Preferably, the image feature extracted in step S42 is an image feature description represented by 128 floating-point numbers, and this feature description is passed to step S5: binary coding of the features.
Step S5: Image fingerprint generation step. The image features extracted online are binary-coded to generate the image fingerprint.
Preferably, step S5 binary-codes the image description produced by step S42. Specifically, each of the 128 floating-point numbers is compared with a specific threshold and recorded as 1 if it is greater than the threshold and 0 otherwise, thereby obtaining a 128-bit image fingerprint.
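Once the 128 bits have been obtained, they can be stored compactly as 16 bytes. The packing below is only one possible storage format and is not described in the patent; the function names `pack_bits` and `fingerprint_hex` and the most-significant-bit-first order are assumptions.

```python
def pack_bits(bits):
    """Pack a list of 0/1 values into bytes, most significant bit first."""
    if len(bits) % 8 != 0:
        raise ValueError("number of bits must be a multiple of 8")
    packed = bytearray()
    for start in range(0, len(bits), 8):
        byte = 0
        for bit in bits[start:start + 8]:
            byte = (byte << 1) | bit
        packed.append(byte)
    return bytes(packed)

def fingerprint_hex(bits):
    """Render a bit list as a hexadecimal string for logging or storage."""
    return pack_bits(bits).hex()

if __name__ == "__main__":
    # A made-up 128-bit fingerprint: eight ones followed by 120 zeros.
    bits = [1] * 8 + [0] * 120
    print(len(pack_bits(bits)), fingerprint_hex(bits))
```

A 128-bit fingerprint packed this way occupies exactly 16 bytes, or 32 hexadecimal characters.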
In summary, step S11 scales, translates and crops the input training images and adds black borders, subtitles, logos and the like to them to achieve image enhancement. Step S12 is then performed on the enhanced pictures, attaching the corresponding ID label to each type of picture.
Step S21 is performed on the labelled images, resizing them to 227*227. Step S22 is performed on the resized pictures, applying two rounds of convolution plus pooling; the convolution improves the robustness of the processing and the pooling improves its generalization ability. Step S23 applies two further rounds of convolution to the processed pictures, further improving robustness. Step S24 applies one round of convolution plus pooling, further improving the robustness and generalization ability of the picture processing. Step S25 applies two fully connected operations, which realize global processing: specifically, each node of the current layer is connected to all nodes of the previous layer. Step S26 computes the hash loss function on the data output by step S25 to obtain an evaluation result, and step S27 acts on that result: if the loss function result has not converged, step S11 is performed to trigger a new round of offline training; otherwise step S3 is performed, and the model configuration parameter set obtained through steps S21 to S26 is input into the online image feature extraction model to initialize it.
After initialization is complete, step S41 samples the input video images at a specific frame rate, and the sampling results are passed as input to step S42. In step S42, the GPU reads the initialized model data produced by step S3 and performs online feature extraction on the sampled pictures obtained in step S41; the extracted feature is an image feature description of 128 floating-point numbers. Step S5 binary-codes the extracted image features: specifically, each of the 128 feature values is compared with a specific threshold and recorded as 1 if it is greater than the threshold and 0 otherwise, forming 128-bit binary image fingerprint data.
Training the feature extraction on enhanced pictures and adjusting the parameters yields the best extraction effect, and these parameters are then used for efficient video fingerprint extraction. Enhancing the images improves the robustness of the fingerprint feature algorithm, the multilayer convolution and pooling improve the generalization ability of image feature extraction, and the loss function controls and verifies the precision of feature extraction. As a result, compared with conventional fingerprint algorithms, the present invention has stronger recognition capability and anti-interference capability, faster extraction speed and higher precision, thereby effectively improving the efficiency of the video fingerprinting algorithm.
The foregoing is merely illustrative and not restrictive. Those skilled in the art can make various changes and modifications to the invention without departing from its spirit and scope. Thus, if such changes and modifications fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to include them as well.