CN106886768A

CN106886768A - Video fingerprinting algorithm based on deep learning

Info

Publication number: CN106886768A
Application number: CN201710119749.1A
Authority: CN
Inventors: 杭欣; 郭伟伟
Original assignee: Hangzhou Arcvideo Technology Co ltd
Current assignee: Hangzhou Arcvideo Technology Co ltd
Priority date: 2017-03-02
Filing date: 2017-03-02
Publication date: 2017-06-23

Abstract

The present invention relates to a video fingerprinting algorithm based on deep learning, comprising: inputting augmented training images into a training model; training with the training model until the result converges; initializing an image feature extraction model; inputting video images and extracting features on a GPU; and binary-coding the extracted features. The invention uses the training model to perform feature extraction training on the augmented pictures and adjusts the parameters to obtain the best extraction effect, then uses these parameters to extract video fingerprints efficiently. Image augmentation improves the robustness of the fingerprint feature algorithm, multi-layer convolution and pooling improve the generalization ability of image feature extraction, and the loss function is used to control and verify the precision of feature extraction. With these improvements the invention offers stronger recognition capability and interference resistance, faster extraction and higher precision than conventional fingerprint algorithms, thereby effectively improving the efficiency of the video fingerprinting algorithm.

Description

A video fingerprinting algorithm based on deep learning

Technical field

The present invention relates to a video fingerprinting algorithm based on deep learning, and more particularly to a video fingerprinting algorithm in the technical field of digital video signal processing.

Background art

As an important medium of communication in modern society, video plays a key role in daily life. Whether judged by how quickly it conveys the author's intent or by how clearly and accessibly it lets the audience understand the content, video outperforms, and is more popular than, media such as text and sound. By watching videos of interest, people can obtain the information they want in a very short time, which brings convenience to daily life.

However, the differences between video and other media also place demands on video retrieval. How to quickly find desired or related information in a massive amount of video has become a problem in urgent need of a solution. Video fingerprint retrieval is a common video retrieval approach and includes two schemes: global video fingerprint extraction and local video fingerprint extraction. Global extraction has good robustness and accuracy but is slow; local extraction is relatively fast but cannot cope well with flaws in the image such as translation, scaling and black borders.

Summary of the invention

In view of the shortcomings of existing video image fingerprint extraction techniques, the present invention provides a video fingerprinting algorithm based on deep learning. Pictures are first augmented by transformations such as translation, scaling, cropping, adding black borders, adding subtitles and adding logos, and are then input into an offline training model. The training model processes the images with multiple convolution, pooling and fully connected operations, and the processed data is evaluated with a hash loss function. The evaluation is repeated until the result converges, which yields the model parameters. The parameters are then read on a GPU to initialize the online video image feature extraction model, online feature extraction is performed on the sampled video images, and the extracted features are finally binary-coded to generate a 128-bit image fingerprint. The invention uses the training model to perform feature extraction training on the augmented pictures and adjusts the parameters to obtain the best extraction effect, then uses these parameters to extract video fingerprints efficiently. Image augmentation improves the robustness of the fingerprint feature algorithm, multi-layer convolution and pooling improve the generalization ability of image feature extraction, and the loss function is used to control and verify the precision of feature extraction. With these improvements the invention offers stronger recognition capability and interference resistance, faster extraction and higher precision than conventional fingerprint algorithms, thereby effectively improving the efficiency of the video fingerprinting algorithm.

The technical solution adopted by the present invention to solve the technical problem comprises the following steps:

Image augmentation step: label different types of images, apply augmentation to them, and input the processed pictures into the training model.

Preferably, the augmentation refers to applying various transformations to the images, such as scaling, translation, cropping, adding black borders, adding subtitles and adding logos.
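Three of the listed transformations can be sketched in a few lines. This is only an illustration under stated assumptions: images are taken to be grayscale and stored as nested Python lists, and the function names `translate`, `add_black_border` and `overlay` are hypothetical helpers, not names from the patent.

```python
from typing import List

Image = List[List[int]]  # assumed representation: rows of grayscale pixel values


def translate(img: Image, dx: int, dy: int, fill: int = 0) -> Image:
    """Shift the image by (dx, dy) pixels; uncovered pixels are set to `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx          # source pixel for this target pixel
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = img[sy][sx]
    return out


def add_black_border(img: Image, bar: int) -> Image:
    """Black out `bar` rows at the top and at the bottom (letterboxing)."""
    h, w = len(img), len(img[0])
    return [[0] * w if y < bar or y >= h - bar else list(row)
            for y, row in enumerate(img)]


def overlay(img: Image, patch: Image, x0: int, y0: int) -> Image:
    """Paste `patch` (for example a logo or a subtitle strip) at (x0, y0)."""
    out = [list(row) for row in img]
    for py, prow in enumerate(patch):
        for px, value in enumerate(prow):
            if 0 <= y0 + py < len(out) and 0 <= x0 + px < len(out[0]):
                out[y0 + py][x0 + px] = value
    return out
```

Each helper returns a new image, so several transformations can be chained to produce many variants of one source picture.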

Offline training step: use the offline training model to train offline on the input augmented pictures and obtain the training parameters.

Preferably, this step includes: image scaling, multiple convolution and pooling operations, two fully connected operations, and loss function evaluation.

Preferably, the image scaling refers to uniformly scaling the input pictures to 227*227.

Preferably, the multiple convolution and pooling operations include: convolution plus pooling performed twice, followed by convolution performed twice, followed by convolution plus pooling performed once.
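To see how a 227*227 input shrinks through this layer sequence, the spatial size after each layer can be traced. The layer order below follows the text; every kernel size, stride and padding value is an assumption (borrowed from a commonly used five-convolution layout for 227-pixel inputs), since the patent text does not state them.

```python
def out_size(size: int, kernel: int, stride: int = 1, pad: int = 0) -> int:
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * pad - kernel) // stride + 1


# (name, kernel, stride, pad). All numeric values here are assumptions.
LAYERS = [
    ("conv1", 11, 4, 0), ("pool1", 3, 2, 0),   # first convolution plus pooling
    ("conv2", 5, 1, 2), ("pool2", 3, 2, 0),    # second convolution plus pooling
    ("conv3", 3, 1, 1), ("conv4", 3, 1, 1),    # two plain convolutions
    ("conv5", 3, 1, 1), ("pool5", 3, 2, 0),    # final convolution plus pooling
]


def trace_shapes(size: int = 227):
    """Return (layer name, output side length) for each layer in order."""
    shapes = []
    for name, kernel, stride, pad in LAYERS:
        size = out_size(size, kernel, stride, pad)
        shapes.append((name, size))
    return shapes
```

Under these assumed hyperparameters the feature map is reduced from 227 to 6 pixels per side before the fully connected layers.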

Preferably, the loss function is a hash loss function, specified as follows:

Preferably, in the loss function, b is the output of the network, y=0 indicates similar, y=1 indicates dissimilar, m=256 and a=0.01.
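The formula itself does not appear in this text, so the following is an assumption rather than the patent's definition. It sketches a pairwise contrastive hashing loss of the kind found in the deep hashing literature, chosen because it uses the same symbols: network outputs b for two images, a pair label y, a margin m and a regularization weight a. The function name `pairwise_hash_loss` is hypothetical.

```python
from typing import Sequence


def pairwise_hash_loss(b1: Sequence[float], b2: Sequence[float], y: int,
                       m: float = 256.0, a: float = 0.01) -> float:
    """Assumed contrastive hashing loss for one pair of network outputs."""
    # Squared Euclidean distance between the two outputs.
    dist = sum((u - v) ** 2 for u, v in zip(b1, b2))
    # Similar pairs (y=0) are penalized for being far apart.
    similar_term = 0.5 * (1 - y) * dist
    # Dissimilar pairs (y=1) are penalized until they are at least m apart.
    dissimilar_term = 0.5 * y * max(m - dist, 0.0)
    # Regularizer: keep every output component close to +1 or -1.
    reg = (sum(abs(abs(u) - 1.0) for u in b1)
           + sum(abs(abs(v) - 1.0) for v in b2))
    return similar_term + dissimilar_term + a * reg
```

In this assumed form, two 128-component outputs whose entries are exactly +1 or -1 differ by 4 in squared distance per disagreeing component, so a margin of 256 corresponds to 64 disagreeing components.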

Preferably, the offline training step terminates when the result computed by the loss function converges.

Online model initialization step: use the training parameters obtained from offline training to initialize the online video image fingerprint extraction model.

Online video image feature extraction step: read video images online and extract image features in real time on a GPU.

Preferably, this step includes: loading the initialized image feature extraction model into the GPU cache, sampling the video images at a specific frame rate, and inputting the sampled images into the model for feature extraction.
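Sampling at a specific frame rate amounts to choosing which frame indices to keep. A minimal sketch, assuming the source frame rate is known and constant; the function name and the parameters `source_fps` and `sample_fps` are hypothetical, and the patent does not state which sampling rate it uses.

```python
from typing import List


def sample_frame_indices(total_frames: int, source_fps: float,
                         sample_fps: float) -> List[int]:
    """Indices of the frames kept when sampling a video at `sample_fps`."""
    if source_fps <= 0 or sample_fps <= 0:
        raise ValueError("frame rates must be positive")
    # Never sample faster than the source, which would repeat frames.
    sample_fps = min(sample_fps, source_fps)
    indices = []
    i = 0
    while True:
        idx = int(i * source_fps / sample_fps)   # nearest earlier source frame
        if idx >= total_frames:
            break
        indices.append(idx)
        i += 1
    return indices
```

For example, sampling a 25 fps video at 5 fps keeps every fifth frame; only the kept frames are passed to the feature extraction model.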

Image fingerprint generation step: binary-code the image features extracted online to generate the image fingerprint.

Preferably, the image feature consists of 128 floating-point numbers. The binary coding refers to comparing each floating-point number with a specific value: a floating-point number greater than that value is recorded as 1, otherwise as 0. The image fingerprint is a set of 128 bits of binary-coded data.
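The thresholding rule can be written directly. In this sketch the threshold defaults to 0.0, which is an assumption: the text only says "a specific value". The names `binarize` and `to_hex` are hypothetical.

```python
from typing import Sequence


def binarize(features: Sequence[float], threshold: float = 0.0) -> int:
    """Pack features into an integer: bit is 1 where feature > threshold."""
    fingerprint = 0
    for value in features:
        # Shift previous bits left and append the bit for this feature.
        fingerprint = (fingerprint << 1) | (1 if value > threshold else 0)
    return fingerprint


def to_hex(fingerprint: int, bits: int = 128) -> str:
    """Fixed-width hexadecimal form of a fingerprint (32 digits for 128 bits)."""
    return format(fingerprint, "0{}x".format(bits // 4))
```

Storing the 128 bits as one integer (or its 32-digit hexadecimal form) keeps the fingerprint compact; a value equal to the threshold is coded as 0, matching "greater than" in the text.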

With the above technical solution, the present invention has the following advantages:

The present invention relates to a video fingerprinting algorithm based on deep learning. Pictures are first augmented by transformations such as translation, scaling, cropping, adding black borders, adding subtitles and adding logos, and are then input into an offline training model. The training model processes the images with multiple convolution, pooling and fully connected operations, and the processed data is evaluated with a hash loss function. The evaluation is repeated until the result converges, which yields the model parameters. The parameters are then read on a GPU to initialize the online video image feature extraction model, online feature extraction is performed on the sampled video images, and the extracted features are finally binary-coded to generate a 128-bit image fingerprint. The invention uses the training model to perform feature extraction training on the augmented pictures and adjusts the parameters to obtain the best extraction effect, then uses these parameters to extract video fingerprints efficiently. Image augmentation improves the robustness of the fingerprint feature algorithm, multi-layer convolution and pooling improve the generalization ability of image feature extraction, and the loss function is used to control and verify the precision of feature extraction. With these improvements the invention offers stronger image recognition capability and interference resistance, faster extraction and higher precision than conventional fingerprint algorithms, thereby effectively improving the efficiency of the video fingerprinting algorithm.

Brief description of the drawings

Fig. 1 is a schematic diagram of the steps of a video fingerprinting algorithm based on deep learning according to a preferred embodiment of the present invention.

Fig. 2 is a detailed flowchart of a video fingerprinting algorithm based on deep learning according to a preferred embodiment of the present invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiment is only one embodiment of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.

An embodiment of the present invention discloses a video fingerprinting algorithm based on deep learning. As shown in Fig. 1, the method includes the following steps:

Step S1: image augmentation step.

Preferably, different types of images are labeled and augmented, and the processed pictures are input into the training model.

Step S2: offline training step.

Preferably, the offline training model is used to train offline on the input augmented pictures and obtain the training parameters.

Step S3: online model initialization step.

Preferably, the training parameters obtained from offline training are used to initialize the online video image fingerprint extraction model.

Step S4: online video image feature extraction step.

Preferably, video images are read online and image features are extracted in real time on a GPU.

Step S5: image fingerprint generation step.

Preferably, the image features extracted online are binary-coded to generate the image fingerprint.

In this embodiment of the present invention, pictures are augmented by transformations such as translation, scaling, cropping, adding black borders, adding subtitles and adding logos, and are then input into the offline training model. The training model processes the images with multiple convolution, pooling and fully connected operations, and the processed data is evaluated with a hash loss function. The evaluation is repeated until the result converges, which yields the model parameters. The parameters are then read on a GPU to initialize the online video image feature extraction model, online feature extraction is performed on the sampled video images, and the extracted features are finally binary-coded to generate a 128-bit image fingerprint.

It can be seen that, in this embodiment, the training model is used to perform feature extraction training on the augmented pictures and the parameters are adjusted to obtain the best extraction effect, and these parameters are then used to extract video fingerprints efficiently. Image augmentation improves the robustness of the fingerprint feature algorithm, multi-layer convolution and pooling improve the generalization ability of image feature extraction, and the loss function is used to control and verify the precision of feature extraction. With these improvements the invention offers stronger recognition capability and interference resistance, faster extraction and higher precision than conventional fingerprint algorithms, thereby effectively improving the efficiency of the video fingerprinting algorithm.

An embodiment of the present invention discloses a video fingerprinting algorithm based on deep learning. Referring to Fig. 2, this embodiment further explains and optimizes the technical solution relative to the previous embodiment. Specifically, the video fingerprinting algorithm in this embodiment comprises the following steps:

Step S1: image augmentation step. Different types of images are labeled and augmented, and the processed pictures are input into the training model.

Preferably, step S11 (training image augmentation) is performed to scale, translate, crop, add black borders to, add subtitles to and add logos to all pictures, and the processed results are passed to step S12: labeling the different images.

Preferably, step S12 is performed to label the input training pictures with type IDs according to their types, and the processed picture information is passed to step S2 for offline feature extraction and training.
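The text does not say how the type IDs are consumed during training. One plausible reading, stated here purely as an assumption, is that two pictures with the same type ID form a similar pair (y=0) and two pictures with different IDs form a dissimilar pair (y=1), matching the pair labels used by the loss function. The helper name `make_pairs` is hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def make_pairs(type_ids: Sequence[int]) -> List[Tuple[int, int, int]]:
    """Return (i, j, y) for every picture pair.

    y = 0 when pictures i and j carry the same type ID (assumed similar),
    y = 1 otherwise (assumed dissimilar).
    """
    return [(i, j, 0 if type_ids[i] == type_ids[j] else 1)
            for i, j in combinations(range(len(type_ids)), 2)]
```

Under this reading, every augmented variant of one source picture would share that picture's type ID, so the variants would be trained toward the same fingerprint.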

Step S2: offline training step. The offline training model is used to train offline on the input augmented pictures and obtain the training parameters.

Preferably, step S21 (adjusting picture size) is performed on the data passed from step S12: the pictures are resized to 227 × 227, and the resized pictures are passed to step S22, which performs convolution plus pooling twice.

Preferably, step S22 is performed to apply convolution plus pooling to the pictures twice. The convolution operations improve robustness and the pooling operations improve the generalization ability of the processing. The result is passed to step S23: two convolution operations.
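The two building blocks of step S22 can be illustrated on a tiny single-channel input. This is a plain-Python sketch for clarity only: a real network would run multi-channel layers on a GPU, and the window sizes and the absence of padding here are assumptions.

```python
from typing import List

Matrix = List[List[float]]


def conv2d(img: Matrix, kernel: Matrix) -> Matrix:
    """Valid (no padding, stride 1) 2D cross-correlation, as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(img[y + i][x + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for x in range(ow)] for y in range(oh)]


def max_pool(img: Matrix, size: int = 2) -> Matrix:
    """Non-overlapping max pooling with a size x size window."""
    oh, ow = len(img) // size, len(img[0]) // size
    return [[max(img[y * size + i][x * size + j]
                 for i in range(size) for j in range(size))
             for x in range(ow)] for y in range(oh)]
```

Pooling keeps only the strongest response in each window, so the output changes little when the input shifts by a pixel inside a window.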

Preferably, step S23 is performed to apply two consecutive convolutions to the input picture data, further improving the robustness of the picture processing, and the result is passed to step S24: one convolution plus pooling operation.

Preferably, step S24 is performed to apply one convolution plus pooling operation to the input picture, further improving the robustness and generalization ability of the picture processing, and the result is passed to step S25: two fully connected operations.

Preferably, step S25 is performed to apply two consecutive fully connected operations to the input picture. The fully connected operation is a global operation: each node of the current layer is connected to all nodes of the previous layer. The result is passed to step S26: hash loss function evaluation.
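A fully connected operation, in which every output node sees every input node, reduces to one weighted sum per output. The sketch below omits any activation function, since the text does not mention one; the function name and the argument layout (one weight row per output node) are assumptions.

```python
from typing import List, Sequence


def fully_connected(inputs: Sequence[float],
                    weights: Sequence[Sequence[float]],
                    biases: Sequence[float]) -> List[float]:
    """Each output node is a weighted sum over ALL input nodes plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]
```

Applying the function twice in a row, with a second weight matrix of 128 rows, would yield the 128 floating-point values that the text describes as the image feature.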

Preferably, step S26 is performed to evaluate the loss on the results produced by steps S21 to S25, and step S27 is performed on the evaluation result.

Preferably, step S27 works as follows: when the evaluation result has converged, the current set of model configuration parameters is extracted and passed to step S3 (initializing the online feature extraction model); when the evaluation result has not converged, step S11 is performed to trigger a new round of offline training, until the result converges.
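The control flow of step S27 is a loop that repeats training rounds until the loss stops changing. The text does not define "converged", so the test used below (the loss changes by less than `tol` for `patience` consecutive rounds) is an assumption, as are the names `train_until_converged` and `run_round`.

```python
from typing import Callable, List


def train_until_converged(run_round: Callable[[], float], tol: float = 1e-3,
                          patience: int = 3,
                          max_rounds: int = 1000) -> List[float]:
    """Repeat training rounds until the loss is stable (assumed criterion).

    `run_round` stands for one pass through steps S11 to S26 and returns
    the loss value computed in step S26.
    """
    losses: List[float] = []
    stable = 0
    for _ in range(max_rounds):
        losses.append(run_round())
        if len(losses) > 1 and abs(losses[-1] - losses[-2]) < tol:
            stable += 1          # loss barely moved in this round
        else:
            stable = 0           # loss still changing, reset the counter
        if stable >= patience:   # S27: converged, parameters go to step S3
            break
    return losses
```

The `max_rounds` cap is a safety limit added for the sketch so that the loop always terminates even if the loss never settles.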

Step S3: online model initialization step. The training parameters obtained from offline training are used to initialize the online video image fingerprint extraction model.

Preferably, the online feature extraction model differs from the offline training model in that the output of the online extraction model is the output of the two fully connected operations, and the result no longer needs to be evaluated with the hash loss function.

Step S4: online video image feature extraction step. Video images are read online and image features are extracted in real time on a GPU.

Preferably, after step S3 has been completed, step S41 is performed to sample the input video images at a specific frame rate, and the sampling result is passed as input to step S42: feature extraction on the GPU.

Preferably, step S42 reads the initialized image feature extraction model output by step S3 and uses this model to extract image features from each of the pictures passed from step S41.

Preferably, the image feature extracted in step S42 is an image feature description represented by 128 floating-point numbers, and this feature description is passed to step S5: binary coding of the feature.

Step S5: image fingerprint generation step. The image features extracted online are binary-coded to generate the image fingerprint.

Preferably, step S5 is performed to binary-code the image description produced by step S42. Specifically, the 128 floating-point numbers are each compared with a specific threshold: a floating-point number greater than the threshold is recorded as 1, otherwise as 0, which yields a 128-bit image fingerprint.

In summary, step S11 scales, translates, crops, adds black borders to, adds subtitles to and adds logos to the input training images to achieve image augmentation. Step S12 is then performed on the augmented pictures to label the different types of pictures with the corresponding ID labels. Step S21 is performed on the labeled images to scale them to 227*227. Step S22 is performed on the resized pictures to apply convolution plus pooling twice; the convolution improves the robustness of the processing and the pooling improves its generalization ability. Step S23 is performed on the result to apply two consecutive convolutions, further improving robustness. Step S24 is then performed to apply one convolution plus pooling operation, further improving the robustness and generalization ability of the picture processing. Step S25 is performed on the result to apply two fully connected operations, which act globally: each node of the current layer is connected to all nodes of the previous layer. Step S26 is performed on the output of S25 to compute and evaluate the hash loss function on the input data, and step S27 is performed on the result. When the loss function result has not converged, step S11 is performed to trigger a new round of offline training; otherwise step S3 is performed, in which the set of model configuration parameters obtained through steps S21 to S26 is input into the online image feature extraction model for initialization.

After initialization is complete, step S41 is performed to sample the input video images at a specific frame rate, and the sampling result is passed as input to step S42. In step S42, the GPU reads the initialized model data produced by step S3 and performs online feature extraction on the sampled pictures obtained in step S41; the feature is an image feature description of 128 floating-point numbers. Step S5 is performed on the extracted image features to binary-code them: the 128 feature values are each compared with a specific threshold, values greater than the threshold are recorded as 1 and all others as 0, forming a 128-bit binary image fingerprint.

The training model is used to perform feature extraction training on the augmented pictures and the parameters are adjusted to obtain the best extraction effect, and these parameters are then used to extract video fingerprints efficiently. Image augmentation improves the robustness of the fingerprint feature algorithm, multi-layer convolution and pooling improve the generalization ability of image feature extraction, and the loss function is used to control and verify the precision of feature extraction. With these improvements the invention offers stronger recognition capability and interference resistance, faster extraction and higher precision than conventional fingerprint algorithms, thereby effectively improving the efficiency of the video fingerprinting algorithm.

The foregoing is merely illustrative and not restrictive. Those skilled in the art may make various changes and modifications to the invention without departing from its spirit and scope. Thus, if such changes and modifications fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to include them as well.

Claims

1. A video fingerprinting algorithm based on deep learning, characterized in that the method comprises the following steps:

an image augmentation step: labeling different types of images, applying augmentation to them, and inputting the processed pictures into the training model;

an offline training step: using the offline training model to train offline on the input augmented pictures and obtain the training parameters;

an online model initialization step: using the training parameters obtained from offline training to initialize the online video image fingerprint extraction model;

2. The video fingerprinting algorithm based on deep learning as claimed in claim 1, characterized in that, in the image augmentation step, the augmentation refers to applying various transformations to the images, such as scaling, translation, cropping, adding black borders, adding subtitles and adding logos.

3. The video fingerprinting algorithm based on deep learning as claimed in claim 1, characterized in that the offline training step includes: image scaling, multiple convolution and pooling operations, two fully connected operations, and loss function evaluation.

4. The video fingerprinting algorithm based on deep learning as claimed in claim 3, characterized in that the image scaling refers to uniformly scaling the input pictures to 227*227.

5. The video fingerprinting algorithm based on deep learning as claimed in claim 3, characterized in that the multiple convolution and pooling operations include: convolution plus pooling performed twice, followed by convolution performed twice, followed by convolution plus pooling performed once.

6. The video fingerprinting algorithm based on deep learning as claimed in claim 3, characterized in that the loss function is a hash loss function, specified as follows.

7. The video fingerprinting algorithm based on deep learning as claimed in claim 6, characterized in that, in the loss function, b is the output of the network, y=0 indicates similar, y=1 indicates dissimilar, m=256 and a=0.01.

8. The video fingerprinting algorithm based on deep learning as claimed in claim 1 or 6, characterized in that the offline training step terminates when the result computed by the loss function converges.

9. The video fingerprinting algorithm based on deep learning as claimed in claim 1, characterized in that the online video image feature extraction step includes: loading the initialized image feature extraction model into the GPU cache, sampling the video images at a specific frame rate, and inputting the sampled images into the model for feature extraction.

10. The video fingerprinting algorithm based on deep learning as claimed in claim 1, characterized in that, in the image fingerprint generation step, the image feature consists of 128 floating-point numbers; the binary coding refers to comparing each floating-point number with a specific value, a floating-point number greater than that value being recorded as 1 and otherwise as 0; and the image fingerprint is a set of 128 bits of binary-coded data.