WO2020114378A1 - Video watermark recognition method, apparatus, device and storage medium - Google Patents


Info

Publication number
WO2020114378A1
WO2020114378A1 (PCT application PCT/CN2019/122609)
Authority
WO
WIPO (PCT)
Prior art keywords
watermark
training
video
video frame
classification
Prior art date
Application number
PCT/CN2019/122609
Other languages
English (en)
French (fr)
Inventor
邹昱
杨轩
刘振强
潘跃
李振
Original Assignee
广州市百果园信息技术有限公司
Priority date
Filing date
Publication date
Application filed by 广州市百果园信息技术有限公司
Priority to US17/299,726 (US11631248B2)
Publication of WO2020114378A1

Classifications

    • G06V 20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/23 Clustering techniques
    • G06F 18/2415 Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06V 10/50 Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; projection analysis
    • G06V 20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V 20/49 Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
    • H04N 21/44008 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N 21/8358 Generation of protective data, e.g. certificates, involving watermark
    • G06V 10/762 Image or video recognition or understanding using pattern recognition or machine learning, using clustering, e.g. of similar faces in social networks
    • G06V 2201/07 Indexing scheme relating to image or video recognition or understanding: target detection

Definitions

  • Embodiments of the present application relate to identification technology, and for example to a method, apparatus, device, and storage medium for identifying a video watermark.
  • A watermark is an important symbol for protecting copyright. As users' copyright awareness has gradually improved, various watermarks have come into wide use. Exemplarily, a watermark may be embedded in a video; because a video can be understood as consisting of at least two video frames, and each video frame can be regarded as a picture, embedding a watermark in a video can be understood as embedding the watermark in multiple pictures.
  • Due to the widespread use of watermarks, watermark recognition has also become a research direction. However, because watermarks usually occupy a small proportion of a picture and often appear in non-critical areas, such as the bottom of the picture (the lower left or lower right corner) or the top (the upper left or upper right corner), video watermark recognition is difficult, and its accuracy is not high.
  • Embodiments of the present application provide a method, device, equipment, and storage medium for identifying a video watermark, so as to improve the accuracy of identifying a video watermark.
  • An embodiment of the present application provides a method for identifying a video watermark.
  • The method includes:
  • dividing each video frame of a plurality of video frames of a video into a plurality of image blocks to obtain an image sequence corresponding to each video frame;
  • inputting a plurality of image sequences corresponding to the plurality of video frames into a target detection model to obtain a classification result of each image block, and obtaining a video feature vector according to the classification results of all image blocks;
  • inputting the video feature vector into a watermark recognition model to obtain a watermark recognition probability output by the watermark recognition model; and
  • in a case where the watermark recognition probability is greater than or equal to a probability threshold, determining that the video includes a watermark.
  • An embodiment of the present application also provides a video watermark recognition device, which includes:
  • an image sequence acquisition module, configured to divide each video frame of the multiple video frames of the video into multiple image blocks to obtain the image sequence corresponding to each video frame;
  • a video feature vector acquisition module, configured to input the multiple image sequences corresponding to the multiple video frames into the target detection model to obtain the classification result of each image block, and to obtain the video feature vector according to the classification results of all image blocks; and
  • a watermark recognition result determination module, configured to input the video feature vector into the watermark recognition model to obtain the watermark recognition probability output by the watermark recognition model, and, when the watermark recognition probability is greater than or equal to a probability threshold, to determine that the video contains a watermark.
  • An embodiment of the present application also provides a device, which includes:
  • one or more processors; and
  • a memory configured to store one or more programs,
  • wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method provided in the embodiments of the present application.
  • An embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method provided by the embodiments of the present application.
  • FIG. 1 is a schematic diagram of a picture including a watermark provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of another picture including a watermark provided by an embodiment of the present application.
  • FIG. 3 is a flowchart of a video watermark recognition method provided by an embodiment of the present application.
  • FIG. 4 is a flowchart of another video watermark recognition method provided by an embodiment of the present application.
  • FIG. 5 is an application schematic diagram of a video watermark recognition method provided by an embodiment of the present application.
  • FIG. 6 is a flowchart of another video watermark recognition method provided by an embodiment of the present application.
  • FIG. 7 is an application schematic diagram of another video watermark recognition method provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a video watermark recognition device provided by an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of a device provided by an embodiment of the present application.
  • In FIG. 1, a picture containing a watermark is shown; the watermark is located in the upper right corner of the picture, which makes watermark recognition very difficult and keeps the accuracy of watermark recognition low.
  • Because the video can be understood as being composed of at least two video frames, and each video frame can be regarded as a picture, the picture described here may be a static picture, a dynamic picture, or a video frame of a video.
  • the so-called video watermark recognition refers to determining whether a video contains a watermark.
  • The recognition result of the video watermark is either that the video contains a watermark or that the video does not contain a watermark.
  • The watermarks described here may be the same watermark or different watermarks. For example, suppose a video is composed of three video frames; splitting the three video frames in chronological order, the watermark in the first video frame is located in the upper right corner of the video frame, the watermark in the second video frame is located in the upper left, and the watermark in the third video frame is located in the upper right.
  • the unfixed watermark position also increases the difficulty of identifying the video watermark.
  • Therefore, it may be considered to increase the proportion of the watermark in each video frame and to perform watermark recognition on each video frame separately; on this basis, the watermark recognition result of the video is determined according to the watermark recognition results of at least two video frames.
  • FIG. 3 is a flowchart of a method for recognizing a video watermark provided by an embodiment of the present application. This embodiment can be applied to improve the recognition accuracy of a video watermark.
  • The method can be performed by an apparatus for recognizing a video watermark. The apparatus may be implemented in software and/or hardware and configured in a device, such as a computer or mobile terminal. As shown in FIG. 3, the method includes the following steps:
  • Step 110 Divide each video frame of the multiple video frames of the video into multiple image blocks to obtain an image sequence corresponding to each video frame.
  • Step 120 Input a plurality of image sequences corresponding to the plurality of video frames to the target detection model to obtain a classification result of each image block, and obtain a video feature vector according to the classification results of all image blocks.
  • Video refers to capturing, recording, processing, storing, transmitting, and reproducing a series of static pictures in the form of electrical signals. When continuous static pictures change at a rate exceeding 24 frames per second, then according to the principle of persistence of vision, the human eye cannot distinguish a single static picture and perceives a smooth, continuous visual effect. Such a sequence of continuous static pictures is called a video, and each static picture is called a video frame.
  • watermarks usually occupy a relatively small proportion in video frames.
  • In addition, the location of the watermark in the video may not be fixed, which increases the difficulty of identifying the video watermark. Therefore, in order to improve the recognition accuracy of the video watermark, consider increasing the proportion of the video frame occupied by the watermark and performing watermark recognition on each video frame separately.
  • Divide each video frame of the multiple video frames of the video into multiple image blocks to increase the proportion of the watermark in the video frame; the multiple image blocks of each video frame form the image sequence corresponding to that video frame.
  • the multiple image blocks of each video frame may be equal-height image blocks.
  • Exemplarily, V = {I_1, I_2, ..., I_n, ..., I_(N-1), I_N}, where I_n is the image sequence formed by the multiple image blocks of the n-th video frame of the video.
  • Suppose the size of the video frame is 256 × 128, the watermark is located in the upper right corner of the video frame, and the size of the watermark is 12 × 6. The video frame is divided into 8 image blocks, each of size 64 × 64; as in FIG. 2, the 8 image blocks are referred to, from left to right and from top to bottom, as the first image block, the second image block, ..., the seventh image block, and the eighth image block. Because the watermark is located in the upper right corner of the video frame, the watermark appears in the second image block; apart from the second image block, the other image blocks do not contain the watermark. Before division, the proportion of the watermark in the video frame is (12 × 6)/(256 × 128) ≈ 0.22%; after dividing the video frame into 8 image blocks, the proportion of the watermark in the second image block is (12 × 6)/(64 × 64) ≈ 1.76%. It can be seen that by dividing each video frame in the video into multiple image blocks, the proportion of the watermark in the video frame can be increased.
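  • As an illustration of this division step, the following Python sketch (the helper name split_into_blocks and the array-based frame representation are assumptions, not part of the patent) splits the 256 × 128 frame from the example into 64 × 64 blocks and reproduces the before/after watermark proportions:

```python
import numpy as np

def split_into_blocks(frame, block_h=64, block_w=64):
    """Divide a frame (H x W x C array) into equal-size image blocks, row-major."""
    h, w = frame.shape[:2]
    return [frame[r:r + block_h, c:c + block_w]
            for r in range(0, h, block_h)
            for c in range(0, w, block_w)]

# The example above: a 256 x 128 frame (width x height) with a 12 x 6 watermark.
frame = np.zeros((128, 256, 3), dtype=np.uint8)   # height 128, width 256
blocks = split_into_blocks(frame)                  # 2 rows x 4 columns = 8 blocks

watermark_area = 12 * 6
print(len(blocks))                        # 8
print(watermark_area / (256 * 128))       # ~0.0022: proportion before division
print(watermark_area / (64 * 64))         # ~0.0176: proportion within one block
```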
  • The pre-trained target detection model may be generated by training a classifier model on training samples; the training samples may include training pictures, the classification categories of the training pictures, and the position information of the training pictures.
  • Commonly used classifier models include Bayesian decision, maximum likelihood classifiers, Bayesian classifiers, cluster analysis models, neural network models, support vector machine models, chaos and fractal models, hidden Markov models, etc. The classifier model can be set according to the actual situation and is not limited here.
  • the classification result may include the classification category of the image block, the classification probability of the image block, and the position information of the image block.
  • the classification category may include a watermark and a background. If the classification category is a watermark, it may indicate that the image block contains a watermark; if the classification category is a background, it may indicate that the image block does not contain a watermark.
  • Each image block may include multiple classification results, and the number of classification results may be set according to actual conditions, which is not limited herein.
  • In the t-th classification result of the image block I_nm, the classification probability and the values x_min, y_min, x_max, and y_max are given, where x_min, y_min, x_max, and y_max represent the position information of the image block I_nm: (x_min, y_min) represents the position of the upper left corner of the image block I_nm, and (x_max, y_max) represents the position of the lower right corner of the image block I_nm.
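  • For illustration, a classification result as described above can be modeled by the following structure (the class and field names are assumptions for this sketch):

```python
from dataclasses import dataclass

@dataclass
class ClassificationResult:
    category: str        # classification category: "watermark" or "background"
    probability: float   # classification probability
    x_min: float         # x of the upper left corner of the region
    y_min: float         # y of the upper left corner
    x_max: float         # x of the lower right corner
    y_max: float         # y of the lower right corner

# The t-th classification result of image block I_nm might then look like:
b_nmt = ClassificationResult("watermark", 0.87, 52.0, 3.0, 63.0, 9.0)
```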
  • Obtaining the video feature vector according to the classification results of all image blocks may include: taking, among the multiple classification results of each image block, the classification results whose classification category is watermark as the candidate classification results of the image block; obtaining multiple feature vectors corresponding to the multiple video frames according to the candidate classification results of all image blocks; and obtaining the video feature vector according to the multiple feature vectors corresponding to the multiple video frames. In this embodiment, obtaining the multiple feature vectors corresponding to the multiple video frames according to the candidate classification results of all image blocks may include the following two methods:
  • Method 1: For each video frame, according to the classification probabilities in the candidate classification results of the multiple image blocks in the video frame, sort the candidate classification results of the multiple image blocks in descending order, and select the first U candidate results to obtain the feature vector of the video frame. If the classification probabilities in several candidate results are the same, one of those candidate results can be selected at random. In addition, if the number of candidate classification results is less than U, the vector may be padded with a preset identifier, which may be -1. The purpose of this operation is to keep the feature vectors of the multiple video frames at the same dimension.
  • Method 2: For each video frame, according to the classification probability in the candidate classification results of each image block in the video frame, sort the candidate classification results of each image block in descending order, and select the first V candidate classification results in each image block's sorted candidate classification results as the target classification results of the image block. Then, according to the classification probabilities in the target classification results of the multiple image blocks, sort the target classification results of the multiple image blocks in descending order, and select the first U target classification results to obtain the feature vector of the video frame, where 1 ≤ V ≤ U. Similarly, if the classification probabilities in several candidate classification results and/or target classification results are the same, one of them may be selected at random.
  • If the number of candidate classification results of an image block is less than V, it can be padded with a preset identifier; and/or if the number of target classification results is insufficient, the preset identifier is likewise used for padding. The preset identifier mentioned here may be -1. The purpose of this operation is to keep the feature vectors of the multiple video frames at the same dimension.
  • The difference between the two methods is as follows. In Method 1, all the candidate results of the multiple image blocks are directly sorted in descending order according to the classification probability, and the first U candidate results are selected to constitute the feature vector of the video frame. In Method 2, the candidate classification results of each image block are first filtered to obtain the target classification results of each image block; then, according to the classification probabilities in the target classification results of the multiple image blocks, the target classification results are sorted in descending order, and the first U target classification results are selected to constitute the feature vector of the video frame. That is, Method 1 determines the feature vector of the video frame through one screening, while Method 2 does so through two screenings. Moreover, the first screening in Method 2 does not sort the candidate classification results of all image blocks together but sorts the candidate classification results of each image block separately; compared with sorting the candidate classification results of all image blocks at once, this reduces the difficulty of data processing. Although the second screening in Method 2 sorts the target classification results of all image blocks, the first screening has already reduced the amount of data, so compared with Method 1 it still reduces the difficulty of data processing.
  • When the video has a large number of frames, Method 2 may be used to reduce the difficulty of data processing; when the video has a small number of frames, either Method 1 or Method 2 may be used.
  • Exemplarily, in Method 1, the candidate classification results of the multiple image blocks are sorted in descending order according to the classification probability, and the first U candidate results are selected to constitute the feature vector of the video frame; here, the number of candidate classification results of I_n2 is 0.
  • In Method 2, the first V candidate classification results of each image block are used as the target classification results of the image block; that is, the candidate classification results of each image block are first filtered to obtain the target classification results of each image block. Then, according to the classification probability, the target classification results of the multiple image blocks are sorted in descending order, and the first U target classification results are selected to constitute the feature vector of the video frame. Again, the number of candidate classification results of I_n2 is 0. All the target classification results of I_n1, I_n2, I_n3, and I_n4 are sorted in descending order according to the classification probability, and the sorted results are b_n11, b_n12, b_n42, b_n43, b_n33, and b_n31.
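  • A minimal sketch of the two selection methods, assuming each image block's candidate results are lists of ClassificationResult objects as sketched earlier, and using -1 as the preset padding identifier (the random tie-breaking described above is omitted for brevity):

```python
def method_1(blocks_candidates, U):
    """One screening: pool all blocks' candidate results, keep the top U."""
    pooled = [r for results in blocks_candidates for r in results]
    pooled.sort(key=lambda r: r.probability, reverse=True)
    top = pooled[:U]
    # Pad with the preset identifier -1 so every frame vector has dimension U.
    return top + [-1] * (U - len(top))

def method_2(blocks_candidates, V, U):
    """Two screenings: top V per block first, then top U over the survivors."""
    targets = []
    for results in blocks_candidates:
        per_block = sorted(results, key=lambda r: r.probability, reverse=True)
        targets.extend(per_block[:V])                 # first screening
    targets.sort(key=lambda r: r.probability, reverse=True)
    top = targets[:U]                                 # second screening
    return top + [-1] * (U - len(top))
```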
  • a video feature vector may be obtained according to multiple feature vectors corresponding to multiple video frames respectively.
  • the video feature vector is a vector set composed of multiple feature vectors corresponding to multiple video frames, respectively.
  • Exemplarily, V = {I_1, I_2, ..., I_n, ..., I_(N-1), I_N}, where I_n denotes the feature vector of the n-th video frame, n ∈ {1, 2, ..., N-1, N}, and I_n can be determined as described in the previous embodiment.
  • Step 130 Input the video feature vector to the watermark recognition model to obtain the watermark recognition probability output by the watermark recognition model.
  • Step 140 Determine whether the watermark recognition probability is greater than or equal to the probability threshold; if the watermark recognition probability is greater than or equal to the probability threshold, perform step 150; if the watermark recognition probability is less than the probability threshold, perform step 160.
  • Step 150 Determine that the video contains a watermark.
  • Step 160 Determine that the video does not contain a watermark.
  • the video feature vector is input into the pre-trained watermark recognition model, and the watermark recognition probability of the video is obtained through calculation of the watermark recognition model.
  • The pre-trained watermark recognition model may be generated by training an eXtreme Gradient Boosting (XGBoost) model on training samples; the training samples may be training video feature vectors and the classification categories of the training videos.
  • the probability threshold can be used as a basis for determining whether the video contains a watermark, and its value can be set according to the actual situation, which is not limited herein. Exemplarily, the probability threshold is 0.9.
  • In an embodiment, obtaining the video feature vector according to the classification results of all image blocks includes: determining the watermark classification results of each image block according to the multiple classification results of each image block of each video frame, and obtaining the feature vector corresponding to each video frame according to the watermark classification results of all image blocks of that video frame.
  • a video feature vector is obtained from multiple feature vectors corresponding to multiple video frames respectively.
  • the watermark classification result of each image block is determined according to the classification result of each image block of each video frame, which can be understood as follows:
  • The classification result of an image block may include the classification category of the image block, the classification probability of the image block, and the position information of the image block.
  • the classification category of the image block includes watermark and background.
  • The classification result whose classification category is watermark can be called the watermark classification result, and the classification result whose classification category is background can be called the background classification result. For each image block, a classification result may be a watermark classification result or a background classification result; since the purpose is to determine whether the video contains a watermark, the watermark classification result can be regarded as the valid classification result.
  • Then the feature vector corresponding to each video frame is obtained; that is, the classification results of each image block whose classification category is watermark are taken as the candidate classification results of the image block, and the feature vector corresponding to each video frame is obtained according to the candidate classification results of all image blocks.
  • obtaining the feature vector corresponding to each video frame may include the following two ways:
  • Way 1: The watermark classification results of the multiple image blocks are sorted according to the probability that the multiple image blocks of each video frame contain a watermark, and the feature vector of each video frame is determined from the sorting result corresponding to each video frame. That is, for each video frame, the classification results of each image block whose classification category is watermark are taken as the candidate classification results of the image block; according to the classification probabilities, the candidate classification results of the multiple image blocks are sorted in descending order, and the first U candidate results are selected to obtain the feature vector of the video frame.
  • Way 2: The watermark classification results of each image block are sorted according to the probability that each image block of each video frame contains a watermark; some watermark classification results are selected from the multiple sorting results corresponding to each video frame and sorted again; and the feature vector of each video frame is determined from the sorted partial watermark classification results corresponding to each video frame. That is, for each video frame, the classification results of each image block whose classification category is watermark are taken as the candidate classification results of the image block; the candidate classification results of each image block are sorted in descending order, and the first V candidate classification results are selected as the target classification results of the image block; then the target classification results of the multiple image blocks are sorted in descending order, and the first U target classification results are selected to obtain the feature vector of the video frame, where 1 ≤ V ≤ U.
  • In Way 1, the sorting can be understood as follows: the classification probability included in the watermark classification result of each image block is the probability of containing a watermark; according to the probabilities that the multiple image blocks of each video frame contain a watermark, the watermark classification results of the multiple image blocks are sorted in descending order; the first U sorting results are selected from the sorting results corresponding to each video frame; and the feature vector of each video frame is determined according to the first U sorting results.
  • Exemplarily, the watermark classification results of I_n1 are B_n1 = {b_n11, b_n12, b_n13, b_n14, b_n15}; the number of watermark classification results of I_n2 is 0; the watermark classification results of I_n3 are B_n3' = {b_n31, b_n33, b_n34}, where the probabilities that b_n31, b_n33, and b_n34 contain a watermark are 0.3, 0.4, and 0.2, respectively. All the watermark classification results of I_n1, I_n2, I_n3, and I_n4 are sorted in descending order according to the probability of containing a watermark.
  • In Way 2, obtaining the feature vector corresponding to each video frame may include: sorting the watermark classification results of each image block according to the probability that each image block of each video frame contains a watermark; selecting some watermark classification results from the multiple sorting results corresponding to each video frame and sorting them; and determining the feature vector of each video frame from the sorted partial watermark classification results corresponding to each video frame. This can be understood as follows: the classification probability included in the watermark classification result of each image block is the probability of containing a watermark; the watermark classification results of each image block are sorted in descending order; the first V watermark classification results are selected from the multiple sorting results as the target watermark classification results; the target watermark classification results are then sorted in descending order; and the first U sorting results can be selected from the target watermark classification results, with the feature vector of the video frame determined according to these first U sorting results.
  • Exemplarily, the watermark classification results of I_n1 are B_n1 = {b_n11, b_n12, b_n13, b_n14, b_n15}, and the number of watermark classification results of I_n2 is 0.
  • The target detection model can be trained by acquiring a first training sample, where the first training sample includes multiple training pictures and the classification categories and position information of the multiple training pictures. Each training picture in the multiple training pictures is divided into multiple first training image blocks; the classification category of each first training image block of each training picture is obtained according to the classification category of that training picture, and the position information of each first training image block of each training picture is obtained according to the position information of that training picture. Taking all the first training image blocks as input variables, and the classification categories and position information of all the first training image blocks as output variables, the classifier model is trained to obtain the target detection model.
  • The target detection model may be trained in the following manner: obtain a first training sample, where the first training sample may include multiple training pictures and the classification categories and position information of the multiple training pictures; divide each training picture in the multiple training pictures into multiple first training image blocks; obtain the classification category of each first training image block of each training picture according to the classification category of that training picture; and obtain the position information of each first training image block of each training picture according to the position information of that training picture.
  • The training pictures may include pictures containing a watermark and may also include pictures not containing a watermark (i.e., background pictures); the background pictures described here may be pictures containing subtitles.
  • the classification categories of the training pictures may include watermark and background.
  • That the classification category of each first training image block of each training picture is obtained according to the classification category of that training picture, and the position information of each first training image block of each training picture is obtained according to the position information of that training picture, can be understood as follows: according to the classification category of each training picture, the classification category of each first training image block of that training picture can be marked; and according to the position information of each training picture, the position information of each first training image block of that training picture can be marked.
  • Exemplarily, if the classification category of a training picture is watermark and the position information of the training picture is known, then after the training picture is divided into multiple first training image blocks, the classification category of the first training image block where the watermark appears is marked as watermark, the classification category of the first training image blocks where the watermark does not appear is marked as background, and the position information of each first training image block is marked according to the position information of the training picture.
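  • The marking procedure can be sketched as follows; the overlap test and the helper name label_training_blocks are illustrative assumptions:

```python
def label_training_blocks(watermark_box, pic_h=128, pic_w=256,
                          block_h=64, block_w=64):
    """Mark each first training image block as watermark or background,
    depending on whether the picture-level watermark box overlaps it."""
    x_min, y_min, x_max, y_max = watermark_box
    labels = []
    for r in range(0, pic_h, block_h):        # block rows, top to bottom
        for c in range(0, pic_w, block_w):    # block columns, left to right
            overlaps = not (x_max <= c or x_min >= c + block_w or
                            y_max <= r or y_min >= r + block_h)
            labels.append("watermark" if overlaps else "background")
    return labels

# A 12 x 6 watermark in the upper right corner of a 256 x 128 picture:
print(label_training_blocks((244, 0, 256, 6)))
```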
  • the classifier model may include a support vector machine model or a neural network model.
  • The neural network model is based on the basic principles of biological neural networks: after understanding and abstracting the structure of the human brain and its response mechanism to external stimuli, network topology knowledge is used as the theoretical basis to build a mathematical model that simulates the information processing mechanism of the human brain's nervous system. The model relies on the complexity of the system and processes information by adjusting the weights of the interconnections between a large number of internal nodes (neurons).
  • the neural network model can include a convolutional neural network model, a recurrent neural network model, and a deep neural network model.
  • the following uses the convolutional neural network model as an example.
  • the core problem solved by the convolutional neural network model is how to automatically extract and abstract features.
  • A convolutional neural network generally consists of the following three parts: the first part is the input layer; the second part is a combination of convolutional layers, activation layers, and pooling layers (or downsampling layers); and the third part is a fully connected multi-layer perceptron classifier (i.e., the fully connected layers).
  • the convolutional neural network model has the characteristics of weight sharing.
  • Weight sharing refers to the convolution kernel: through the operation of one convolution kernel, the same features can be extracted at different positions of the image data. In other words, the characteristics of the same target at different locations in one piece of image data are basically the same. One convolution kernel can only extract one kind of feature, so multi-kernel convolution can be set up, with each convolution kernel learning a different feature, to extract the features of the picture. In picture classification, the role of the convolutional layers is to extract low-level features and assemble them into high-level features. Low-level features are basic features, such as textures and edges; high-level features, such as the shapes of faces and objects, can better represent the attributes of the sample. This process reflects the hierarchical nature of the convolutional neural network model.
  • The fully connected layer acts as a "classifier" in the entire convolutional neural network. If the operations of the convolutional layers, activation layers, and pooling layers map the original data to the hidden-layer feature space, the fully connected layer maps the learned "distributed feature representation" to the sample label space.
  • In practice, the fully connected layer can be realized by a convolution operation: a fully connected layer whose preceding layer is also fully connected can be converted into a convolution with a 1 × 1 kernel, and a fully connected layer whose preceding layer is a convolutional layer can be converted into a global convolution with an H × W kernel, where H and W are the height and width of the preceding layer's convolution result, respectively.
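  • This equivalence can be checked with a short sketch (PyTorch is an assumed framework choice; the patent does not name one):

```python
import torch
import torch.nn as nn

C, H, W, K = 16, 4, 4, 10            # channels, height, width, classes
x = torch.randn(1, C, H, W)          # the previous layer's convolution result

fc = nn.Linear(C * H * W, K)         # fully connected layer over flattened input
conv = nn.Conv2d(C, K, kernel_size=(H, W))   # global H x W convolution

# Copy the FC weights into the convolution kernel to show the equivalence.
with torch.no_grad():
    conv.weight.copy_(fc.weight.view(K, C, H, W))
    conv.bias.copy_(fc.bias)

assert torch.allclose(fc(x.flatten(1)), conv(x).flatten(1), atol=1e-5)
```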
  • The training process of the convolutional neural network model is to compute the loss function of the convolutional neural network model through forward propagation and to compute the partial derivatives of the loss function with respect to the network parameters; the network parameters of the convolutional neural network model are then adjusted by backward gradient propagation until the loss function of the convolutional neural network model reaches a preset function value. When the loss function value of the convolutional neural network model reaches the preset function value, the convolutional neural network model has been trained; at this time, the network parameters of the convolutional neural network model are determined, and the trained convolutional neural network model can be used as the target detection model.
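  • A minimal sketch of this training procedure, again assuming PyTorch, with cross-entropy as the loss (one of the loss functions listed below); the function name and hyperparameter values are assumptions:

```python
import torch

def train_detector(model, loader, preset_loss=0.05, lr=1e-3, max_epochs=100):
    """Forward-propagate to compute the loss, back-propagate its partial
    derivatives w.r.t. the network parameters, and adjust them until the
    loss reaches the preset function value."""
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(max_epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)   # forward propagation
            loss.backward()     # partial derivatives of the loss
            optimizer.step()    # reverse-gradient adjustment of parameters
            if loss.item() <= preset_loss:            # preset function value
                return model
    return model
```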
  • The loss function is a function that maps an event or the values of one or more variables to a real number that intuitively represents an associated "cost"; that is, the loss function maps events of one or more variables to a real number associated with a cost.
  • the loss function can be used to measure the model performance and the inconsistency between the actual value and the predicted value.
  • the model performance increases as the value of the loss function decreases.
  • The predicted value here refers to the classification category and position information of each first training image block obtained by inputting all first training image blocks as input variables into the convolutional neural network model; the actual value refers to the actual classification category of each first training image block and the actual position information of each first training image block.
  • Exemplarily, the loss function may be a cross-entropy loss function, a 0-1 loss function, a square loss function, an absolute loss function, a log loss function, etc.; it can be set according to the actual situation and is not limited here.
  • In this way, the false detection rate of the target detection model is reduced, and the prediction performance of the target detection model is thereby improved.
  • In an embodiment, all first training image blocks are used as input variables, and the classification categories and position information of all first training image blocks are used as output variables; training the classifier model to obtain the target detection model may then include: obtaining the size information of each first training image block; performing cluster analysis on the size information of all the first training image blocks to determine the a priori frame of each first training image block; and, using all the first training image blocks and the a priori frames of all the first training image blocks as input variables and the classification categories and position information of all the first training image blocks as output variables, training the classifier model to obtain the target detection model.
  • In other words, the size information of each first training image block is obtained, and cluster analysis is performed on the size information of all the first training image blocks to determine the a priori frame of each first training image block; that is, a corresponding a priori frame is selected for each first training image block through cluster analysis.
  • cluster analysis includes two basic contents, namely the measurement of pattern similarity and clustering algorithm.
  • Having the a priori frames participate in the process of training the classifier model can improve the accuracy with which the classifier model predicts the position information of each first training image block, thereby improving the prediction performance of the target detection model.
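  • The cluster analysis over size information can be sketched with k-means, a common choice for selecting a priori frames in detection models; the patent does not name a specific clustering algorithm, so this is an assumption:

```python
import numpy as np
from sklearn.cluster import KMeans

def prior_frames(sizes, k=5):
    """Cluster (width, height) size information and return the k cluster
    centers as a priori frames."""
    return KMeans(n_clusters=k, n_init=10).fit(np.asarray(sizes)).cluster_centers_

# One (width, height) row per first training image block's marked region:
sizes = [[12, 6], [14, 7], [10, 5], [30, 12], [28, 10]]
print(prior_frames(sizes, k=2))
```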
  • the watermark recognition model can be trained in the following manner: obtaining a second training sample, the second training sample including the training video and the classification category of the training video.
  • Each training video frame in the plurality of training video frames of the training video is divided into a plurality of second training image blocks to obtain an image sequence corresponding to each training video frame.
  • Multiple image sequences respectively corresponding to multiple training video frames are input to the target detection model to obtain a classification result of each second training image block, and a training video feature vector is obtained according to the classification results of all second training image blocks.
  • The training video feature vector is used as the input variable, the classification category of the training video is used as the output variable, and the XGBoost model is trained to obtain the watermark recognition model.
  • the XGBoost model is an improved version of the GBDT (Gradient Boosting Decision Tree) model.
  • The basic idea of the XGBoost model is to continuously reduce the residuals, so that the residual of the previous model is further reduced along the gradient direction, and to combine multiple base learners to obtain a strong learner.
  • The objective function of the XGBoost model uses a second-order Taylor expansion; compared with a first-order Taylor expansion, this gives a wider learning range and stronger generalization ability, making the model more stable, and regularization terms, thresholds, and coefficients are added to the objective function.
  • the XGBoost model can effectively avoid the occurrence of overfitting.
  • Exemplarily, the objective function is optimized through L1 or L2 regularization, and the learning rate quickly converges to a value within the gradient range, so that the XGBoost model can find the optimal value; the threshold is added for pruning, to limit the growth of the trees; and the added coefficients smooth the values of the leaf nodes to prevent overfitting.
  • As a new type of improved decision tree model, the XGBoost model has the advantages of high accuracy, fast operation speed, good scalability, and extractable feature importance; it can improve the accuracy of watermark recognition at a given speed.
  • the classification category of the training video is 1 or 0, where 1 represents the watermark and 0 represents the background.
  • By training the XGBoost model with these samples, the watermark recognition model can be obtained.
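  • A sketch of this training step using the xgboost Python package (the package choice and the hyperparameter values are assumptions; the patent specifies only the XGBoost model, the 1/0 classification categories, and the training video feature vectors):

```python
import numpy as np
from xgboost import XGBClassifier

# X: one training video feature vector per row; y: 1 = watermark, 0 = background.
X = np.random.rand(100, 60)           # placeholder training video feature vectors
y = np.random.randint(0, 2, 100)      # placeholder classification categories

watermark_model = XGBClassifier(
    n_estimators=200,
    max_depth=4,
    learning_rate=0.1,
    reg_lambda=1.0,    # L2 regularization term added to the objective
)
watermark_model.fit(X, y)

# Watermark recognition probability in [0, 1] for a new video feature vector:
prob = watermark_model.predict_proba(X[:1])[0, 1]
print(prob >= 0.9)     # compare against a probability threshold, e.g. 0.9
```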
  • By inputting a video feature vector, the watermark recognition probability corresponding to the video feature vector can be obtained. The range of the watermark recognition probability is [0, 1]; the closer the watermark recognition probability is to 1, the higher the probability that the video corresponding to the input video feature vector contains a watermark.
  • In other words, the second training sample includes the training video and the classification category of the training video; each training video frame in the multiple training video frames of the training video is divided into multiple second training image blocks to obtain the image sequence corresponding to each training video frame; the multiple image sequences corresponding to the multiple training video frames are input into the target detection model to obtain the classification result of each second training image block; the training video feature vector is obtained according to the classification results of all second training image blocks; and then, using the training video feature vector as the input variable and the classification category of the training video as the output variable, the XGBoost model is trained to obtain the watermark recognition model.
  • In an embodiment, obtaining the training video feature vector according to the classification results of all second training image blocks may include: determining the watermark classification result of each second training image block according to the multiple classification results of each second training image block of each training video frame; obtaining the feature vector corresponding to each training video frame according to the watermark classification results of all the second training image blocks of that training video frame; and obtaining the training video feature vector according to the multiple feature vectors corresponding to the multiple training video frames.
  • That the watermark classification result of each second training image block is determined according to the multiple classification results of each second training image block of each training video frame can be understood as follows: the classification result of a second training image block may include the classification category of the second training image block, the classification probability of the second training image block, and the position information of the second training image block.
  • the classification category of the second training image block includes the watermark and the background.
  • The classification result whose classification category is watermark is called a watermark classification result, and the classification result whose classification category is background can be called a background classification result.
  • Since the purpose is to determine whether the video contains a watermark, the watermark classification result can be considered the valid classification result.
  • Then the feature vector corresponding to each training video frame is obtained; that is, the classification results of each second training image block whose classification category is watermark are taken as the candidate classification results, and the feature vector corresponding to each training video frame is obtained according to the candidate classification results of all the second training image blocks of that training video frame.
  • obtaining the feature vector corresponding to each training video frame may include the following two ways:
  • Manner 1: The watermark classification results of the multiple second training image blocks are sorted according to the probability that the multiple second training image blocks of each training video frame contain a watermark, and the feature vector of each training video frame is determined from the sorting result corresponding to that training video frame. That is, for each training video frame, the classification results of each second training image block whose classification category is watermark are taken as the candidate classification results of the block; according to the classification probabilities in the candidate classification results of the multiple second training image blocks, the candidate classification results are sorted in descending order, and the first U candidate results are selected to obtain the feature vector of the training video frame.
  • Manner 2: According to the probability that each second training image block of each training video frame contains a watermark, the watermark classification results of each second training image block are sorted; some watermark classification results are selected from the multiple sorting results of each training video frame and sorted again; and the feature vector of each training video frame is determined from the sorted partial watermark classification results. That is, for each training video frame, the classification results of each second training image block whose classification category is watermark are taken as the candidate classification results of the block; the candidate classification results of each second training image block are sorted in descending order, and the first V candidate classification results are selected as the target classification results of the second training image block; then the target classification results of the multiple second training image blocks are sorted in descending order, and the first U target classification results are selected to obtain the feature vector of the training video frame, where 1 ≤ V ≤ U.
  • In Manner 1, obtaining the feature vector corresponding to each training video frame may include: sorting the watermark classification results of the multiple second training image blocks according to the probability that the multiple second training image blocks of each training video frame contain a watermark, and determining the feature vector of each training video frame from the sorting result corresponding to that training video frame. This can be understood as follows: the classification probability included in the watermark classification result of each second training image block is the probability of containing a watermark; according to the probabilities that the multiple second training image blocks of each training video frame contain a watermark, the watermark classification results of the multiple second training image blocks are sorted in descending order; the first U sorting results can be selected from the sorting results corresponding to each training video frame; and the feature vector of each training video frame is determined according to the first U sorting results.
  • Exemplarily, the number of watermark classification results of I_n2 is 0; the watermark classification results of I_n3 are B_n3' = {b_n31, b_n33, b_n34}, where the probabilities that b_n31, b_n33, and b_n34 contain a watermark are 0.3, 0.4, and 0.2, respectively. All the watermark classification results of I_n1, I_n2, I_n3, and I_n4 are sorted in descending order according to the probability of containing a watermark.
  • In Manner 2, obtaining the feature vector corresponding to each training video frame according to the watermark classification results of all the second training image blocks of each training video frame may include: sorting the watermark classification results of each second training image block according to the probability that each second training image block of the training video frame contains a watermark; selecting some watermark classification results from the multiple sorting results corresponding to each training video frame and sorting them; and determining the feature vector of each training video frame from the sorted partial watermark classification results. This can be understood as follows:
  • The classification probability included in the watermark classification result of each second training image block is the probability of containing a watermark. According to the probability that each second training image block of each training video frame contains a watermark, the watermark classification results of each second training image block are sorted in descending order; the first V watermark classification results are selected from the multiple sorting results corresponding to each training video frame as the target watermark classification results; the target watermark classification results are then sorted in descending order; and the first U sorting results can be selected from the target watermark classification results, with the feature vector of the training video frame determined according to these first U sorting results.
  • Exemplarily, as before, the number of watermark classification results of I_n2 is 0.
  • In an embodiment, the position information of each image block may be obtained by one-hot encoding (also called one-bit effective encoding). One-hot encoding uses an N-bit status register to encode N states; each state has its own independent register bit, and at any time only one of the N bits is valid.
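  • A minimal sketch of one-hot encoding an image block's position index, assuming 8 blocks per frame as in the earlier example:

```python
import numpy as np

def one_hot_position(block_index, num_blocks=8):
    """Encode a block position with a num_blocks-bit register: each state has
    its own bit, and only one bit is valid at any time."""
    code = np.zeros(num_blocks, dtype=np.int8)
    code[block_index] = 1
    return code

print(one_hot_position(1))   # second image block -> [0 1 0 0 0 0 0 0]
```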
  • The technical solution provided by the embodiments of the present application is also applicable to recognizing a watermark in a single picture: divide the picture into multiple image blocks to obtain the image sequence of the picture.
  • The image sequence of the picture is input into the target detection model to obtain the classification result of each image block, and the picture feature vector is obtained according to the classification results of the multiple image blocks.
  • the picture feature vector is input to the picture watermark recognition model to obtain the watermark recognition probability output by the picture watermark recognition model. When the watermark recognition probability is greater than or equal to the probability threshold, it is determined that the picture contains a watermark.
  • Obtaining the picture feature vector according to the classification results of the multiple image blocks may include: determining the watermark classification result of each image block according to the classification results of each image block in the multiple image blocks of the picture, and obtaining the picture feature vector according to the watermark classification results of the multiple image blocks.
  • obtaining the picture feature vector according to the watermark classification results of the multiple image blocks may include: sorting the watermark classification results of the multiple image blocks according to the probability of the multiple image blocks containing the watermark . Determine the picture feature vector from the sorting results.
  • The picture watermark recognition model can be trained in the following manner: obtain a third training sample, the third training sample including multiple first training pictures and the classification categories of the multiple first training pictures; divide each first training picture into multiple third training image blocks to obtain an image sequence corresponding to each first training picture; input the multiple image sequences corresponding to the multiple first training pictures into the target detection model to obtain the classification result of each third training image block, and obtain the first training picture feature vector according to the classification results of all third training image blocks; and, using the first training picture feature vectors as input variables and the classification categories of the first training pictures as output variables, train the XGBoost model to obtain the picture watermark recognition model.
  • obtaining the first training picture feature vector according to the classification results of multiple third training image blocks may include: according to multiple classification results of each third training image block of each first training picture The watermark classification result of each third training image block is determined. According to the watermark classification results of all the third training image blocks of each first training picture, the feature vector corresponding to each first training picture is obtained.
  • obtaining the feature vector corresponding to each first training picture according to the watermark classification results of all the third training image blocks of each first training picture may include: according to each first training The probability of the multiple third training image blocks of the picture containing the watermark is to sort the watermark classification results of the multiple third training image blocks. The feature vector of each first training picture is determined from the ranking result corresponding to each first training picture.
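  • A minimal training sketch under stated assumptions: feature vectors are built as above, labels are 1 (watermark) and 0 (background), and the scikit-learn interface of the xgboost Python package stands in for the XGBoost model; the data and hyperparameters are illustrative, not taken from the disclosure:

    import numpy as np
    from xgboost import XGBClassifier  # pip install xgboost

    # One fixed-length feature vector per first training picture (made-up data).
    X_train = np.array([[0.9, 0.8, 0.7, 0.6],     # picture containing a watermark
                        [0.2, 0.1, -1.0, -1.0]])  # background picture, padded
    y_train = np.array([1, 0])                    # 1 = watermark, 0 = background

    model = XGBClassifier(n_estimators=100, max_depth=4, learning_rate=0.1)
    model.fit(X_train, y_train)

    # Watermark recognition probability for a new picture feature vector.
    print(model.predict_proba(np.array([[0.85, 0.7, 0.5, -1.0]]))[0, 1])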
  • FIG. 4 is a flowchart of another video watermark recognition method provided by an embodiment of the present application. This embodiment can be applied to improve the recognition accuracy of a video watermark.
  • the method can be performed by a video watermark recognition device. The device may be implemented in software and/or hardware and may be deployed in equipment such as a computer or a mobile terminal. As shown in FIG. 4, the method includes the following steps:
  • Step 210 Divide each video frame of the multiple video frames of the video into multiple image blocks to obtain an image sequence corresponding to each video frame.
  • Step 220 Input a plurality of image sequences respectively corresponding to the plurality of video frames to the target detection model to obtain a classification result of each image block.
  • Step 230 Determine the watermark classification result of each image block according to the classification result of each image block of each video frame.
  • the watermark classification result includes the probability of containing a watermark.
  • Step 240 Sort the watermark classification results of the plurality of image blocks of each video frame according to the probability of the plurality of image blocks of each video frame containing the watermark.
  • Step 250 Determine the feature vector of each video frame from the sorting result corresponding to each video frame.
  • Step 260 Obtain a video feature vector according to multiple feature vectors corresponding to the multiple video frames, respectively.
  • Step 270 Input the video feature vector to the watermark recognition model to obtain the watermark recognition probability output by the watermark recognition model.
  • Step 280 Determine whether the watermark recognition probability is greater than or equal to the probability threshold; if the watermark recognition probability is greater than or equal to the probability threshold, perform step 290; if the watermark recognition probability is less than the probability threshold, perform step 2100.
  • Step 290 Determine that the video contains a watermark.
  • Step 2100 Determine that the video does not contain a watermark.
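  • A compact sketch of steps 210 through 2100 end to end. split_into_blocks, detector and recognizer are hypothetical stand-ins (the disclosure does not fix a block layout or a model API); the dictionary fields follow the (id, conf) fields assigned to each classification result, with id 1 meaning the watermark class:

    import numpy as np

    def split_into_blocks(frame, num_blocks):
        """Step 210: divide a frame (H x W array) into image blocks; equal
        vertical strips are one simple choice, not mandated by the text."""
        return np.array_split(np.asarray(frame), num_blocks, axis=1)

    def recognize_video(frames, detector, recognizer, num_blocks=4,
                        top_u=3, threshold=0.8):
        video_feature = []
        for frame in frames:
            blocks = split_into_blocks(frame, num_blocks)        # step 210
            results = detector(blocks)                           # step 220
            wm = [r for r in results if r["id"] == 1]            # step 230
            wm.sort(key=lambda r: r["conf"], reverse=True)       # step 240
            probs = [r["conf"] for r in wm[:top_u]]              # step 250
            probs += [-1.0] * (top_u - len(probs))               # equal dimensions
            video_feature.extend(probs)                          # step 260
        prob = recognizer(video_feature)                         # step 270
        return prob >= threshold                                 # steps 280-2100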
  • FIG. 5 shows an application schematic diagram of a video watermark recognition method.
  • the probability threshold is set to 0.8.
  • the video in FIG. 5 includes two video frames. Each video frame is divided into 4 image blocks to obtain the image sequence of each video frame, and the two image sequences are input into the target detection model to obtain the classification results of each image block.
  • each image block has three classification results; in a classification result, "1" represents the watermark class, "0" represents the background class, and the classification probability is a value between 0 and 1. According to the probabilities of the four image blocks of each video frame containing a watermark, the watermark classification results of the four image blocks are sorted in descending order, and the first three watermark classification results are selected from the sorting result to determine the feature vector of each video frame. The video feature vector is obtained from the feature vectors of the two video frames and is then input into the watermark recognition model, which outputs a watermark recognition probability of 0.9. Since the watermark recognition probability is greater than the probability threshold, it is determined that the video contains a watermark. A walkthrough with made-up numbers follows.
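  • The concrete numbers in FIG. 5 are not reproduced in the text, so this walkthrough uses made-up detector outputs with the same shape (two frames, four blocks, top three watermark results per frame):

    # Hypothetical (class, probability) pairs; "1" = watermark, "0" = background.
    frame1 = [("1", 0.95), ("0", 0.60), ("1", 0.40), ("1", 0.70)]
    frame2 = [("0", 0.80), ("1", 0.90), ("1", 0.50), ("0", 0.30)]

    def frame_feature(block_results, top_u=3):
        """Keep the top_u watermark probabilities of a frame, padded with -1."""
        wm = sorted((p for cls, p in block_results if cls == "1"), reverse=True)
        return wm[:top_u] + [-1.0] * (top_u - len(wm[:top_u]))

    video_feature = frame_feature(frame1) + frame_feature(frame2)
    print(video_feature)  # [0.95, 0.7, 0.4, 0.9, 0.5, -1.0]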
  • in the technical solution of this embodiment, each video frame of the multiple video frames of the video is divided into multiple image blocks to obtain an image sequence corresponding to each video frame; the multiple image sequences corresponding to the multiple video frames are input into the target detection model to obtain the classification result of each image block; the video feature vector is obtained according to the classification results of all image blocks; and the video feature vector is input into the watermark recognition model to obtain the watermark recognition probability output by the watermark recognition model. When the watermark recognition probability is greater than or equal to the probability threshold, it is determined that the video contains a watermark. Dividing each video frame into blocks increases the proportion of the video frame occupied by the watermark and thus reduces the difficulty of recognition; and because the watermark recognition results of multiple video frames are combined, the method can still accurately recognize whether the video contains a watermark even when the watermark position is not fixed, thereby improving the recognition accuracy of the video watermark.
  • FIG. 6 is a flowchart of another video watermark recognition method provided by an embodiment of the present application. This embodiment can be applied to improve the recognition accuracy of a video watermark.
  • the method can be performed by a video watermark recognition device. The device may be implemented in software and/or hardware and may be deployed in equipment such as a computer or a mobile terminal. As shown in FIG. 6, the method includes the following steps:
  • Step 3010 Divide each video frame of the multiple video frames of the video into multiple image blocks to obtain an image sequence corresponding to each video frame.
  • Step 3020 Input a plurality of image sequences corresponding to the plurality of video frames to the target detection model to obtain a classification result of each image block.
  • Step 3030 Determine the watermark classification result of each image block according to the classification result of each image block of each video frame.
  • the watermark classification result includes the probability of containing a watermark.
  • Step 3040 Sort the watermark classification results of the multiple image blocks according to the probability that the multiple image blocks of each video frame contain a watermark.
  • Step 3050 Select part of the watermark classification results from the multiple sorting results corresponding to each video frame and sort them.
  • Step 3060 Determine the feature vector of each video frame from the sorted partial watermark classification results corresponding to each video frame.
  • Step 3070 Obtain a video feature vector according to multiple feature vectors corresponding to the multiple video frames, respectively.
  • Step 3080 Input the video feature vector to the watermark recognition model to obtain the watermark recognition probability output by the watermark recognition model.
  • Step 3090 Determine whether the watermark recognition probability is greater than or equal to the probability threshold; if the watermark recognition probability is greater than or equal to the probability threshold, perform step 3100; if the watermark recognition probability is less than the probability threshold, perform step 3110.
  • Step 3100 Determine that the video contains a watermark.
  • Step 3110 Determine that the video does not contain a watermark.
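  • A sketch of the two-stage selection in steps 3040 through 3060 (top V results per block, then top U across the frame); V=2 and U=3 match the FIG. 7 walkthrough below, and the input layout is an assumption:

    def frame_feature_two_stage(block_probs, top_v=2, top_u=3):
        """block_probs: per-block lists of watermark probabilities.
        Stage 1 (steps 3040-3050): keep the top_v results of every block.
        Stage 2 (steps 3050-3060): re-sort the kept results across blocks
        and keep the top_u as this frame's feature vector, padded with -1."""
        kept = []
        for probs in block_probs:
            kept.extend(sorted(probs, reverse=True)[:top_v])
        kept.sort(reverse=True)
        feature = kept[:top_u]
        return feature + [-1.0] * (top_u - len(feature))

    # Four blocks; the second block yields no watermark results.
    print(frame_feature_two_stage([[0.9, 0.8, 0.7], [], [0.3, 0.4, 0.2], [0.6, 0.5]]))
    # -> [0.9, 0.8, 0.6]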
  • FIG. 7 shows an application schematic diagram of another video watermark recognition method.
  • the set probability threshold is 0.8.
  • the video in FIG. 7 includes two video frames. Each video frame is divided into 4 image blocks to obtain the image sequence of each video frame, and the two image sequences are input into the target detection model to obtain the classification results of each image block.
  • each image block has three classification results; in a classification result, "1" represents the watermark class, "0" represents the background class, and the classification probability is a value between 0 and 1. According to the probabilities of each image block containing a watermark, the watermark classification results of each of the four image blocks are sorted in descending order, and the first two watermark classification results are selected from each of the four sorting results as the target watermark classification results. The eight target watermark classification results are then sorted in descending order, and the first three sorting results are selected from them.
  • the feature vector of each video frame is determined from these first three sorting results, and the video feature vector is obtained from the feature vectors of the two video frames.
  • the video feature vector is then input into the watermark recognition model, which outputs a watermark recognition probability of 0.8. Since the watermark recognition probability is equal to the probability threshold, it is determined that the video contains a watermark.
  • in the technical solution of this embodiment, each video frame of the multiple video frames of the video is divided into multiple image blocks to obtain an image sequence corresponding to each video frame; the multiple image sequences corresponding to the multiple video frames are input into the target detection model to obtain the classification result of each image block; the video feature vector is obtained according to the classification results of all image blocks; and the video feature vector is input into the watermark recognition model to obtain the watermark recognition probability output by the watermark recognition model. When the watermark recognition probability is greater than or equal to the probability threshold, it is determined that the video contains a watermark. Dividing each video frame into blocks increases the proportion of the video frame occupied by the watermark and thus reduces the difficulty of recognizing the video watermark; and because the watermark recognition results of multiple video frames are combined, the method can still accurately recognize whether the video contains a watermark even when the watermark position is not fixed, thereby improving the recognition accuracy of the video watermark.
  • FIG. 8 is a schematic structural diagram of a video watermark recognition device provided by an embodiment of the present application. This embodiment can be applied to improve the recognition accuracy of a video watermark.
  • the device can be implemented in software and/or hardware and can be deployed in equipment such as a computer or a mobile terminal. As shown in FIG. 8, the device includes:
  • the image sequence acquisition module 410 is configured to divide each video frame of the multiple video frames of the video into multiple image blocks to obtain an image sequence corresponding to each video frame.
  • the video feature vector acquisition module 420 is configured to input multiple image sequences corresponding to the multiple video frames to the target detection model to obtain a classification result of each image block, and obtain a video feature vector according to the classification results of all image blocks.
  • the watermark recognition result determination module 430 is configured to input the video feature vector into the watermark recognition model to obtain the watermark recognition probability output by the watermark recognition model, and to determine that the video contains a watermark when the watermark recognition probability is greater than or equal to the probability threshold. A structural sketch follows.
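  • Mirroring modules 410 through 430, a structural sketch (the class and method names are illustrative; split_into_blocks and frame_feature are the hypothetical helpers sketched earlier in this document):

    class VideoWatermarkRecognitionDevice:
        """Groups the three modules of FIG. 8 into one object."""

        def __init__(self, detection_model, recognition_model, threshold=0.8):
            self.detection_model = detection_model      # target detection model
            self.recognition_model = recognition_model  # watermark recognition model
            self.threshold = threshold

        def acquire_image_sequences(self, frames, num_blocks=4):   # module 410
            return [split_into_blocks(f, num_blocks) for f in frames]

        def acquire_video_feature(self, sequences):                # module 420
            feature = []
            for seq in sequences:
                results = self.detection_model(seq)
                feature.extend(frame_feature(results))
            return feature

        def determine_result(self, frames):                        # module 430
            feature = self.acquire_video_feature(
                self.acquire_image_sequences(frames))
            return self.recognition_model(feature) >= self.threshold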
  • FIG. 9 is a schematic structural diagram of a device provided by an embodiment of the present application.
  • FIG. 9 shows a block diagram of an exemplary device 512 suitable for implementing embodiments of the present application.
  • the device 512 shown in FIG. 9 is only an example, and should not bring any limitation to the functions and usage scope of the embodiments of the present application.
  • the device 512 is represented in the form of a general-purpose computing device.
  • the components of the device 512 may include, but are not limited to, one or more processors 516, a system memory 528, and a bus 518 connecting different system components (including the system memory 528 and the processor 516).
  • the processor 516 runs programs stored in the system memory 528 to execute various functional applications and data processing, for example, to implement the video watermark recognition method provided by the embodiments of the present application, which includes: dividing each video frame of the multiple video frames of a video into multiple image blocks to obtain an image sequence corresponding to each video frame; inputting the multiple image sequences corresponding to the multiple video frames into the target detection model to obtain the classification result of each image block, and obtaining a video feature vector according to the classification results of all image blocks; and inputting the video feature vector into the watermark recognition model to obtain the watermark recognition probability output by the watermark recognition model. When the watermark recognition probability is greater than or equal to a probability threshold, it is determined that the video contains a watermark.
  • An embodiment of the present application also provides a computer-readable storage medium that stores a computer program.
  • When the program is executed by a processor, the video watermark recognition method provided by the embodiments of the present application is implemented. The method includes: dividing each video frame of the multiple video frames of a video into multiple image blocks to obtain an image sequence corresponding to each video frame; inputting the multiple image sequences corresponding to the multiple video frames into the target detection model to obtain the classification result of each image block, and obtaining a video feature vector according to the classification results of all image blocks; and inputting the video feature vector into the watermark recognition model to obtain the watermark recognition probability output by the watermark recognition model. When the watermark recognition probability is greater than or equal to a probability threshold, it is determined that the video contains a watermark.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Signal Processing (AREA)
  • Evolutionary Biology (AREA)
  • Computer Security & Cryptography (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computing Systems (AREA)
  • Image Analysis (AREA)
  • Editing Of Facsimile Originals (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A video watermark recognition method, apparatus, device and storage medium. The method includes: dividing each video frame of a plurality of video frames of a video into a plurality of image blocks to obtain an image sequence corresponding to each video frame; inputting the plurality of image sequences corresponding to the plurality of video frames into a target detection model to obtain a classification result of each image block, and obtaining a video feature vector according to the classification results of all image blocks; and inputting the video feature vector into a watermark recognition model to obtain a watermark recognition probability output by the watermark recognition model, and determining that the video contains a watermark in a case where the watermark recognition probability is greater than or equal to a probability threshold.


Claims (13)

  1. A video watermark recognition method, comprising:
    dividing each video frame of a plurality of video frames of a video into a plurality of image blocks to obtain an image sequence corresponding to each video frame;
    inputting a plurality of image sequences corresponding to the plurality of video frames into a target detection model to obtain a classification result of each image block, and obtaining a video feature vector according to classification results of all image blocks; and inputting the video feature vector into a watermark recognition model to obtain a watermark recognition probability output by the watermark recognition model, and determining that the video contains a watermark in a case where the watermark recognition probability is greater than or equal to a probability threshold.
  2. The method according to claim 1, wherein each image block has a plurality of classification results;
    obtaining the video feature vector according to the classification results of all image blocks comprises:
    determining a watermark classification result of each image block according to the plurality of classification results of each image block of each video frame;
    obtaining a feature vector corresponding to each video frame according to watermark classification results of all image blocks of each video frame; and
    obtaining the video feature vector according to a plurality of feature vectors corresponding to the plurality of video frames.
  3. The method according to claim 2, wherein the watermark classification result comprises a probability of containing a watermark;
    obtaining the feature vector corresponding to each video frame according to the watermark classification results of all image blocks of each video frame comprises:
    sorting the watermark classification results of the plurality of image blocks of each video frame according to probabilities of the plurality of image blocks of each video frame containing a watermark; and
    determining the feature vector of each video frame from a sorting result corresponding to each video frame.
  4. The method according to claim 2, wherein the watermark classification result comprises a probability of containing a watermark;
    obtaining the feature vector corresponding to each video frame according to the watermark classification results of all image blocks of each video frame comprises: in a case where one image block of each video frame has a plurality of watermark classification results, sorting the plurality of watermark classification results of the one image block according to the plurality of probabilities of containing a watermark of the one image block; and selecting, according to sorting results, part of the watermark classification results from the watermark classification results of the plurality of image blocks of each video frame for sorting; and
    determining the feature vector of each video frame from the sorted partial watermark classification results corresponding to each video frame.
  5. The method according to any one of claims 1-4, wherein the target detection model is trained in the following manner:
    acquiring a first training sample, wherein the first training sample comprises a plurality of training pictures, classification categories of the plurality of training pictures, and position information of the plurality of training pictures;
    dividing each training picture of the plurality of training pictures into a plurality of first training image blocks, obtaining a classification category of each first training image block of each training picture according to the classification category of each training picture, and obtaining position information of each first training image block of each training picture according to the position information of each training picture; and
    training a classifier model by taking all first training image blocks as input variables and the classification categories and position information of all first training image blocks as output variables, to obtain the target detection model.
  6. The method according to claim 5, wherein training the classifier model by taking all first training image blocks as input variables and the classification categories and position information of all first training image blocks as output variables to obtain the target detection model comprises:
    acquiring size information of each first training image block;
    performing cluster analysis on the size information of all first training image blocks to determine a prior box of each first training image block; and
    training the classifier model by taking all first training image blocks and the prior boxes of all first training image blocks as input variables and the classification categories and position information of all first training image blocks as output variables, to obtain the target detection model.
  7. The method according to any one of claims 1-6, wherein the watermark recognition model is trained in the following manner:
    acquiring a second training sample, wherein the second training sample comprises a training video and a classification category of the training video;
    dividing each training video frame of a plurality of training video frames of the training video into a plurality of second training image blocks to obtain an image sequence corresponding to each training video frame;
    inputting a plurality of image sequences corresponding to the plurality of training video frames into the target detection model to obtain a classification result of each second training image block, and obtaining a training video feature vector according to classification results of all second training image blocks; and
    training an eXtreme Gradient Boosting (Xgboost) model by taking the training video feature vector as an input variable and the classification category of the training video as an output variable, to obtain the watermark recognition model.
  8. The method according to claim 7, wherein each second training image block has a plurality of classification results;
    obtaining the training video feature vector according to the classification results of all second training image blocks comprises:
    determining a watermark classification result of each second training image block according to the plurality of classification results of each second training image block of each training video frame;
    obtaining a feature vector corresponding to each training video frame according to watermark classification results of all second training image blocks of each training video frame; and
    obtaining the training video feature vector according to a plurality of feature vectors corresponding to the plurality of training video frames.
  9. The method according to claim 8, wherein the watermark classification result comprises a probability of containing a watermark;
    obtaining the feature vector corresponding to each training video frame according to the watermark classification results of all second training image blocks of each training video frame comprises:
    sorting the watermark classification results of the plurality of second training image blocks of each training video frame according to probabilities of the plurality of second training image blocks of each training video frame containing a watermark; and
    determining the feature vector of each training video frame from a sorting result corresponding to each training video frame.
  10. The method according to claim 8, wherein the watermark classification result comprises a probability of containing a watermark;
    obtaining the feature vector corresponding to each training video frame according to the watermark classification results of all second training image blocks of each training video frame comprises:
    in a case where one second training image block of each training video frame has a plurality of watermark classification results, sorting the plurality of watermark classification results of the one second training image block according to the plurality of probabilities of containing a watermark of the one second training image block; and selecting, according to sorting results, part of the watermark classification results from the watermark classification results of the plurality of second training image blocks of each training video frame for sorting; and
    determining the feature vector of each training video frame from the sorted partial watermark classification results corresponding to each training video frame.
  11. A video watermark recognition apparatus, comprising:
    an image sequence acquisition module, configured to divide each video frame of a plurality of video frames of a video into a plurality of image blocks to obtain an image sequence corresponding to each video frame;
    a video feature vector acquisition module, configured to input a plurality of image sequences corresponding to the plurality of video frames into a target detection model to obtain a classification result of each image block, and obtain a video feature vector according to classification results of all image blocks; and
    a watermark recognition result determination module, configured to input the video feature vector into a watermark recognition model to obtain a watermark recognition probability output by the watermark recognition model, and determine that the video contains a watermark in a case where the watermark recognition probability is greater than or equal to a probability threshold.
  12. A device, comprising:
    at least one processor; and
    a memory configured to store at least one program;
    wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method according to any one of claims 1-10.
  13. A computer-readable storage medium storing a computer program, wherein the program, when executed by a processor, implements the method according to any one of claims 1-10.
PCT/CN2019/122609 2018-12-03 2019-12-03 Video watermark recognition method, apparatus, device and storage medium WO2020114378A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/299,726 US11631248B2 (en) 2018-12-03 2019-12-03 Video watermark identification method and apparatus, device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811465129.4A 2018-12-03 Video watermark recognition method, apparatus, device and storage medium
CN201811465129.4 2018-12-03

Publications (1)

Publication Number Publication Date
WO2020114378A1 true WO2020114378A1 (zh) 2020-06-11

Family

ID=65959363

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/122609 WO2020114378A1 (zh) 2018-12-03 2019-12-03 视频水印的识别方法、装置、设备及存储介质

Country Status (3)

Country Link
US (1) US11631248B2 (zh)
CN (1) CN109598231B (zh)
WO (1) WO2020114378A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114511757A (zh) * 2022-01-27 2022-05-17 北京百度网讯科技有限公司 Method and apparatus for training an image detection model

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109598231B (zh) 2018-12-03 2021-03-02 广州市百果园信息技术有限公司 Video watermark recognition method, apparatus, device and storage medium
CN111815499A (zh) * 2019-04-11 2020-10-23 珠海金山办公软件有限公司 Watermark removal method and apparatus
CN111914831B (zh) * 2019-05-10 2023-06-02 杭州海康威视数字技术股份有限公司 Target detection method, apparatus and storage medium
CN112017092A (zh) * 2019-05-30 2020-12-01 阿里巴巴集团控股有限公司 Watermark detection model generation and watermark detection method, apparatus and device
CN110349070B (zh) * 2019-06-12 2022-12-16 杭州小影创新科技股份有限公司 Short video watermark detection method
CN110287350A (zh) * 2019-06-29 2019-09-27 北京字节跳动网络技术有限公司 Image retrieval method, apparatus and electronic device
US11983853B1 * 2019-10-31 2024-05-14 Meta Plattforms, Inc. Techniques for generating training data for machine learning enabled image enhancement
CN110991488B (zh) * 2019-11-08 2023-10-20 广州坚和网络科技有限公司 Picture watermark recognition method using a deep learning model
CN110798750B (zh) * 2019-11-29 2021-06-29 广州市百果园信息技术有限公司 Video watermark removal method, video data publishing method and related apparatus
CN111339944A (zh) * 2020-02-26 2020-06-26 广东三维家信息科技有限公司 Decoration style recognition method, apparatus and electronic device
CN111340677B (zh) * 2020-02-27 2023-10-27 北京百度网讯科技有限公司 Video watermark detection method and apparatus, electronic device, and computer-readable medium
CN111861849B (zh) * 2020-07-15 2023-04-07 上海交通大学 Method for embedding watermark information into an artificial intelligence model
CN112927122A (zh) * 2021-04-14 2021-06-08 北京小米移动软件有限公司 Watermark removal method, apparatus and storage medium
US11575978B2 * 2021-06-01 2023-02-07 Western Digital Technologies, Inc. Data storage device and method for reliable watermarking

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104504642A (zh) * 2014-12-17 2015-04-08 北京齐尔布莱特科技有限公司 Method, apparatus and computing device for adding a watermark to a picture
CN106096668A (zh) * 2016-08-18 2016-11-09 携程计算机技术(上海)有限公司 Recognition method and recognition system for watermarked images
CN106331746A (zh) * 2016-09-19 2017-01-11 北京小度互娱科技有限公司 Method and apparatus for identifying the position of a watermark in a video file
CN107808358A (zh) * 2017-11-13 2018-03-16 携程计算机技术(上海)有限公司 Automatic image watermark detection method
CN109598231A (zh) * 2018-12-03 2019-04-09 广州市百果园信息技术有限公司 Video watermark recognition method, apparatus, device and storage medium

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020009208A1 (en) * 1995-08-09 2002-01-24 Adnan Alattar Authentication of physical and electronic media objects using digital watermarks
US7720249B2 (en) * 1993-11-18 2010-05-18 Digimarc Corporation Watermark embedder and reader
US6590996B1 (en) * 2000-02-14 2003-07-08 Digimarc Corporation Color adaptive watermarking
CN102402542A (zh) * 2010-09-14 2012-04-04 腾讯科技(深圳)有限公司 Video tagging method and system
US11288472B2 (en) * 2011-08-30 2022-03-29 Digimarc Corporation Cart-based shopping arrangements employing probabilistic item identification
US9892301B1 (en) * 2015-03-05 2018-02-13 Digimarc Corporation Localization of machine-readable indicia in digital capture systems
US10887362B2 (en) * 2017-04-10 2021-01-05 Box, Inc. Forensic watermarking of shared video content
CN107995500B (zh) * 2017-10-27 2019-01-01 北京达佳互联信息技术有限公司 Video watermark recognition method, apparatus and terminal
CN108650491B (zh) * 2018-05-15 2020-07-07 西安电子科技大学 Video watermark detection method for surveillance systems
CN108833974B (zh) * 2018-06-29 2021-05-18 北京奇虎科技有限公司 Method, apparatus and electronic device for removing semi-transparent watermarks from a video


Also Published As

Publication number Publication date
CN109598231A (zh) 2019-04-09
CN109598231B (zh) 2021-03-02
US11631248B2 (en) 2023-04-18
US20220019805A1 (en) 2022-01-20


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19891862

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19891862

Country of ref document: EP

Kind code of ref document: A1