CN112419132B

CN112419132B - Video watermark detection method, device, electronic equipment and storage medium

Info

Publication number: CN112419132B
Application number: CN202011225956.3A
Authority: CN
Inventors: 陈广; 王雷; 张波; 苏正航
Original assignee: Guangzhou Overseas Shoulder Sub Network Technology Co ltd
Current assignee: Guangzhou Overseas Shoulder Sub Network Technology Co ltd
Priority date: 2020-11-05
Filing date: 2020-11-05
Publication date: 2024-06-18
Anticipated expiration: 2040-11-05
Also published as: CN112419132A

Abstract

The application discloses a video watermark detection method, a device, electronic equipment and a storage medium, wherein the method comprises the following steps: detecting the watermark of the video to be processed by utilizing a pre-trained target detection network to obtain a watermark detection result of each frame of video image in the video; obtaining a target video image of which the watermark is not detected in the video to be processed according to the watermark detection result; judging whether the adjacent video images detect the watermark according to the watermark detection result; when the adjacent video image detects the watermark, obtaining a watermark detection result of the adjacent video image as a watermark detection result of the target video image; and regenerating the watermark detection result of each frame of video image in the video to be processed according to the watermark detection result of the target video image. After the target video image with no watermark detected in the video is obtained, the watermark detection result of the adjacent video image can be used as the watermark detection result of the target video image, so that the condition that the watermark is missed in the video is avoided.

Description

Video watermark detection method, device, electronic equipment and storage medium

Technical Field

The present application relates to the field of video processing technologies, and in particular, to a method and apparatus for detecting a video watermark, an electronic device, and a storage medium.

Background

In the age of rapid development of multimedia technology, a large number of video files are produced. In some video files, producers often watermark the video for advertising or for protecting the copyright of the video, for infringement tracking, etc. However, these watermarked videos tend to degrade the viewing experience of the viewer and it is also undesirable for the video distributor to distribute the video while simultaneously distributing the watermarks of other people as part of the video content. Therefore, in some cases, watermark detection is required for video to remove the watermark from the video when the watermark is detected. However, the existing watermark detection method has low accuracy, which easily causes that the watermark in the video is not completely removed, so that the user has poor watching experience on the video.

Disclosure of Invention

The embodiment of the application provides a video watermark detection method, a video watermark detection device, electronic equipment and a storage medium, which can improve the accuracy of video watermark detection.

In a first aspect, an embodiment of the present application provides a method for detecting a video watermark, where the method includes: detecting the watermark in the video to be processed by utilizing a pre-trained target detection network, and obtaining a watermark detection result of each frame of video image in the video to be processed; obtaining a target video image in which the watermark is not detected in the video to be processed according to the watermark detection result; judging whether a watermark is detected in an adjacent video image according to the watermark detection result, wherein the adjacent video image is a video frame image adjacent to the target video image in the video to be processed; when the adjacent video image detects a watermark, acquiring a watermark detection result of the adjacent video image as a watermark detection result of the target video image; and regenerating the watermark detection result of each frame of video image in the video to be processed according to the watermark detection result of the target video image.

In a second aspect, an embodiment of the present application provides a video watermark detection apparatus, including: the target detection module is used for detecting the watermark of the video to be processed by utilizing a pre-trained target detection network and obtaining a watermark detection result of each frame of video image in the video to be processed; the target acquisition module is used for acquiring a target video image in which the watermark is not detected in the video to be processed according to the watermark detection result; the adjacent judging module is used for judging whether the adjacent video image detects the watermark according to the watermark detection result, wherein the adjacent video image is a video frame image adjacent to the target video image in the video to be processed; the result copying module is used for acquiring a watermark detection result of the adjacent video image when the adjacent video image detects the watermark, and taking the watermark detection result as the watermark detection result of the target video image; and the result generation module is used for regenerating the watermark detection result of each frame of video image in the video to be processed according to the watermark detection result of the target video image.

In a third aspect, an embodiment of the present application provides an electronic device, including: a memory; one or more processors coupled with the memory; one or more applications, wherein the one or more applications are stored in memory and configured to be executed by the one or more processors, the one or more applications configured to perform the video watermark detection method provided in the first aspect above.

In a fourth aspect, an embodiment of the present application provides a computer readable storage medium, where a program code is stored, where the program code is capable of being invoked by a processor to perform the video watermark detection method provided in the first aspect.

The embodiment of the application provides a video watermark detection method, a device, electronic equipment and a storage medium, which are used for detecting a watermark in a video to be processed by utilizing a pre-trained target detection network to obtain a watermark detection result of each frame of video image in the video to be processed; obtaining a target video image in which the watermark is not detected in the video to be processed according to the watermark detection result; judging whether a watermark is detected in an adjacent video image according to the watermark detection result, wherein the adjacent video image is a video frame image adjacent to the target video image in the video to be processed; when the adjacent video image detects a watermark, acquiring a watermark detection result of the adjacent video image as a watermark detection result of the target video image; and regenerating the watermark detection result of each frame of video image in the video to be processed according to the watermark detection result of the target video image. Therefore, the watermark in the video is detected through the pre-trained target detection network, the watermark position in the video can be automatically positioned without manually searching a watermark area, when the watermark detection result of the video obtained through the target detection network shows that the target video image with no watermark is detected, the watermark detection result of the adjacent frame video image of the target video image can be used as the watermark detection result of the target video image by utilizing the continuity of the video image frame and the small variability of the watermark position in the adjacent frame video image, so that the problem of watermark omission in a certain frame or a plurality of frames of video images in the video can be avoided, and the accuracy of video watermark detection is improved.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a schematic flow chart of a video watermark detection method according to an embodiment of the present application.

Fig. 2 is a schematic flow chart of a video watermark detection method according to another embodiment of the application.

Fig. 3 is a schematic flow chart of another video watermark detection method according to another embodiment of the application.

Fig. 4 shows a watermark sample schematic diagram of the video watermark detection method provided by the application.

Fig. 5 is a flowchart of step S220 in the video watermark detection method according to the embodiment of the application.

Fig. 6 is a schematic flow chart of a video watermark detection method according to another embodiment of the application.

Fig. 7 is a schematic flow chart of a video watermark detection method according to still another embodiment of the present application.

Fig. 8 is a schematic overall flow chart of a video watermark detection method according to an embodiment of the application.

Fig. 9 shows a schematic diagram of watermark detection effect in a video watermark detection method according to an embodiment of the application.

Fig. 10 shows schematic diagrams of effects before and after watermark repair in a video watermark detection method according to an embodiment of the application.

Fig. 11 shows a block diagram of a video watermark detection device according to an embodiment of the present application.

Fig. 12 shows a block diagram of an electronic device according to an embodiment of the present application.

Fig. 13 shows a memory unit for storing or carrying program codes for implementing a video watermark detection method according to an embodiment of the application.

Detailed Description

In order to enable those skilled in the art to better understand the present application, the following description will make clear and complete descriptions of the technical solutions according to the embodiments of the present application with reference to the accompanying drawings.

Current watermark detection methods, typically fixed-position watermark detection, typically require manual pre-assignment of watermark regions. For example, for a watermark fixed in the lower right corner in a video, the watermark region may be manually pre-designated as the lower right corner region of the video, so that watermark detection and watermark removal may be directly performed for the lower right corner region. However, for the video with the complex problems of watermark position change, watermark form change and the like, the watermark detection method cannot effectively detect the watermark, and miss detection is easy to occur, so that the accuracy of watermark detection is not high.

In order to solve the above-mentioned drawbacks, embodiments of the present application provide a method, apparatus, system, electronic device, and storage medium for detecting video watermarks. The watermark detection method can realize automatic positioning of the watermark in the video through a pre-trained target detection network, improve the accuracy of watermark detection, and can copy watermark detection results of adjacent images to target images without watermark detection through inter-frame smoothing processing, so that the problem that the watermark detection of one or more frames of video is missed due to the influence of the detection recall rate of the target detection network is solved, and the accuracy of video watermark detection is further improved. The following will explain in detail specific examples.

Referring to fig. 1, fig. 1 shows a flowchart of a video watermark detection method provided by an embodiment of the present application, which may be applied to an electronic device, where the video watermark detection method may include:

Step S110: and detecting the watermark in the video to be processed by using a pre-trained target detection network, and obtaining a watermark detection result of each frame of video image in the video to be processed.

The video to be processed may be a video to be subjected to watermark detection processing, and the source and format of the video are not limited, but are not listed here. Such as may be available from local or internet downloads. In some embodiments, the video to be processed may be video without added watermark, or video with one or more watermarks added. When the watermark is added to the video, the watermark can be added to each frame of video image, or the watermark can be added to the image in a certain period of time of the video, and the added watermark can be a watermark with fixed quantity, shape, size, position and posture (such as rotation angle and the like), or a dynamic watermark with unfixed quantity, shape, size, position and posture (such as a 'jittering' watermark of a short video platform). The specific video type and the duration, type and number of watermarks added to the video are not limited in the embodiments of the present application.

In the embodiment of the application, the video to be processed can be input into a pre-trained target detection network so as to carry out watermark detection on each frame of video image in the video through the target detection network, and then after the target detection network outputs the watermark detection result of each frame of video image, the electronic equipment can acquire the watermark detection result of each frame of video image in the video to be processed.

The target detection network may detect the specified target by applying a deep learning algorithm, i.e. a target detection algorithm, which in the embodiment of the present application is a watermark. The target detection algorithm may be, but not limited to, a target detection algorithm based on an image segmentation technique, a target detection algorithm based on image feature matching, a frequency domain-based method, and the like.

In some embodiments, the target detection network may be a two-stage detection network, such as R-CNN (Region-based Convolutional Neural Networks, region-based convolutional neural network), fast R-CNN (Fast Region-based Convolutional Neural Networks, fast Region-based convolutional neural network), fast R-CNN (Fast Region-based Convolutional Neural Networks, faster Region-based convolutional neural network), and the like, where the two-stage detection network generates a large number of candidate regions using RPN (Region Proposal Network, region selection network), SELECTIVE SEARCH (selective search), and the like, and generally has relatively high detection accuracy. In other embodiments, the target detection network may also be a one-stage detection network, such as RETINANET network, SSD (Single Shot MultiBox Detector, single-shot multi-box detector), YOLO (You Only Look Once), etc., where the one-stage detection network may directly generate the target area on the basis of the multi-scale anchor, which often has a faster detection speed, etc., and the embodiment is not limited thereto. For example, the object detection network of the present application may be RETINANET networks with relatively fast detection speeds.

In some embodiments, the pre-trained target detection network may be obtained by training the neural network model in advance according to a plurality of first training samples. The first training sample may include a first image sample and a watermark marking sample corresponding to the first image sample, where the watermark marking sample may include specific location information of a watermark in the first image sample. Therefore, the watermark of each frame of video image in the video to be processed can be identified and detected according to the pre-trained target detection network, namely the pre-trained target detection network can be used for outputting a watermark detection result corresponding to the video image according to the input video image. The watermark detection result may include position information of the watermark in the video image, which may be presented in a form of a detection frame or a form of a coordinate value, which is not limited herein. In some embodiments, the first image sample in the first training sample may be an image of the watermark at a different location to ensure the watermark location recognition effect of the target detection network.

Step S120: and acquiring a target video image in which the watermark is not detected in the video to be processed according to the watermark detection result.

Since the target detection network has a recall (i.e., the number of images detected with watermarks is a percentage of the total number of images actually with watermarks, and the ideal recall is 100%, indicating no omission), when the target detection network is utilized to detect the watermark of each frame of video image in the video to be processed, there may be one or more frames of video images for which the watermark is omitted. For example, if there are 500 frames of video images in total for a 20s video, when each frame of video image has watermarks to be removed and the recall rate of the target detection network reaches 99%, there is a high probability that the average video segment will have watermarks in 5 frames that are missed. For the played video, even if only one frame of video image in the video has watermark, the user is provided with bad look and feel. Therefore, in the embodiment of the application, in order to prevent the watermark of a certain frame or a plurality of frames from being missed, an inter-frame smoothing process flow can be added after the watermark detection result of the video output by the target detection network is obtained, so that each frame of video image can be ensured to detect the watermark, the recall rate of the target detection network is improved, and the watermark in the video is prevented from being missed. The inter-frame smoothing process may be understood as determining a watermark detection result of the target image by using watermark detection results of images adjacent to the target image.

Specifically, in the embodiment of the present application, after obtaining the watermark detection result of each frame of video image in the video to be processed output by the target detection network, a video image in the video to be processed, in which the watermark is not detected, may be obtained as the target video image in the above-mentioned inter-frame smoothing process flow. That is, in the embodiment of the present application, not any video image may enter the above-mentioned inter-frame smoothing process flow, but only a video image in which a watermark is not detected may enter the above-mentioned inter-frame smoothing process flow, so that the execution steps of the processor are reduced, and the watermark detection efficiency is improved.

It will be appreciated that when each frame of video image in the video to be processed detects a watermark, the object detection network may be considered as not missing, and the inter-frame smoothing process of the present application may not be performed. When there is a target video image in the video to be processed, the target video image is likely to be an image missed by the target detection network, the inter-frame smoothing process flow of the application can be executed.

In some embodiments, when the watermark is not detected by the target detection network, a preset detection result may be output, so that whether the frame video image is a target video image in which the watermark is not detected may be determined by respectively determining whether the watermark detection result of each frame video image is the preset detection result. The preset detection result may be a watermark position information set with null elements, or may be preset characters such as "none", "watermark is not detected", which is not limited herein.

Step S130: and judging whether the adjacent video image detects the watermark according to the watermark detection result, wherein the adjacent video image is a video frame image adjacent to the target video image in the video to be processed.

In some scenarios, since the watermark is present for a limited time, it may not be possible for the video to have the watermark from beginning to end, and thus, when there is a target video image in the video to be processed where the watermark is not detected, the target video image may indeed be free of the watermark, not because of a missed detection of the target detection network. Therefore, in the embodiment of the application, after the target video image in which the watermark is not detected in the video to be processed is obtained, whether the watermark is detected in the adjacent video image is judged, and the adjacent video image is a video frame image adjacent to the target video image in the video to be processed, so that whether the target video image is an image missed by the target detection network can be determined according to the watermark detection result of the adjacent video image. The adjacent video image may be one frame of video image adjacent to the target video image, or may be multiple frames of video image adjacent to the target video image, which is not limited herein.

It will be appreciated that if the watermark is not detected in the adjacent video image either, it is likely that the watermark does not exist in the target video image, and that it is likely that the target detection network is not missing; if the adjacent video image detects the watermark, the watermark is likely to exist in the target video image, and the watermark is likely to be missed by the target detection network. Therefore, whether the target video image is an image which is missed by the target detection network can be determined by judging whether the adjacent video image detects the watermark, so that the target video image which needs to enter the inter-frame smoothing process can be further accurately identified, unnecessary operations of other video images are reduced, and the watermark detection efficiency is improved.

In some embodiments, since each frame of video image in the video to be processed is sequenced according to time sequence, an adjacent video image corresponding to a time node adjacent to the target time node can be determined according to the target time node of the target video image in the video to be processed, so that a watermark detection result of the adjacent video image can be obtained to determine whether the watermark is detected. In other embodiments, when the target detection network detects the watermark in the video to be processed, it is likely that each frame of video image in the video to be processed is input into the target detection network according to the time sequence to perform watermark detection, so that the watermark detection results adjacent to the watermark detection results of the target detection network and the target video image before and after each other can also be obtained directly according to the detection sequence of the target video image in the target detection network, and used as the watermark detection results of the adjacent video images to perform the confirmation of whether the watermark is detected. The description of whether the watermark is detected or not according to the watermark detection result may refer to the foregoing description of the watermark detection result, and will not be repeated here.

Step S140: and when the adjacent video image detects the watermark, acquiring a watermark detection result of the adjacent video image as the watermark detection result of the target video image.

In the embodiment of the application, when the adjacent video image adjacent to the target video image detects the watermark and the target video image does not detect the watermark, the watermark is considered to be likely to exist in the target video image, and the target video image is likely to be missed by the target detection network, so that the target video image can enter an inter-frame smoothing processing flow. Specifically, the continuity of video image frames and the small variability of watermark positions in video images of adjacent frames can be utilized to directly obtain the watermark detection result of the adjacent video images adjacent to the target video image as the watermark detection result of the target video image, so that the watermark detection result with higher accuracy can be provided for the target video image which is missed by the target detection network, the target detection network does not need to be input again for re-detection, and the watermark detection efficiency is improved.

In some embodiments, when a watermark is not detected in a target video image, and a watermark is not detected in an adjacent video image adjacent to the target video image, it may be considered that the watermark does not exist in the target video image, and the target video image is likely not missed by the target detection network. Therefore, it is not necessary to enter the inter-frame smoothing processing flow, that is, to execute the above-described step of acquiring the watermark detection result of the adjacent video image adjacent to the target video image as the watermark detection result of the target video image.

Step S150: and regenerating the watermark detection result of each frame of video image in the video to be processed according to the watermark detection result of the target video image.

In the embodiment of the application, after the watermark detection result of the adjacent video image is used as the watermark detection result of the target video image, the watermark detection result of each frame of video image in the video to be processed can be regenerated according to the watermark detection result of the target video image, so that the watermark detection result of the video to be processed is updated, and the subsequent processing is convenient. For example, according to the updated watermark detection result of the video to be processed, removing the watermark in the video to be processed, and the like.

In some embodiments, watermark detection results of each frame of video image in the video to be processed may be stored in a unified manner according to a certain rule set, such as storing in a matrix or a collection form, so that watermark detection results of the adjacent video image may be directly copied, and the original watermark detection results of the target video image may be replaced by the copied watermark detection results, so that watermark detection results of each frame of video image in the new video to be processed may be regenerated. The particular watermark detection result updating manner of the video to be processed is not determined here.

According to the video watermark detection method provided by the embodiment of the application, a pre-trained target detection network is utilized to detect the watermark in the video to be processed, and the watermark detection result of each frame of video image in the video to be processed is obtained; obtaining a target video image in which the watermark is not detected in the video to be processed according to the watermark detection result; judging whether a watermark is detected in an adjacent video image according to the watermark detection result, wherein the adjacent video image is a video frame image adjacent to the target video image in the video to be processed; when the adjacent video image detects a watermark, acquiring a watermark detection result of the adjacent video image as a watermark detection result of the target video image; and regenerating the watermark detection result of each frame of video image in the video to be processed according to the watermark detection result of the target video image. The application uses the continuity of video image frames and the smaller variability of watermark positions in adjacent frame video images, can avoid the problem of watermark omission in a certain frame or multiple frames of video images in the video to be processed by taking the watermark detection result of the adjacent frame video images of the target video image as the watermark detection result of the target video image when the target video image does not detect the watermark, and improves the accuracy of video watermark detection.

Referring to fig. 2, fig. 2 is a flow chart illustrating a video watermark detection method according to another embodiment of the application, which can be applied to an electronic device, and the video watermark detection method can include:

step S210: and carrying out framing treatment on the video to be treated to obtain a video image sequence.

Because the video is formed by splicing video images of one frame and another frame according to time sequence, in some embodiments, after the video to be processed needing watermark detection is obtained, the video to be processed can be subjected to frame division processing through various existing video frame cutting decomposition software so as to obtain a complete video image sequence of the video to be processed. The video image sequence is understood to mean that, after the video has been decomposed into a plurality of video images, a set of consecutive video image frames { V _t |t=1..n } is generated, which has a chronological order.

For example, a1 minute long 25FPS video to be processed is decomposed into 1500 video image frames (1 minute x 60 seconds/minute x frame per second transmission frame number). Among them, FPS (FRAMES PER seconds, transmission frame number per Second) can be understood as a frame number per Second of transmission of pictures. It should be noted that, when the video to be processed is subjected to framing processing, the number of frames of the video per second (i.e., the sampling rate, such as the number of frames transmitted per second for the above-mentioned picture) cannot be smaller than the number of frames transmitted per second, so as to ensure that all watermarks in the video to be processed are detected as much as possible.

Step S220: and detecting the watermark in the video image sequence by using a pre-trained target detection network, and obtaining a watermark detection result of each frame of video image in the video image sequence.

In some embodiments, after obtaining a video image sequence after the video frame processing to be processed, the video image sequence may be input to a pre-trained target detection network, so as to perform watermark detection on each frame of video image in the video image sequence through the target detection network, and then after the target detection network outputs the watermark detection result of each frame of video image, the electronic device may obtain the watermark detection result of each frame of video image in the video image sequence. The target detection network is obtained by training the neural network in advance according to a first training sample, wherein the first training sample comprises a first image sample and a watermark marking sample corresponding to the first image sample.

In some embodiments, in order to ensure generalization performance of the target detection network in a real and complex short video application scene, the first image sample may be an image of the watermark at different positions, so as to ensure watermark positioning and identification effect of the target detection network.

As one approach, the first image sample may be an image of a randomly synthesized watermark at a different location. Specifically, referring to fig. 3, before step S220, the video watermark detection method of the present application may further include:

step S200: a plurality of background samples and a plurality of watermark samples are acquired.

In some embodiments, a plurality of background samples and a plurality of watermark samples may be acquired prior to synthesizing the first image sample of the present application. The background sample may be any clean, watermark-free background picture, which may be obtained from various manners such as internet, download, local reading, etc., and is not limited herein. The watermark sample can be any clean and background-free watermark picture, and after the watermark picture with the background is acquired, the watermark picture with the background can be processed into a final clean and background-free watermark sample by using a photo processing tool and the like. For example, referring to FIG. 4, which shows the acquired background-free watermark sample set L _w w=1. In total m watermarks.

In some embodiments, the watermark type in the watermark sample may be preset, so that the trained target detection network may detect only the watermark of the preset watermark type. Therefore, a batch of watermarks to be removed can be determined in advance according to actual service requirements, and targeted watermark detection is realized.

Step S201: and randomly synthesizing the watermark samples and the background samples through a fusion algorithm to obtain a plurality of synthesized first image samples.

In some embodiments, after the acquisition of the plurality of background samples and the plurality of watermark samples, a first set of training samples may be generated for training the object detection network of the present application. As a way, the obtained watermark samples and the background samples may be randomly synthesized by a fusion algorithm, so that the obtained watermark samples without background are randomly synthesized to any position of the background samples, thereby obtaining a plurality of synthesized first image samples.

As one way, the fusion algorithm may be an Alpha-Blending algorithm, so-called Alpha-Blending, which is the fact that the source pixels in the background samples and the target pixels in the watermark samples are mixed according to the values of the "Alpha (used to indicate how the pixels produce a trick effect, i.e. what we say in general)" mixing vector. Specifically, the RGB three color components of the source pixel in the background sample and the target pixel in the watermark sample may be separated, then the three color components of the source pixel are multiplied by the value of Alpha, the three color components of the target pixel are multiplied by the inverse value of Alpha, the results are added according to the corresponding color components, the last obtained result of each component is divided by the maximum value of Alpha, and finally the three color components are recombined into one pixel for output. Namely, the RGB values of the source pixels are mixed with the RGB values of the target pixels in proportion respectively, and finally a mixed RGB value is obtained.

The fusion formula of the fusion algorithm may be: i _merge＝(1-a)I_bg+a*L_w. Where I _bg denotes the pixel value in the background sample, L _w denotes the pixel value in the background-free watermark sample, I _merge denotes the pixel value in the synthesized first image sample, and a denotes the Alpha normalized value, i.e. Alpha/256, where Alpha has a value typically from 0 to 255. Thus, after determining the random fusion position, the pixel values of the background sample and the watermark sample at the fusion position can be substituted into the calculation.

For example, if 30 ten thousand clean background samples and 30 clean watermark samples are obtained, the 30 clean watermark samples may be randomly fused to any position of the 30 ten thousand clean background samples, so as to obtain 30 ten thousand synthesized first image samples. Therefore, 1 watermark sample can be fused with 1 ten thousand background samples, so that enough images of the watermark at different positions can be obtained, and the detection precision of the target detection network is improved.

Step S202: and generating watermark labeling samples corresponding to each first image sample according to the synthesis positions of the watermark samples in each first image sample.

In the embodiment of the application, after a plurality of synthesized first image samples are obtained, watermark labeling samples corresponding to each first image sample can be generated according to the synthesis positions of the watermark samples in each first image sample. The watermark marking sample is used for representing position information of the watermark sample in the first image sample, and can be coordinate values of four corners of the watermark sample, or can be coordinate values of diagonal corners only, such as coordinate values of the upper left corner and the lower right corner of the watermark sample, or can be coordinates of a central point of the watermark sample and a wide-high value, and specific position information is not limited herein, and only the position of the watermark sample needs to be determined.

It can be understood that by taking the first image samples of different watermarks at different positions, which are randomly synthesized, as a training data set, a trained target detection network can be enabled to obtain accurate watermark detection results even in the video to be processed facing the complex problems of watermark position change, watermark form change and the like, so that accurate dynamic and static watermark positioning is realized. .

Alternatively, instead of the above-described randomly synthesized first image samples, a batch of watermarked first image samples may be obtained directly. The method can be that a batch of real videos with preset watermarks are collected, after framing processing is carried out to obtain a video image sequence as a first image sample, watermarking can be carried out on the video image with the watermarks manually, and therefore watermark marking samples corresponding to the first image sample are obtained. In some embodiments, the randomly synthesized first image sample and the directly obtained first image sample may be used together as a training data set of a training target detection network, so as to ensure the authenticity of the detection effect and also have generalization.

Since the watermark detection result output by the target detection network may have a problem of false detection, that is, the content in the possible detection frame is not watermark, in some embodiments, a classification network may be added to further reduce the false detection rate. Specifically, referring to fig. 5, after step S220, the video watermark detection method of the present application may further include:

Step S221: and identifying the watermark detection result of each frame of video image by utilizing a pre-trained classification network, and acquiring the identification result of the watermark detection result of each frame of video image.

In some embodiments, after obtaining the watermark detection result of each frame of video image output by the target detection network, the watermark detection result of each frame of video image may be input to a pre-trained classification network, so as to perform watermark recognition on the watermark detection result of each frame of video image through the classification network, and then after the target detection network outputs the recognition result of the watermark detection result of each frame of video image, the electronic device may obtain the recognition result of the watermark detection result of each frame of video image. The identification result can be used for representing whether a watermark area in the watermark detection result output by the target detection network is a watermark or not, namely, whether the watermark detection result output by the target detection network is wrongly detected or not can be judged.

In some embodiments, the classification network may be a shuffleNet efficient lightweight network based on deep learning, or may be a CNN (Convolutional Neural Networks, convolutional neural network), which is not limited herein, and may be reasonably set according to actual scene requirements. The classification network may be obtained by training the neural network model in advance according to a plurality of second training samples, where the second training samples may include a second image sample and a classification label sample corresponding to the second image sample. The classification labeling sample corresponding to the second image sample may be a sample labeled with whether or not the sample is not watermarked, or may be a sample labeled with a specific watermark type, which is not limited herein.

In some embodiments, to reduce effort, improve detection efficiency, training sample data sets of the classification network may be performed on the basis of the target detection network. Specifically, referring to fig. 6, before step S221, the video watermark detection method of the present application may further include:

Step S2201: and acquiring a watermark image in the first image sample.

In some embodiments, a first image sample of the first training samples may be obtained, so as to obtain a watermark image therein according to the first image sample. As one way, the watermark image may be obtained by determining the area of the watermark image according to the watermark marking sample in the first image sample, and cutting out the watermark image.

Step S2202: and carrying out boundary expansion on the watermark image, and acquiring the expanded watermark image as a second image sample.

Considering that noise is usually generated in a watermark image in a real scene, and generalization capability of a classification network is increased, after the watermark image is acquired, boundary expansion can be performed on the watermark image, so that watermark images with different expanded sizes are obtained and used as second image samples, and recognition accuracy of the classification network is improved. Wherein the boundary expansion may be an expansion of a less random size according to the watermark image.

Step S2203: and generating a classification labeling sample corresponding to the second image sample according to the watermark image in the second image sample.

After the second image sample is obtained, a classification labeling sample corresponding to the second image sample can be generated according to the watermark image in the second image sample. The corresponding classification labeling sample may be generated according to the position coordinates and the watermark class of the watermark image in the second image sample, that is, the classification labeling sample may include the position information and the type information of the watermark image.

It can be understood that by determining the training data set of the classification network according to the training data set of the target detection network, the trained target detection network can obtain an accurate watermark recognition result and reduce the workload of data collection.

Step S222: and determining the watermark detection result of which the identification result is the watermark in each frame of video image as a new watermark detection result of each frame of video image.

It can be appreciated that after the watermark detection result of each frame of video image is identified by the classification network, the watermark detection result identified as watermark can be obtained, and the watermark detection result identified as non-watermark can also be obtained. And filtering watermark detection results identified as non-watermarks so as to remove watermark areas wrongly detected by the target detection network, and obtaining watermark areas correctly detected by the target detection network, wherein the correctly detected watermark areas are watermark detection results of each frame of video image in the video to be processed. Specifically, the watermark detection result with the identification result being the watermark can be determined from the watermark detection results of all video images, and the watermark detection result with the identification result being the watermark in each frame of video image is determined to be the new watermark detection result of each frame of video image. Therefore, the target video image of which the watermark is not detected in the video to be processed can be obtained according to the new watermark detection result of each frame of video image, namely, the subsequent inter-frame smoothing process flow is carried out.

Step S230: and acquiring a target video image in which the watermark is not detected in the video to be processed according to the watermark detection result.

Step S240: and judging whether the adjacent video image detects the watermark according to the watermark detection result, wherein the adjacent video image is a video frame image adjacent to the target video image in the video to be processed.

Step S250: and when the adjacent video image detects the watermark, acquiring a watermark detection result of the adjacent video image as the watermark detection result of the target video image.

In the embodiment of the present application, the steps S230 to S250 can refer to the foregoing embodiments, and are not repeated here.

In some embodiments, the adjacent video image may be a previous frame video image or a next frame video image adjacent to the target video image in the video to be processed, or may be both a previous frame video image and a next frame video image, which is not limited herein.

Specifically, when the adjacent video image is the video image of the previous frame, that is, forward smoothing processing is performed: if the video image of the ith frame is detected watermark region set { bbox _k i k=1....a.k }, wherein bbox _k＝{x₁,y₁,x₂,y₂},(x₁,y₁) is the coordinates of the upper left corner of the watermark region and (x ₂,y₂) is the coordinates of the lower right corner of the watermark region. When no watermark is detected in the i+1th frame video image, the watermark detection result of the i frame video image, that is, the watermark region set { bbox _k |k=1.

Specifically, when the adjacent video image is a video image of a subsequent frame, that is, backward smoothing processing: if the i+1st frame video image detection results in a watermark region set { bbox _k |k=1. When no watermark is detected in the i-th frame video image, the watermark detection result of the i+1-th frame video image, that is, the watermark region set { bbox _k |k=1.

Step S260: and regenerating the watermark detection result of each frame of video image in the video to be processed according to the watermark detection result of the target video image.

In the embodiment of the present application, step S260 can refer to the foregoing embodiment, and is not repeated here.

According to the video watermark detection method provided by the embodiment of the application, after the watermark in the video to be processed is detected by utilizing the pre-trained target detection network to obtain the watermark detection result of each frame of video image in the video to be processed, the watermark detection result of each frame of video image can be identified by utilizing the pre-trained classification network so as to obtain the identification result of the watermark detection result of each frame of video image, so that the negative influence caused by the false detection of the target detection network can be reduced. And then determining the watermark detection result of which the identification result is the watermark in each frame of video image as a new watermark detection result of each frame of video image, so as to obtain a target video image of which the watermark is not detected in the video to be processed according to the new watermark detection result. And judging whether the adjacent video image detects the watermark according to the new watermark detection result, wherein the adjacent video image is a video frame image adjacent to the target video image in the video to be processed. When the adjacent video image detects the watermark, obtaining a watermark detection result of the adjacent video image as a watermark detection result of the target video image; and regenerating the watermark detection result of each frame of video image in the video to be processed according to the watermark detection result of the target video image. The application uses the continuity of video image frames and the smaller variability of watermark positions in adjacent frame video images, can avoid the problem of watermark omission in a certain frame or multiple frames of video images in the video to be processed by taking the watermark detection result of the adjacent frame video images of the target video image as the watermark detection result of the target video image when the target video image does not detect the watermark, and improves the accuracy of video watermark detection.

Referring to fig. 7, fig. 7 is a flowchart illustrating a video watermark detection method according to another embodiment of the application, which may be applied to an electronic device, and the video watermark detection method may include:

step S310: obtaining a standard watermark image to be detected;

Although the watermark detection result of the adjacent video image of the target video image can be copied to the target video image through the inter-frame smoothing processing, the omission is avoided, and at the same time, the influence of the error detection area is amplified. For example, watermark detection results of a previous frame video image and a subsequent frame video image of the target video image are copied to the target video image, and if the previous frame video image and the subsequent frame video image both have a false detection area, the target video image has twice the false detection area.

Therefore, in the embodiment of the application, a processing flow of removing the error detection area is added, namely, the watermark detection result of each frame of video image can be compared with the standard watermark image to be detected, and when the watermark detection area is different in comparison, the watermark detection area is the error detection area. Specifically, a standard watermark image to be detected may be acquired first. The standard watermark image to be detected can be a watermark image predetermined according to actual service requirements, so that whether the detected watermark image is a watermark image to be detected or not can be determined according to the standard watermark image. In some embodiments, the standard watermark image to be detected may be the aforementioned clean, background-free watermark sample set { L _w |w=1.

Step S320: and acquiring an area image of each watermark area in the watermark detection result of each frame of video image.

In the embodiment of the present application, after obtaining the watermark detection result of each frame of video image in the regenerated video to be processed through the inter-frame smoothing process, an area image of each watermark area bbox _k in the watermark detection result of each frame of video image may be obtained, so as to determine whether to be mistakenly detected according to the area image. The region image including the watermark region may be cut out according to the coordinate information in the watermark detection result.

Step S330: and respectively calculating the similarity value of each regional image in each frame of video image and the standard watermark image.

In the embodiment of the application, after the standard watermark image to be detected and the area image of each watermark area in the watermark detection result are obtained, the similarity value of each area image in each frame of video image and the standard watermark image can be calculated respectively, so that whether the area image is the watermark image to be detected or not can be determined according to the similarity value, and whether the area image is wrongly detected or not can be determined.

In some embodiments, the standard watermark image and the region image may be scaled to a uniform size to ensure accuracy of the result. As one way, the region image may be scaled to a size consistent with the standard watermark image, the standard watermark image may be scaled to a size consistent with the region image, or the standard watermark image and the region image may be scaled to a predetermined size, which is not limited herein.

In some embodiments, various deduplication algorithms such as Dhash, AHA, phash may be used to determine the similarity value between the region image and the standard watermark image, and the specific deduplication algorithm is not limited. For example, in the embodiment of the present application, a Dhash deduplication algorithm that is tested by experiments and is considered to have a better deduplication effect may be used to determine the similarity.

In some embodiments, when the Dhash deduplication algorithm is used to calculate the similarity, the region image after the size scaling may be first grayed with the standard watermark image, so as to obtain a standard watermark image L '_w after the graying and a region image bbox' _k after the graying. And then performing difference calculation of the two to obtain a difference value array. The difference calculation may be that the difference value is 1 when the difference value is greater than or equal to the pixel value on the right and 0 when the difference value is less than the pixel value on the right. Wherein the difference values are calculated in rows, if there are p pixels in each row in the image, then p-1 difference values will be generated. And then carrying out hash value conversion on the difference values, namely, regarding each difference value in the difference value array as a bit, forming a 16-system value by every 8 bits, connecting the 16-system values and converting the 16-system values into character strings, and obtaining the final dHash values. Finally, the Hamming distance is calculated according to dHash values of the region image and the standard watermark image (HAMMING DISTANCE). Thereby determining the similarity value according to the magnitude of the hamming distance. Wherein, the larger the Hamming distance, the smaller the similarity value, the lower the similarity.

Step S340: and determining a watermark region corresponding to the region image with the similarity value larger than a preset value in each frame of video image as a final watermark detection result of each frame of video image.

In the embodiment of the application, after the similarity value of each area image in each frame of video image and the standard watermark image is obtained, the area image with the similarity value larger than the preset value in each frame of video image can be obtained, and the area image with the similarity value larger than the preset value can be regarded as the image in the watermark area which is correctly detected in the watermark detection result. Therefore, the watermark area corresponding to the area image can be determined as the final watermark detection result of each frame of video image, namely the final watermark detection result without omission and correctness.

It can be understood that when the similarity value is smaller than the preset value, the area image is different from the standard watermark image, and is likely to be the image of the watermark area wrongly detected by the target detection network, so that the watermark area corresponding to the area image with the similarity value smaller than the preset value can be removed from the original watermark detection result, thereby obtaining the final correct watermark detection result.

In some embodiments, after determining the final correct watermark detection result of each frame of video image in the video to be processed, the electronic device may perform the watermark removal processing on the video to be processed according to the watermark detection result. Specifically, the electronic device may perform interpolation calculation on the watermark area that is reserved after the error removal detection area is processed, so as to fill pixels outside the area into the watermark area, to achieve the effect of removing the watermark, and then may output the repaired video image frames, so that all the repaired video image frames may be fused, and a clean video after removing the watermark is obtained. For example, referring to fig. 8, 9 and 10, fig. 8 shows an overall flowchart of video watermark detection provided by the present application, fig. 9 shows an effect diagram of watermark position detection results provided by the present application, detection boxes 610 and 620 are watermark positions obtained by detecting a video image 600, fig. 10 shows an effect diagram of video restoration after watermark removal provided by the present application, and compared with fig. 9, the watermark removal in detection boxes 610 and 620 detected in fig. 9 has been performed in fig. 10.

According to the video watermark detection method provided by the embodiment of the application, after the watermark detection result of the adjacent images is copied to the target image without watermark detection after the inter-frame smoothing treatment, and the watermark detection result of each frame of video image in the video to be processed is regenerated, the amplification influence of the error detection area caused by the inter-frame smoothing can be reduced by adding the processing flow for removing the error detection area, the accuracy of the final watermark detection result is improved, and the accuracy of video watermark detection is improved.

Referring to fig. 11, fig. 11 shows a block diagram of a video watermark detection apparatus 400 according to an embodiment of the application, where the video watermark detection apparatus 400 is applied to an electronic device. The video watermark detection apparatus 400 includes: the device comprises a target detection module 410, a target acquisition module 420, a neighboring judgment module 430, a result replication module 440 and a result generation module 450. The target detection module 410 is configured to detect a watermark of a video to be processed by using a pre-trained target detection network, and obtain a watermark detection result of each frame of video image in the video to be processed; the target obtaining module 420 is configured to obtain a target video image in the video to be processed, where the watermark is not detected, according to the watermark detection result; the adjacent judging module 430 judges whether an adjacent video image detects a watermark according to the watermark detection result, where the adjacent video image is a video frame image adjacent to the target video image in the video to be processed; the result duplication module 440 is configured to obtain, when the adjacent video image detects a watermark, a watermark detection result of the adjacent video image as a watermark detection result of the target video image; the result generating module 450 is configured to regenerate the watermark detection result of each frame of video image in the video to be processed according to the watermark detection result of the target video image.

In some embodiments, the object detection module 410 may be specifically configured to: carrying out framing treatment on the video to be treated to obtain a video image sequence; and detecting the watermark in the video image sequence by using a pre-trained target detection network, and obtaining a watermark detection result of each frame of video image in the video image sequence, wherein the target detection network is obtained by training the neural network in advance according to a first training sample, and the first training sample comprises a first image sample and a watermark labeling sample corresponding to the first image sample.

In some embodiments, the video watermark detection apparatus 400 may further include: the sample acquisition module is used for acquiring a plurality of background samples and a plurality of watermark samples; the sample synthesis module is used for randomly synthesizing the watermark samples and the background samples through a fusion algorithm to obtain a plurality of synthesized first image samples; and the sample labeling module is used for generating watermark labeling samples corresponding to each first image sample according to the synthesis positions of the watermark samples in each first image sample.

In some embodiments, the video watermark detection apparatus 400 may further include: the device comprises a result identification module, a watermark detection module and a watermark detection module, wherein the result identification module is used for identifying a watermark detection result of each frame of video image by utilizing a pre-trained classification network, the identification result is used for representing whether a watermark area in the watermark detection result is a watermark or not, the classification network is obtained by training a neural network in advance according to a second training sample, and the second training sample comprises a second image sample and a classification labeling sample corresponding to the second image sample; and the result judging module is used for determining the watermark detection result with the identification result of the watermark in each frame of video image as the new watermark detection result of each frame of video image.

In this embodiment, the target acquisition module 420 may be specifically configured to: and acquiring a target video image in which the watermark is not detected in the video to be processed according to the new watermark detection result.

Further, in some embodiments, the video watermark detection apparatus 400 may further include: the watermark acquisition module acquires a watermark image in the first image sample; the watermark expansion module is used for carrying out boundary expansion on the watermark image and acquiring an expanded watermark image as a second image sample; and the watermark labeling module is used for generating a classification labeling sample corresponding to the second image sample according to the watermark image in the second image sample.

In some embodiments, the adjacent video images in the adjacent judging module 430 may be: and a previous frame of video image or a next frame of video image adjacent to the target video image in the video to be processed.

In some embodiments, the video watermark detection apparatus 400 may further include: the standard watermark acquisition module is used for acquiring a standard watermark image to be detected; the regional image acquisition module is used for acquiring regional images of each watermark region in the watermark detection result of each frame of video image; the similarity calculation module is used for calculating the similarity value of each regional image and the standard watermark image in each frame of video image respectively; and the similarity judging module is used for determining the watermark area corresponding to the area image with the similarity value larger than a preset value in each frame of video image as a final watermark detection result of each frame of video image.

In some embodiments, the video watermark detection apparatus 400 may further include: and the watermarking removing module is used for carrying out watermarking removing treatment on the video to be treated according to the watermark detection result.

The video watermark detection device provided by the embodiment of the application is used for realizing the corresponding video watermark detection method in the foregoing method embodiment, and has the beneficial effects of the corresponding method embodiment, and is not described herein again.

It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the apparatus and modules described above may refer to the corresponding process in the foregoing method embodiment, which is not repeated herein.

In the several embodiments provided by the present application, the illustrated or discussed coupling or direct coupling or communication connection of the modules to each other may be through some interfaces, indirect coupling or communication connection of devices or modules, electrical, mechanical, or other forms.

In addition, each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated modules may be implemented in hardware or in software functional modules.

Referring to fig. 12, fig. 12 is a block diagram illustrating a structure of an electronic device according to an embodiment of the application. The electronic device 700 may be an electronic device such as a server capable of running applications. The electronic device 700 of the present application may include one or more of the following components: a processor 710, a memory 720, and one or more application programs, wherein the one or more application programs may be stored in the memory 720 and configured to be executed by the one or more processors 710, the one or more program(s) configured to perform the method as described in the foregoing method embodiments.

Processor 710 may include one or more processing cores. The processor 710 utilizes various interfaces and lines to connect various portions of the overall electronic device 700, perform various functions of the electronic device 700, and process data by executing or executing instructions, programs, code sets, or instruction sets stored in the memory 720, and invoking data stored in the memory 720. Alternatively, the processor 710 may be implemented in hardware in at least one of digital signal Processing (DIGITAL SIGNAL Processing, DSP), field-Programmable gate array (Field-Programmable GATE ARRAY, FPGA), programmable logic array (Programmable Logic Array, PLA). The processor 710 may integrate one or a combination of several of a central processing unit (Central Processing Unit, CPU), an image processor (Graphics Processing Unit, GPU), and a modem, etc. The CPU mainly processes an operating system, a user interface, an application program and the like; the GPU is used for being responsible for rendering and drawing of display content; the modem is used to handle wireless communications. It will be appreciated that the modem may not be integrated into the processor 710 and may be implemented solely by a single communication chip.

Memory 720 may include random access Memory (Random Access Memory, RAM) or Read-Only Memory (ROM). Memory 720 may be used to store instructions, programs, code, sets of codes, or sets of instructions. The memory 720 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the various method embodiments described below, etc. The storage data area may also store data created by the electronic device 700 in use, and the like.

It will be appreciated by those skilled in the art that the structure shown in fig. 12 is merely a block diagram of a portion of the structure associated with the present inventive arrangements and is not limiting of the electronic device to which the present inventive arrangements are applied, and that a particular electronic device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.

Referring to fig. 13, a block diagram of a computer readable storage medium according to an embodiment of the application is shown. The computer readable storage medium 800 has stored therein program code that can be invoked by a processor to perform the methods described in the method embodiments described above.

The computer readable storage medium 800 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read only memory), an EPROM, a hard disk, or a ROM. Optionally, the computer readable storage medium 800 comprises a non-transitory computer readable medium (non-transitory computer-readable storage medium). The computer readable storage medium 800 has storage space for program code 810 that performs any of the method steps described above. The program code can be read from or written to one or more computer program products. Program code 810 may be compressed, for example, in a suitable form.

In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the application has been described in detail with reference to the foregoing embodiments, it will be appreciated by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not drive the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims

1. A method of video watermark detection, the method comprising:

detecting the watermark in the video to be processed by utilizing a pre-trained target detection network, and obtaining a watermark detection result of each frame of video image in the video to be processed;

obtaining a target video image in which the watermark is not detected in the video to be processed according to the watermark detection result;

judging whether a watermark is detected in an adjacent video image according to the watermark detection result, wherein the adjacent video image is a video frame image adjacent to the target video image in the video to be processed;

When the adjacent video image detects a watermark, acquiring a watermark detection result of the adjacent video image as a watermark detection result of the target video image, specifically, when the adjacent video image is a previous frame video image, if the ith frame video image is detected to obtain a watermark region set, wherein the watermark region set at least comprises: when the (i+1) th frame video image does not detect any watermark, copying a watermark region set of the (i+1) th frame video image to a watermark detection result of the (i+1) th frame video image, wherein the watermark region set of the (i) th frame video image is the watermark detection result of the (i) th frame video image;

Regenerating a watermark detection result of each frame of video image in the video to be processed according to the watermark detection result of the target video image;

After regenerating the watermark detection result of each frame of video image in the video to be processed according to the watermark detection result of the target video image, the method further comprises:

obtaining a standard watermark image to be detected;

acquiring a regenerated region image of each watermark region in a watermark detection result of each frame of video image in the video to be processed;

Respectively calculating the similarity value of each regional image in each frame of video image and the standard watermark image;

And determining a watermark region corresponding to the region image with the similarity value larger than a preset value in each frame of video image as a final watermark detection result of each frame of video image.

2. The method according to claim 1, wherein detecting the watermark in the video to be processed using the pre-trained object detection network, and obtaining the watermark detection result of each frame of video image in the video to be processed, comprises:

carrying out framing treatment on the video to be treated to obtain a video image sequence;

And detecting the watermark in the video image sequence by using a pre-trained target detection network, and obtaining a watermark detection result of each frame of video image in the video image sequence, wherein the target detection network is obtained by training the neural network in advance according to a first training sample, and the first training sample comprises a first image sample and a watermark labeling sample corresponding to the first image sample.

3. The method of claim 2, wherein prior to detecting the watermark in the sequence of video images using the pre-trained object detection network to obtain the watermark detection result for each frame of video image in the sequence of video images, the method further comprises:

Acquiring a plurality of background samples and a plurality of watermark samples;

Randomly synthesizing the watermark samples and the background samples through a fusion algorithm to obtain a plurality of synthesized first image samples;

and generating watermark labeling samples corresponding to each first image sample according to the synthesis positions of the watermark samples in each first image sample.

4. The method according to claim 2, wherein before the obtaining, according to the watermark detection result, a target video image in the video to be processed in which the watermark is not detected, the method further comprises:

Identifying a watermark detection result of each frame of video image by using a pre-trained classification network, and obtaining an identification result of the watermark detection result of each frame of video image, wherein the identification result is used for representing whether a watermark area in the watermark detection result is a watermark or not, the classification network is obtained by training a neural network in advance according to a second training sample, and the second training sample comprises a second image sample and a classification labeling sample corresponding to the second image sample;

determining the identification result in each frame of video image as a watermark detection result of the watermark as a new watermark detection result of each frame of video image;

the step of obtaining the target video image in which the watermark is not detected in the video to be processed according to the watermark detection result comprises the following steps:

And acquiring a target video image in which the watermark is not detected in the video to be processed according to the new watermark detection result.

5. The method of claim 4, wherein prior to said identifying the watermark detection result for each frame of the video image using the pre-trained classification network, the method further comprises:

acquiring a watermark image in the first image sample;

Performing boundary expansion on the watermark image, and acquiring the expanded watermark image as a second image sample;

And generating a classification labeling sample corresponding to the second image sample according to the watermark image in the second image sample.

6. The method according to any one of claims 1-5, wherein the adjacent video image is a previous frame video image or a subsequent frame video image of the video to be processed that is adjacent to the target video image.

7. The method according to any one of claims 1-5, wherein after the regenerating the watermark detection result for each frame of the video image in the video to be processed based on the watermark detection result for the target video image, the method further comprises:

And carrying out watermark removal processing on the video to be processed according to the watermark detection result.

8. A video watermark detection apparatus, the apparatus comprising:

The target detection module is used for detecting the watermark of the video to be processed by utilizing a pre-trained target detection network and obtaining a watermark detection result of each frame of video image in the video to be processed;

The target acquisition module is used for acquiring a target video image in which the watermark is not detected in the video to be processed according to the watermark detection result;

the adjacent judging module is used for judging whether the adjacent video image detects the watermark according to the watermark detection result, wherein the adjacent video image is a video frame image adjacent to the target video image in the video to be processed;

The result copying module is configured to obtain, when the adjacent video image detects a watermark, a watermark detection result of the adjacent video image as a watermark detection result of the target video image, specifically, when the adjacent video image is a previous frame video image, if the i frame video image is detected to obtain a watermark region set, where the watermark region set at least includes: when the (i+1) th frame video image does not detect any watermark, copying a watermark region set of the (i+1) th frame video image to a watermark detection result of the (i+1) th frame video image, wherein the watermark region set of the (i) th frame video image is the watermark detection result of the (i) th frame video image;

The result generation module is used for regenerating the watermark detection result of each frame of video image in the video to be processed according to the watermark detection result of the target video image;

The standard watermark acquisition module is used for acquiring a standard watermark image to be detected;

the regional image acquisition module is used for acquiring the regenerated regional image of each watermark region in the watermark detection result of each frame of video image in the video to be processed;

The similarity calculation module is used for calculating the similarity value of each regional image and the standard watermark image in each frame of video image respectively;

And the similarity judging module is used for determining the watermark area corresponding to the area image with the similarity value larger than a preset value in each frame of video image as a final watermark detection result of each frame of video image.

9. An electronic device, comprising:

One or more processors;

A memory;

one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications configured to perform the method of any of claims 1-7.

10. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a program code, which is callable by a processor for executing the method according to any one of claims 1-7.