WO2021190372A1 - Video file processing method and device, and watermark extraction method and device - Google Patents

Video file processing method and device, and watermark extraction method and device

Info

Publication number
WO2021190372A1
WO2021190372A1 (PCT/CN2021/081259)
Authority
WO
WIPO (PCT)
Prior art keywords
video
watermark
watermark information
data
audio
Prior art date
Application number
PCT/CN2021/081259
Other languages
French (fr)
Chinese (zh)
Inventor
刘永亮 (LIU Yongliang)
杨锐 (YANG Rui)
Original Assignee
阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)
Priority date
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)
Publication of WO2021190372A1

Classifications

    • H: ELECTRICITY
      • H04: ELECTRIC COMMUNICATION TECHNIQUE
        • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
          • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
            • H04N21/20: Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
              • H04N21/23: Processing of content or additional data; Elementary server operations; Server middleware
                • H04N21/233: Processing of audio elementary streams
                  • H04N21/2335: … involving reformatting operations of audio signals, e.g. by converting from one coding standard to another
                • H04N21/234: Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
                  • H04N21/23418: … involving operations for analysing video streams, e.g. detecting features or characteristics
                  • H04N21/2343: … involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
                • H04N21/235: Processing of additional data, e.g. scrambling of additional data or processing content descriptors
                  • H04N21/2355: … involving reformatting operations of additional data, e.g. HTML pages
            • H04N21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
              • H04N21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
                • H04N21/435: Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
                  • H04N21/4355: … involving reformatting operations of additional data, e.g. HTML pages on a television screen
                • H04N21/439: Processing of audio elementary streams
                  • H04N21/4398: … involving reformatting operations of audio signals
                • H04N21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
                  • H04N21/44008: … involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
                  • H04N21/4402: … involving reformatting operations of video signals for household redistribution, storage or real-time display
            • H04N21/80: Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
              • H04N21/83: Generation or processing of protective or descriptive data associated with content; Content structuring
                • H04N21/835: Generation of protective data, e.g. certificates
                  • H04N21/8358: … involving watermark
                • H04N21/845: Structuring of content, e.g. decomposing content into time segments
                  • H04N21/8456: … by decomposing the content in the time domain, e.g. in time segments

Definitions

  • the present disclosure relates to the technical field of digital media processing, and in particular to a method and device for video file processing and watermark extraction.
  • the present disclosure provides a video file processing and corresponding watermark extraction method.
  • This method exploits the temporal and spatial correlation between the video sequence and the audio data contained in a video and adds mutually associated audio and video watermarks to the video file, thereby increasing the watermark coverage of the video file.
  • In addition, mutual verification between the audio and video double watermarks improves the robustness of the embedded watermarks against malicious tampering.
  • According to a first aspect, a video file processing method is provided, including: acquiring video data and audio data of the video file; embedding first watermark information into the video data; associatively embedding second watermark information into the audio data; and obtaining a watermark-embedded video file.
  • According to a second aspect, a video watermark extraction method is provided, including: obtaining the watermark-embedded video file of the first aspect; extracting video data and audio data from the watermark-embedded video file; extracting the first watermark information embedded in the video data; and extracting the second watermark information embedded in the audio data.
  • According to a third aspect, a streaming media watermark extraction method is provided, including: acquiring watermark-embedded streaming media data generated from the watermark-embedded video file of the first aspect; extracting video data and audio data from the watermark-embedded streaming media data; extracting the first watermark information embedded in the video data; and extracting the second watermark information embedded in the audio data.
  • According to a fourth aspect, a streaming media data processing method is provided, including: acquiring video data and audio data of the streaming media data; embedding first watermark information into the video data; associatively embedding second watermark information into the audio data; and obtaining watermark-embedded streaming media data.
  • According to a fifth aspect, a video file processing device is provided, including: a video parsing unit for acquiring the video data and audio data in the video file; a video watermark embedding unit for embedding first watermark information into the video data; an audio watermark embedding unit for associatively embedding second watermark information into the audio data; and a video mixing unit for mixing the video data embedded with the first watermark information and the audio data embedded with the second watermark information, to obtain a watermark-embedded video file.
  • According to a sixth aspect, a video watermark extraction device is provided, including: a video parsing unit for acquiring the video data and audio data in the watermark-embedded video file of the first aspect; a video watermark extraction unit for extracting the first watermark information from the video data; and an audio watermark extraction unit for extracting the embedded second watermark information from the audio data.
  • According to a seventh aspect, a computing device is provided, including: a processor; and a memory on which executable code is stored, the executable code, when executed by the processor, causing the processor to perform the method of any one of the first to fourth aspects.
  • According to an eighth aspect, a non-transitory machine-readable storage medium is provided, having executable code stored thereon which, when executed by a processor of an electronic device, causes the processor to perform the method of any one of the first to fourth aspects.
  • This application embeds an audio watermark and a video watermark in the video file at the same time; the two do not interfere with each other and complement each other.
  • When the watermark is extracted, the audio and video watermark information can be adaptively fused, which greatly improves the robustness of the video file watermark, in particular against malicious editing attacks on the video content.
  • Fig. 1 shows a schematic flowchart of a video file processing method according to the present application
  • Figure 2 shows an example of embedding watermark information for video sequence and audio code stream respectively
  • Figure 3 shows an example of the joint watermark embedding process according to the present application
  • Figure 4 shows a schematic flowchart of a video watermark extraction method according to an embodiment of the present application
  • Figure 5 shows an example of a joint watermark extraction process according to the present application
  • Figure 6 shows a schematic diagram of a brief scheme of the watermark embedding and extraction operations of this application
  • Fig. 7 shows a schematic structural diagram of a computing device that can be used to implement the above video processing and watermark extraction method according to an embodiment of the present application.
  • this application proposes an audio-video joint watermarking scheme.
  • The different kinds of data contained in a video (the video sequence and the audio data) are strongly correlated in time and space and constrain each other.
  • Most existing video watermarking algorithms perform embedding and extraction only on the video code stream, without exploiting the correlation between audio and video, or without considering the audio data at all.
  • This application introduces the correlation between the audio and video data into the video watermarking process to make the video watermark more accurate and robust, so that copyright protection and digital content authentication achieve better results.
  • FIG. 1 shows a schematic flowchart of a video file processing method according to the present application.
  • Here, a video file refers to a file that usually includes both video and audio information, that is, both image content and sound content.
  • step S110 the video data and audio data of the video file are acquired.
  • existing tools can be used to separate audio and video.
  • For example, a video file parsing tool can be applied to the video file into which the watermark is to be embedded, in order to extract the video data (usually a video sequence) and the audio data (usually an audio code stream).
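  • As a concrete illustration of this parsing step, the following sketch separates the two streams with the ffmpeg command-line tool. The patent does not prescribe any particular tool; the container and codec choices (MP4 with H.264 video and AAC audio) and the file names are assumptions.

```python
import subprocess

def split_audio_video(src, video_out="video_only.mp4", audio_out="audio_only.m4a"):
    """Demultiplex a video file into a video-only and an audio-only file
    using stream copy (no re-encoding)."""
    subprocess.run(["ffmpeg", "-y", "-i", src, "-an", "-c:v", "copy", video_out],
                   check=True)   # -an drops audio, keeping only the video sequence
    subprocess.run(["ffmpeg", "-y", "-i", src, "-vn", "-c:a", "copy", audio_out],
                   check=True)   # -vn drops video, keeping only the audio stream
    return video_out, audio_out
```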
  • step S120 the first watermark information is embedded in the video data.
  • step S130 the second watermark information is associated and embedded into the audio data.
  • the steps of inserting the video watermark and the audio watermark can also be performed at the same time, or the audio watermark can be inserted first.
  • the above-mentioned correlation can be used to perform correlation verification of audio and video watermarks, thereby improving the anti-tampering ability of the watermarks.
  • the above-mentioned association may be a time association, that is, the time when the second watermark information is embedded in the audio data may be associated with the time when the first watermark information is embedded in the video data.
  • the above-mentioned association may also be a content association, that is, the content of the second watermark information embedded in the audio data may be associated with the content of the first watermark information embedded in the video data.
  • the time and content of the audio and video watermarking may have a predetermined association relationship, so as to facilitate subsequent mutual verification of the extracted audio and video watermark information.
  • In the simplest case, the association in time and in content may just be identity, that is, the two watermarks are embedded at the same time and carry the same content.
  • However, because embedding the full watermark data usually requires continuous insertion over a period of time, and a video sequence and an audio code stream of the same duration differ in embedding capacity (for example, a video sequence may need only 7 seconds to embed 64 bits, whereas the audio code stream may need 10 seconds to embed the same 64 bits), "the same time" here may mean the same starting time of embedding.
  • In other words, the start time of embedding the second watermark information into the audio data may be the same as the start time of embedding the first watermark information into the video data, so that when the double watermark is extracted, the audio and video watermarks can be conveniently aligned and verified against each other.
  • the watermark content of audio and video may be related to each other based on a certain mapping relationship, rather than being completely the same.
  • In a simpler implementation, the audio and the video carry the same watermark content, that is, the content of the second watermark information embedded in the audio data is the same as the content of the first watermark information embedded in the video data.
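  • The sketch below illustrates both options for the content association: either the second watermark simply copies the first, or it is derived from it through a predetermined mapping. The keyed bitwise XOR used here is only one example of such a mapping, not something specified by the patent.

```python
def derive_audio_watermark(video_wm_bits, key_bits=None):
    """Derive the second (audio) watermark from the first (video) watermark.

    With key_bits=None the two watermarks are identical; otherwise they are
    related by a keyed XOR mapping (key_bits must have the same length)."""
    if key_bits is None:
        return list(video_wm_bits)
    return [b ^ k for b, k in zip(video_wm_bits, key_bits)]
```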
  • embedding the first watermark information into the video data may include: embedding a plurality of first watermark information into the video data at a first predetermined time interval.
  • embedding the associated second watermark information into the audio data includes: embedding a plurality of second watermark information into the audio data at a second predetermined time interval.
  • Although the predetermined time interval may be non-uniform (for example, embedding one watermark every 10 seconds during the first minute and one every 15 seconds during the next minute), it is still preferable to perform the watermark insertion at a uniform predetermined time interval in order to improve the anti-tampering ability of the watermark.
  • each of the plurality of first watermark information may include: first ranking data and first watermark data.
  • each second watermark information in the plurality of second watermark information may include: second ranking data and second watermark data.
  • the actually added watermark information may include the sort code and the watermark information itself, and it is repeated at a certain time interval.
  • Figure 2 shows an example of embedding watermark information for a video sequence and an audio code stream respectively.
  • the same watermark information can be repeatedly inserted in the video sequence and audio code stream aligned on the time axis at a time interval of 10 seconds.
  • the watermark information inserted in each time interval includes synchronization code and single-period watermark information.
  • the "synchronization code” can be a code used in audio or video to sort and count the added single-period watermark information.
  • For example, the synchronization code added at the beginning of the first time interval (i.e., the 0th second) may be "000...0000"; the synchronization code added at the beginning of the second time interval (i.e., the 10th second) is incremented by 1 to "000...0001"; the synchronization code added at the beginning of the third time interval (i.e., the 20th second) is incremented again, and so on, while the single-period watermark information itself is repeated unchanged in each interval.
  • the "synchronization" of the "synchronization code” may refer to the synchronization between the video sequence and the audio code stream. That is, in the subsequent watermark extraction stage, the video sequence and the audio code stream can be aligned in time with the help of the recovered synchronization code.
  • Here, the single-period watermark information may refer to an identification code that uniquely represents the identity of the video or of the video producer (or video publisher), for example the 48-bit single-period complete watermark information shown in Figure 2, and is thereby distinguished from the entirety of the watermark-related information inserted into the audio or video (i.e., the ever-increasing synchronization codes together with the repeatedly inserted watermark identification code).
  • both the synchronization code and the single-period watermark information are binary data represented by 0 and 1, which is the same as the data system in the actual audio and video processing.
  • In this example the watermark insertion is repeated continuously at a predetermined interval of 10 seconds; it should be understood that in other embodiments it may also be performed at a longer time interval, for example once every 20 seconds.
  • Likewise, the embedded synchronization code and single-period watermark information may have different bit lengths, for example a 32-bit synchronization code and 32-bit watermark information, and their overall length may also be a length other than 64 bits.
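  • For illustration, the following sketch builds the per-period payload (synchronization code followed by the single-period watermark information) that is repeated along the time axis. The 16-bit/48-bit split matches the 48-bit watermark of Figure 2 but is otherwise an assumption.

```python
def build_period_payload(period_index, watermark_id, sync_bits=16, wm_bits=48):
    """Return the bit list inserted at the start of one period: an incrementing
    synchronization code followed by the fixed single-period watermark."""
    sync = format(period_index % (1 << sync_bits), f"0{sync_bits}b")
    wm = format(watermark_id % (1 << wm_bits), f"0{wm_bits}b")
    return [int(b) for b in sync + wm]

# Payloads for the periods starting at 0 s, 10 s and 20 s (the watermark ID is arbitrary here).
payloads = [build_period_payload(i, 0x0000ABCD1234) for i in range(3)]
```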
  • step S140 the watermark-embedded video file can be obtained.
  • a video sequence and audio data containing the same watermark information can be stream-mixed to obtain a video file containing a double watermark.
  • the video file with the double watermark embedded above can be released later, and when necessary, the watermark is extracted and restored based on a predetermined method.
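  • The stream-mixing step mentioned above could, for example, be performed with ffmpeg as sketched below; again, the tool, container and codec assumptions (stream copy of already-encoded H.264/AAC streams) are illustrative rather than part of the patent.

```python
import subprocess

def mux_watermarked(video_wm, audio_wm, dst="watermarked.mp4"):
    """Re-mix the watermarked video sequence and the watermarked audio stream
    into a single double-watermarked video file without re-encoding."""
    subprocess.run(["ffmpeg", "-y", "-i", video_wm, "-i", audio_wm,
                    "-map", "0:v:0", "-map", "1:a:0", "-c", "copy", dst],
                   check=True)
    return dst
```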
  • different audio and video watermark embedding methods can be selected to insert watermark data for audio and video respectively, as long as the embedding has relevance.
  • embedding the first watermark information into the video data may include: adding the first watermark information to a non-significant area of a video frame in the video data.
  • video key frames can be extracted, and content analysis can be performed to select non-significant areas for adding watermark information.
  • embedding the second watermark information into the audio data includes: adding the second watermark information to the auditory insensitive area of the audio frame in the audio data.
  • Similarly, the audio may be divided into frames, and frequency bands that are not auditorily sensitive may be selected in each audio frame for adding the watermark information.
  • embedding the first watermark information into the video data may include: embedding the first watermark information by adjusting the energy relationship between adjacent regions of the video frame transform domain in the video data.
  • More specifically, embedding the first watermark information may include: selecting a series of specific video frames in the video sequence (the video data here being a video sequence); and adjusting the energy relationship between adjacent transform-domain regions of this series of specific video frames to embed the constituent bits of the first watermark information one by one.
  • selecting a series of specific video frames in the video sequence may include: selecting video key frames in the video sequence. For example, one bit of the 64-bit information is embedded in each key frame, and finally the synchronization code and the single-period watermark information are embedded in the 64 consecutive key frames.
  • Correspondingly, embedding the associated second watermark information into the audio data may include adjusting the energy relationship of adjacent audio frames; more specifically, adjusting the energy relationship of adjacent frequency bands across a series of adjacent audio frames to embed the constituent bits of the second watermark information one by one.
  • For example, the energy relationship between two adjacent frequency bands can be adjusted to embed one bit of the 64-bit information, so that the synchronization code and the single-period watermark information are embedded across 65 consecutive adjacent frames.
  • FIG 3 shows an example of the joint watermark embedding process according to the present application.
  • For the video file to be watermarked, a video file parsing tool can first be used to extract the video sequence and the audio data separately; subsequently, as shown in Figure 3, the process is divided into two parallel branches.
  • In the video branch, the video key frames are extracted and content analysis is performed to select non-salient areas.
  • Then, the synchronization code and watermark bits are embedded based on the energy relationship of adjacent blocks in the transform (DCT, Discrete Cosine Transform) domain, yielding a video sequence that carries the watermark information.
  • For example, one bit is embedded in a selected non-salient area of each key frame, so that 64 consecutive key frames carry the 64-bit synchronization code and single-period watermark information; this makes it convenient for the subsequent extraction process to locate the watermark information through key-frame extraction, non-salient-region selection and adjacent-region energy-relationship search.
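  • A minimal sketch of this kind of energy-relationship embedding is given below: one bit is hidden in an 8x8 block of a selected region by forcing an ordering between the energies of two adjacent mid-frequency DCT regions. The block size, region positions and margin are assumptions for illustration, not values taken from the patent.

```python
import numpy as np
from scipy.fft import dctn, idctn

def embed_bit_in_block(block, bit, margin=1.2, eps=1e-9):
    """Embed one synchronization/watermark bit into an 8x8 pixel block by
    adjusting the energy relation of two adjacent DCT coefficient regions."""
    c = dctn(np.asarray(block, dtype=np.float64), norm="ortho")
    a = c[2:4, 2:4]            # region A (a view into c)
    b = c[4:6, 2:4]            # adjacent region B (a view into c)
    ea, eb = float(np.sum(a * a)), float(np.sum(b * b))
    if bit == 1 and ea < margin * eb:        # bit 1 <=> E_A >= margin * E_B
        a *= np.sqrt((margin * eb + eps) / (ea + eps))
    elif bit == 0 and eb < margin * ea:      # bit 0 <=> E_B >= margin * E_A
        b *= np.sqrt((margin * ea + eps) / (eb + eps))
    return idctn(c, norm="ortho")            # watermarked block
```
  • Applying this to one selected block per key frame yields one bit per key frame, so 64 consecutive key frames carry the full 64-bit payload.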
  • In the audio branch, the audio is likewise processed in a transform (e.g., DCT) domain: the energy relationship is adjusted and the watermark information is embedded, for example by selecting a hearing-insensitive band between two adjacent frames and embedding one bit there, so that 65 consecutive frames carry the 64-bit synchronization code and single-period watermark information; this facilitates the subsequent extraction process in locating the watermark information through audio framing, auditory-insensitive-area selection and energy-relationship search.
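  • The audio-side counterpart can be sketched in the same spirit: one bit per pair of adjacent frames, encoded in the ordering of the energies of the same insensitive frequency band. The FFT-based band energies, the band indices and the margin below are assumptions; the patent only requires an energy relationship between adjacent frames/bands.

```python
import numpy as np

def embed_bit_between_frames(frame_a, frame_b, bit, band=(40, 80), margin=1.2, eps=1e-12):
    """Embed one bit into a pair of adjacent audio frames by forcing an order
    between the energies of the same frequency band in the two frames."""
    sa = np.fft.rfft(np.asarray(frame_a, dtype=np.float64))
    sb = np.fft.rfft(np.asarray(frame_b, dtype=np.float64))
    lo, hi = band
    ea = float(np.sum(np.abs(sa[lo:hi]) ** 2))
    eb = float(np.sum(np.abs(sb[lo:hi]) ** 2))
    if bit == 1 and ea < margin * eb:        # bit 1 <=> E_a >= margin * E_b
        sa[lo:hi] *= np.sqrt((margin * eb + eps) / (ea + eps))
    elif bit == 0 and eb < margin * ea:      # bit 0 <=> E_b >= margin * E_a
        sb[lo:hi] *= np.sqrt((margin * ea + eps) / (eb + eps))
    return (np.fft.irfft(sa, n=len(frame_a)), np.fft.irfft(sb, n=len(frame_b)))
```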
  • the video sequence and audio data containing the same watermark information can be stream-mixed to obtain a video file with double watermarks.
  • the video file with the above-mentioned audio and video double watermark can be released for viewing and use.
  • In practice, such video files may be tampered with and re-released as pseudo-original videos.
  • the watermark extraction operation can be performed on these videos to clarify the identity of the original publisher of the video.
  • FIG. 4 shows a schematic flowchart of a video watermark extraction method according to an embodiment of the present application.
  • step S410 a video file embedded with a watermark is obtained.
  • the video file with embedded watermark may be the processed video file described above in conjunction with FIG. 1 to FIG. 3, and the video file may be a video file with embedded audio and video double watermarks.
  • The embedded watermark also needs to remain extractable even when the video has been attacked or tampered with in many ways; for this reason, what is obtained in step S410 may be a tampered video file with an embedded watermark.
  • step S420 video data and audio data are extracted from the watermark-embedded video file.
  • existing tools can be used to separate audio and video.
  • For example, a video file parsing tool can be applied to the watermark-embedded video file to extract the video data (usually a video sequence) and the audio data (usually an audio code stream).
  • In step S430, the first watermark information embedded in the video data is extracted.
  • step S440 the second watermark information embedded in the audio data is extracted. It should be understood that the above extraction steps of the video watermark and the audio watermark may also be performed at the same time, or the audio watermark may be extracted first.
  • The association established when the watermarks were added can then be used to mutually verify the extracted audio and video watermarks, thereby improving the anti-tampering ability of the watermark.
  • As described above, this association may be an association in time or in content, and in the simplest case the association in time and content is identity.
  • the video file to be embedded can be segmented according to the time axis.
  • the video sequence and audio data at the same time point are embedded with the same watermark information to facilitate the extraction of the video watermark information and audio watermark information at the same time point.
  • the watermark extraction method of the present application may further include generating the extracted watermark of the video file according to the extracted first watermark information and the second watermark information.
  • the generation of the final extracted watermark can be determined based on the correlation between the audio and video watermarks obtained in advance.
  • For example, a weighted summation may be performed on the watermark data contained in the first watermark information and in the second watermark information to generate the extracted watermark of the video file, and the weight assigned to each may be adjusted according to its degree of confidence.
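  • A minimal sketch of such confidence-weighted fusion is shown below; using the normalized confidences directly as the weights a1 and a2, and thresholding at 0.5, is an assumed instance of the adaptive rule described in the text.

```python
import numpy as np

def fuse_watermarks(wm_video, wm_audio, conf_video, conf_audio, eps=1e-9):
    """Fuse the aligned per-bit watermark estimates extracted from the video
    (wm1) and from the audio (wm2) by confidence-weighted summation."""
    a1 = conf_video / (conf_video + conf_audio + eps)
    a2 = conf_audio / (conf_video + conf_audio + eps)
    fused = a1 * np.asarray(wm_video, dtype=float) + a2 * np.asarray(wm_audio, dtype=float)
    return (fused >= 0.5).astype(int)        # threshold back to a bit sequence

# e.g. if only the audio watermark survived heavy editing at this position:
# fuse_watermarks(wm1_bits, wm2_bits, conf_video=0.1, conf_audio=0.9)
```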
  • As described above, the added first watermark information may include multiple sets of watermark data, each containing the first ranking data and the first watermark data, and the added second watermark information may likewise include multiple sets of watermark data, each containing the second ranking data and the second watermark data.
  • Accordingly, extracting the first watermark information embedded in the video data may include determining the subsequent first watermark data based on the extracted first ranking data, and extracting the second watermark information embedded in the audio data may include determining the subsequent second watermark data based on the extracted second ranking data.
  • In other words, the presence of watermark information can be located by first locating the ranking data, which is easier to distinguish; for example, by finding the synchronization code "000...0001" shown in FIG. 2, the single-period watermark information immediately following it can be located.
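  • The sketch below locates candidate period starts in a recovered bit sequence by looking for two consecutive synchronization codes that differ by exactly one, a simple consistency check suggested by the incrementing codes of Figure 2 (the 16-bit/48-bit split is again an assumption).

```python
def locate_period_starts(bits, sync_bits=16, wm_bits=48):
    """Return the indices at which a period (sync code + single-period
    watermark) plausibly starts in the recovered bit sequence."""
    period = sync_bits + wm_bits
    starts = []
    for s in range(len(bits) - 2 * period + 1):
        sync0 = int("".join(str(b) for b in bits[s:s + sync_bits]), 2)
        sync1 = int("".join(str(b) for b in bits[s + period:s + period + sync_bits]), 2)
        if sync1 == sync0 + 1:               # consecutive periods carry consecutive codes
            starts.append(s)
    return starts
```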
  • In certain embodiments, extracting the first watermark information embedded in the video data includes determining the video frames and/or video areas in the video data that contain the first watermark information, and extracting the second watermark information embedded in the audio data includes determining the audio frames and/or audio areas in the audio data that contain the second watermark information.
  • These frames and areas can be determined by reversing the area selection used when the watermark was embedded, for example selecting the non-salient areas of key video frames and/or the auditorily insensitive frequency bands in adjacent audio frames.
  • the extraction of watermark bits can also be reversed based on the embedding algorithm.
  • For example, when the embedded watermark is an "energy watermark", the constituent bits of the first watermark information that conform to a predetermined energy relationship can be extracted from the determined series of video frames and/or video areas and combined into the first watermark information.
  • Likewise, the constituent bits of the second watermark information that conform to a predetermined energy relationship can be extracted from the determined series of audio frames and/or audio areas and combined into the second watermark information.
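  • The extraction counterpart of the embedding sketch given earlier is simply a comparison of the same two region energies, as below; the region positions must match those assumed at embedding time.

```python
import numpy as np
from scipy.fft import dctn

def extract_bit_from_block(block):
    """Recover one bit from a block watermarked by the energy-relationship
    sketch above: compare the energies of the two adjacent DCT regions."""
    c = dctn(np.asarray(block, dtype=np.float64), norm="ortho")
    ea = float(np.sum(c[2:4, 2:4] ** 2))
    eb = float(np.sum(c[4:6, 2:4] ** 2))
    return 1 if ea >= eb else 0

def extract_payload(blocks):
    """Concatenate the bits recovered from a series of located blocks/areas."""
    return [extract_bit_from_block(b) for b in blocks]
```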
  • FIG. 5 shows an example of a joint watermark extraction process according to the present application.
  • the watermark extraction process in FIG. 5 can be regarded as the corresponding operation of the watermark embedding process in FIG. 3.
  • the synchronization code and watermark bits are extracted from the video sequence based on the energy relationship of the neighboring blocks, and the watermark information wm1 is obtained.
  • the watermark information wm1 can be regarded as the single-period watermark information added to the video sequence in the example in FIG. 2.
  • the audio can be divided into frames, and an area that is insensitive to hearing in the frequency band of each frame of audio can be selected. Subsequently, the synchronization code and watermark bits are extracted from the audio based on the energy relationship of the adjacent frequency bands, and the watermark information wm2 is obtained.
  • Similarly, the watermark information wm2 can be regarded as the single-period watermark information added to the audio code stream in the example in FIG. 2.
  • Then, the synchronization codes can be used to align the audio watermark information and the video watermark information, and the aligned watermark information from the two sources is weighted and summed.
  • the weights a1 and a2 are adaptively adjusted according to whether the current time axis position can successfully extract the watermark.
  • FIG. 6 shows a schematic diagram of a brief scheme of the watermark embedding and extraction operations of this application.
  • the video file to be protected is segmented according to the time axis, and the video sequence and audio data at the same time point are embedded with the same watermark information.
  • the figure shows the watermark information as 0 or 1 embedded at the beginning of each segment.
  • As noted above, there is a difference in embedding capacity: a video sequence may need only 7 seconds to embed 64 bits, while an audio code stream may need 10 seconds.
  • the watermark extraction operation can be performed when the copyright information of the video needs to be proved.
  • A is a video file to be embedded with a watermark (such as a film and television work).
  • a joint watermark can be embedded for it.
  • the video file analysis tool is used to extract the video sequence and audio data separately from the video file to be embedded with the watermark.
  • video key frames can be extracted, and the embedding area can be selected based on the key and content analysis.
  • a video watermarking algorithm is selected to embed the synchronization code and watermark bits in the video sequence (that is, the embedding area selected in the previous step) to obtain a video sequence with watermarked information.
  • In the audio branch, the audio can be divided into frames and the embedding area in each audio frame selected based on the key; an audio watermarking algorithm is then chosen to embed the synchronization code and watermark bits into the audio, obtaining audio that carries the watermark information. Finally, the video sequence and the audio data containing the same watermark information are mixed to obtain a video file containing the double watermark.
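  • The key-based area selection mentioned for both branches can be sketched as a pseudo-random choice seeded by a shared secret, so that the embedder and the extractor pick the same regions; the particular construction below (SHA-256 of key plus frame index) is an illustrative assumption, since the patent only states that the area is selected based on the key.

```python
import hashlib
import random

def select_embedding_blocks(frame_index, num_blocks, count, key=b"shared-secret"):
    """Pseudo-randomly select which candidate blocks of a key frame (or which
    candidate areas of an audio frame) carry the watermark bits."""
    digest = hashlib.sha256(key + frame_index.to_bytes(4, "big")).digest()
    rng = random.Random(int.from_bytes(digest, "big"))
    return sorted(rng.sample(range(num_blocks), count))
```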
  • the extraction process of the joint watermark may include operations corresponding to the above embedding process.
  • the video file analysis tool can also be used to extract video sequences and audio data from watermarked video files.
  • video key frames can be extracted, and the extraction area can be selected based on the key and content analysis.
  • In the audio branch, the audio can be divided into frames, and the extraction area of each audio frame can be selected based on the key.
  • Similarly to the above, the synchronization code can be used to align the audio and video watermark information, and the aligned watermark information from the two sources is weighted and summed, with the weights a1 and a2 adaptively adjusted according to the confidence that the current time-axis position contains the audio and video watermarks.
  • copyright verification can also be performed based on a separate audio or video extraction watermark.
  • The video file carrying the combined audio and video watermark added according to the embodiments described in Figs. 1 to 3 of the present application can be published on a video website directly (or after being tampered with by a third party).
  • the video files can be obtained by web visitors in the form of streaming media.
  • Streaming media refers to a technology and process in which a series of (usually compressed) multimedia data is sent in segments over the Internet, so that video and audio can be transmitted and viewed on the Internet in real time.
  • Streaming media data can refer to multimedia data sent in segments via the Internet. This technology enables data packets to be sent and watched continuously, without the need to download the entire media file before use.
  • watermark extraction can be performed on complete video files, or watermark extraction can be performed on video streams in the form of streaming media.
  • To this end, the present application can also be implemented as a streaming media watermark extraction method, including: acquiring watermark-embedded streaming media data generated from the above-mentioned watermark-embedded video file; extracting video data and audio data from the watermark-embedded streaming media data; extracting the first watermark information embedded in the video data; and extracting the second watermark information embedded in the audio data.
  • For example, streaming media data of a certain duration can be accumulated, and the audio and video data can then be extracted from this accumulated streaming media data (which can be regarded as a video data fragment) for extraction of the first and second watermark information.
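  • A possible shape of this accumulate-then-extract loop is sketched below. The demux and extraction callables stand for the steps described earlier in this document, and the segment length and buffering threshold are assumptions.

```python
def extract_from_stream(segments, demux, extract_video_wm, extract_audio_wm,
                        segment_seconds=2.0, min_seconds=20.0):
    """Accumulate streaming-media segments (byte strings) until roughly
    min_seconds of media is buffered, then run the same demultiplexing and
    dual watermark extraction as for a complete video file."""
    buffered, buffered_seconds = [], 0.0
    for seg in segments:
        buffered.append(seg)
        buffered_seconds += segment_seconds
        if buffered_seconds >= min_seconds:
            video_data, audio_data = demux(b"".join(buffered))
            return extract_video_wm(video_data), extract_audio_wm(audio_data)
    return None   # not enough data was accumulated to attempt extraction
```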
  • the embedding of the watermark can also be performed on video files in the form of streaming media.
  • Correspondingly, this application can also be implemented as a streaming media data processing method, including: acquiring video data and audio data of the streaming media data; embedding first watermark information into the video data; associatively embedding second watermark information into the audio data; and obtaining watermark-embedded streaming media data.
  • the foregoing operations can be performed on accumulated streaming media data (which can be regarded as video data fragments) of a certain length of time, or streaming media data obtained in real time can be embedded, which is not limited in this application.
  • Furthermore, this application can also be implemented as a video file processing device, including: a video parsing unit for obtaining the video data and audio data in the video file; a video watermark embedding unit for embedding the first watermark information into the video data; an audio watermark embedding unit for embedding the second watermark information into the audio data; and a video mixing unit for mixing the video data embedded with the first watermark information and the audio data embedded with the second watermark information, to obtain a watermark-embedded video file.
  • The video parsing unit may parse video files in the form of streaming media, for example in real time.
  • Likewise, the present application can also be implemented as a video watermark extraction device, including: a video parsing unit for acquiring the video data and audio data in the watermark-embedded video file described above; a video watermark extraction unit for extracting the first watermark information from the video data; and an audio watermark extraction unit for extracting the embedded second watermark information from the audio data.
  • the device may further include: a watermark information generating unit, configured to perform a weighted summation of the watermark data included in each of the first watermark information and the second watermark information to generate the extracted watermark of the video file.
  • Here too, the video parsing unit may parse video files in the form of streaming media, for example in real time.
  • Fig. 7 shows a schematic structural diagram of a computing device that can be used to implement the above video processing and watermark extraction method according to an embodiment of the present application.
  • the computing device 700 includes a memory 710 and a processor 720.
  • the processor 720 may be a multi-core processor, or may include multiple processors.
  • the processor 720 may include a general-purpose main processor and one or more special co-processors, such as a graphics processing unit (GPU), a digital signal processor (DSP), and so on.
  • the processor 720 may be implemented using a customized circuit, such as an Application Specific Integrated Circuit (ASIC) or a Field Programmable Gate Array (FPGA, Field Programmable Gate Arrays).
  • the memory 710 may include various types of storage units, such as a system memory, a read only memory (ROM), and a permanent storage device.
  • the ROM may store static data or instructions required by the processor 720 or other modules of the computer.
  • the permanent storage device may be a readable and writable storage device.
  • the permanent storage device may be a non-volatile storage device that does not lose stored instructions and data even after the computer is powered off.
  • For example, a large-capacity storage device (such as a magnetic or optical disk, or flash memory) may be adopted as the permanent storage device.
  • the permanent storage device may be a removable storage device (for example, a floppy disk, an optical drive).
  • the system memory can be a readable and writable storage device or a volatile readable and writable storage device, such as dynamic random access memory.
  • the system memory can store some or all of the instructions and data needed by the processor at runtime.
  • the memory 710 may include any combination of computer-readable storage media, including various types of semiconductor memory chips (DRAM, SRAM, SDRAM, flash memory, programmable read-only memory), and magnetic disks and/or optical disks may also be used.
  • In addition, the memory 710 may include a removable storage device that can be read and/or written, such as a compact disc (CD), a read-only digital versatile disc (for example, DVD-ROM or double-layer DVD-ROM), a read-only Blu-ray disc, an ultra-density disc, a flash memory card (such as an SD card, mini SD card, Micro-SD card, etc.), a magnetic floppy disk, and the like.
  • the computer-readable storage medium does not include carrier waves and instantaneous electronic signals transmitted wirelessly or wiredly.
  • the memory 710 stores executable codes.
  • the processor 720 can be made to execute the video processing and watermark extraction methods described above.
  • In summary, this application embeds an audio watermark and a video watermark in the video file at the same time; the two do not interfere with each other and complement each other.
  • When the watermark is extracted, the audio and video watermark information can be adaptively fused, which greatly improves the robustness of the video file watermark, in particular against malicious editing attacks on the video content.
  • the weights of audio watermarks and video watermarks can be dynamically adjusted according to their respective reliability to ensure the reliability of the fusion watermark.
  • the segmented synchronization code method can be used to achieve dual watermark synchronization.
  • the method according to the present application can also be implemented as a computer program or computer program product, and the computer program or computer program product includes computer program code instructions for executing the above-mentioned steps defined in the above-mentioned method of the present application.
  • Alternatively, this application can also be implemented as a non-transitory machine-readable storage medium (or computer-readable storage medium, or machine-readable storage medium) on which executable code (or a computer program, or computer instruction code) is stored; when the executable code is executed by the processor of an electronic device (or computing device, server, etc.), the processor is caused to execute each step of the above-described method according to the present application.
  • Each block in the flowchart or block diagram may represent a module, program segment, or part of the code that contains one or more executable instructions for realizing the specified logical function.
  • It should also be noted that the functions marked in the blocks may occur in an order different from that marked in the drawings; for example, two consecutive blocks may actually be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved.
  • Each block in the block diagram and/or flowchart, and combinations of blocks in the block diagram and/or flowchart, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

Abstract

Disclosed are a video file processing method and device, and a watermark extraction method and device. The video file processing method comprises: obtaining video data and audio data of a video file; embedding first watermark information into the video data; embedding second watermark information into the audio data in an associated manner; and obtaining a watermark-embedded video file. In the present application, mutually associated audio and video watermarks are added to the video file by exploiting the temporal and spatial correlation between the video sequence and the audio data contained in the video, thereby increasing the watermark coverage of the video file. In addition, mutual verification between the audio and video watermarks improves the robustness of the embedded watermarks against malicious tampering.

Description

Method and device for video file processing and watermark extraction
This application claims priority to Chinese patent application No. 202010215301.1, entitled "Method and device for video file processing and watermark extraction", filed on March 24, 2020, the entire content of which is incorporated herein by reference.
Technical field
The present disclosure relates to the technical field of digital media processing, and in particular to a method and device for video file processing and watermark extraction.
Background
With the rapid development of the Internet, producing and viewing video has become increasingly convenient and popular. Driven by profit, video piracy has also become an increasingly prominent problem. Some third parties steal the videos displayed by video producers through technical means; for example, many uploaders on short-video platforms profit by lightly editing other people's videos to generate pseudo-original videos. These problems disrupt the development of video and its related industries.
For this reason, a method that can better confirm the copyright of a video is needed.
发明内容Summary of the invention
为了解决如上至少一个问题,本公开提供了一种视频文件处理以及相应的水印提取方法。该方法利用视频中包含视频序列和音频数据在时间和空间上的关联性,向视频文件添加彼此关联的音视频水印,从而提升了视频文件的水印覆盖面。另外,通过音视频双水印之间的彼此验证,提升了嵌入水印对抗恶意篡改的鲁棒性。In order to solve at least one of the above problems, the present disclosure provides a video file processing and corresponding watermark extraction method. This method utilizes the temporal and spatial relevance of the video sequence and audio data contained in the video, and adds mutually related audio and video watermarks to the video file, thereby improving the watermark coverage of the video file. In addition, through mutual verification between audio and video double watermarks, the robustness of embedded watermarks against malicious tampering is improved.
根据本公开的第一个方面,提供了一种视频文件处理方法,包括:获取所述视频文件的视频数据和音频数据;向所述视频数据中嵌入第一水印信息;向所述音频数据中关联嵌入第二水印信息;以及获取嵌入水印的视频文件。According to a first aspect of the present disclosure, there is provided a video file processing method, including: acquiring video data and audio data of the video file; embedding first watermark information in the video data; Associating and embedding the second watermark information; and obtaining the watermark-embedded video file.
根据本公开的第二个方面,提供了一种视频水印提取方法,包括:获取根据第一个方面所述的嵌入水印的视频文件;从所述嵌入水印的视频文件中抽取视频数据和音频数据;提取所述视频数据中嵌入的第一水印信息;以及提取所述音频数据中嵌入的第二水印信息。According to a second aspect of the present disclosure, there is provided a method for extracting a video watermark, including: obtaining the watermark-embedded video file according to the first aspect; and extracting video data and audio data from the watermark-embedded video file ; Extracting the first watermark information embedded in the video data; and extracting the second watermark information embedded in the audio data.
根据本公开的第三个方面,提供了一种流媒体水印提取方法,包括:获取嵌入水印的流媒体数据,所述流媒体数据由第一方面所述的嵌入水印的视频文件生成;从所述嵌 入水印的流媒体数据中抽取视频数据和音频数据;提取所述视频数据中嵌入的第一水印信息;以及提取所述音频数据中嵌入的第二水印信息。According to a third aspect of the present disclosure, there is provided a method for extracting a streaming media watermark, including: acquiring streaming media data embedded with a watermark, the streaming media data being generated from the watermark-embedded video file described in the first aspect; Extracting video data and audio data from the watermark-embedded streaming media data; extracting the first watermark information embedded in the video data; and extracting the second watermark information embedded in the audio data.
根据本公开的第四个方面,提供了一种流媒体数据处理方法,包括:获取所述流媒体数据的视频数据和音频数据;向所述视频数据中嵌入第一水印信息;向所述音频数据中关联嵌入第二水印信息;以及获取嵌入水印的流媒体数据。According to a fourth aspect of the present disclosure, there is provided a streaming media data processing method, including: acquiring video data and audio data of the streaming media data; embedding first watermark information into the video data; The second watermark information is associated and embedded in the data; and the watermark-embedded streaming media data is obtained.
根据本公开的第五个方面,提供了一种视频文件处理装置,包括:视频解析单元,用于获取所述视频文件中的视频数据和音频数据;视频水印嵌入单元,用于向所述视频数据中嵌入第一水印信息;音频水印嵌入单元,用于向所述音频数据中关联嵌入第二水印信息;以及视频混合单元,用于混合嵌入第一水印信息的所述视频数据和嵌入第二水印信息的所述音频数据,以获取嵌入水印的视频文件。According to a fifth aspect of the present disclosure, there is provided a video file processing device, including: a video parsing unit for acquiring video data and audio data in the video file; a video watermark embedding unit for sending the video to the video The first watermark information is embedded in the data; the audio watermark embedding unit is used to associate and embed the second watermark information into the audio data; and the video mixing unit is used to mix the video data embedded in the first watermark information and the second watermark information. The audio data of the watermark information is used to obtain a watermark embedded video file.
根据本公开的第六个方面,提供了一种视频水印提取装置,包括:视频解析单元,用于获取根据第一个方面所述的嵌入水印的视频文件中的视频数据和音频数据;视频水印提取单元,用于从所述视频数据中提取第一水印信息;以及音频水印提取单元,用于从所述音频数据中提取嵌入第二水印信息。According to a sixth aspect of the present disclosure, there is provided a video watermark extraction device, including: a video parsing unit for obtaining video data and audio data in a watermark-embedded video file according to the first aspect; video watermark The extraction unit is used to extract the first watermark information from the video data; and the audio watermark extraction unit is used to extract and embed the second watermark information from the audio data.
According to a seventh aspect of the present disclosure, there is provided a computing device, including: a processor; and a memory having executable code stored thereon which, when executed by the processor, causes the processor to perform the method of any of the first to fourth aspects.
According to an eighth aspect of the present disclosure, there is provided a non-transitory machine-readable storage medium having executable code stored thereon which, when executed by a processor of an electronic device, causes the processor to perform the method of any of the first to fourth aspects.
The present application embeds an audio watermark and a video watermark into a video file at the same time. The two watermarks do not interfere with each other and complement each other. During extraction, the information recovered from the audio and video watermarks can be adaptively fused, which greatly improves the robustness of the video file watermark, in particular against malicious editing attacks on the video content.
Description of the Drawings
The above and other objectives, features and advantages of the present disclosure will become more apparent from the following more detailed description of exemplary embodiments of the present disclosure in conjunction with the accompanying drawings, in which the same reference numerals generally denote the same components.
Fig. 1 shows a schematic flowchart of a video file processing method according to the present application;
Fig. 2 shows an example of embedding watermark information into a video sequence and an audio code stream, respectively;
Fig. 3 shows an example of a joint watermark embedding process according to the present application;
Fig. 4 shows a schematic flowchart of a video watermark extraction method according to an embodiment of the present application;
Fig. 5 shows an example of a joint watermark extraction process according to the present application;
Fig. 6 shows a schematic overview of the watermark embedding and extraction operations of the present application;
Fig. 7 shows a schematic structural diagram of a computing device that can be used to implement the above video processing and watermark extraction methods according to an embodiment of the present application.
Detailed Description
Preferred embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although the drawings show preferred embodiments of the present disclosure, it should be understood that the present disclosure can be implemented in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
With the rapid development of the Internet, producing and watching videos has become increasingly convenient and popular. Driven by profit, video piracy has also become a growing problem. Some third parties steal the videos published by a video producer through various technical means. For example, many uploaders on short-video platforms make a profit by lightly editing other people's videos to generate pseudo-original videos. These problems disrupt the development of the video industry and related industries.
In recent years, digital watermarking technology has achieved certain results in the field of video copyright protection. This technology embeds watermark information into carrier information (for example, redundant information), thereby protecting the video copyright. Videos on common platforms today usually contain two parts, a video sequence and audio data, and many digital watermarking applications process the audio and video as a whole. When producing a pirated video, a third party may edit and falsify a local video sequence or a local audio segment (for example, removing a logo watermark, adding a new logo watermark, inserting advertisements, or changing the resolution). The combination of these editing operations introduces severe distortion into the original video signal, so that the watermark information cannot be extracted from that portion. A scheme that embeds a single watermark into only the video sequence or only the audio data therefore cannot meet higher-level copyright protection requirements for video.
To this end, the present application proposes an audio-video joint watermarking scheme. The different kinds of data contained in a video (the video sequence and the audio data) are strongly correlated in time and space and constrain each other. Most existing video watermarking algorithms only embed into and extract from the video code stream, without exploiting the correlation between audio and video, or without considering the existence of the audio data at all. By introducing the correlation between audio and video data into video watermarking, the present application makes the video watermark more accurate and robust, so that copyright protection and digital content authentication achieve better results.
The present application can first be implemented as a scheme for adding watermarks to audio and video, and in particular as a video file processing method. Fig. 1 shows a schematic flowchart of a video file processing method according to the present application. Here, a "video file" refers to a file that usually includes both video and audio information. In other words, a video file includes both image content and sound content.
In step S110, the video data and audio data of the video file are acquired. For the video file into which a watermark is to be embedded, existing tools can be used to separate the audio and video. For example, a video file parsing tool can be used to process the video file so as to extract the video data (usually a video sequence) and the audio data (usually an audio code stream), respectively.
In step S120, first watermark information is embedded into the video data.
In step S130, second watermark information is embedded into the audio data in association with the first watermark information. It should be understood that the video watermark and the audio watermark may also be inserted at the same time, or the audio watermark may be inserted first, as long as the watermark embedding of the video and of the audio are associated with each other. In subsequent watermark extraction, this association can be used for joint verification of the audio and video watermarks, thereby improving the tamper resistance of the watermark.
In some embodiments, the association may be a temporal association, that is, the time at which the second watermark information is embedded into the audio data may be associated with the time at which the first watermark information is embedded into the video data. Alternatively or additionally, the association may be a content association, that is, the content of the second watermark information embedded into the audio data may be associated with the content of the first watermark information embedded into the video data. In a preferred embodiment, both the time and the content of the audio and video watermarks have a predetermined association, which facilitates subsequent mutual verification of the extracted audio and video watermark information.
The above temporal and content associations may be identity, i.e., the audio and video watermarks may be added at the same positions on the video and audio timelines. Considering that embedding a watermark payload usually requires continuous embedding over a period of time, and that video sequences and audio code streams of the same duration differ in embedding capacity (for example, a video sequence may need only 7 seconds to embed 64 bits, while an audio code stream may need 10 seconds to embed 64 bits), "the same time" may mean the same embedding start time. Thus, in one embodiment, the start time of embedding the second watermark information into the audio data may be the same as the start time of embedding the first watermark information into the video data. This facilitates aligning and mutually verifying the audio and video watermarks when both watermarks are extracted.
Although in some embodiments the audio and video watermark contents may be associated through a certain mapping relationship rather than being identical, in order to improve robustness against tampering after the video is published, it is preferable that the audio and video carry the same watermark content. In other words, the content of the second watermark information embedded into the audio data may be the same as the content of the first watermark information embedded into the video data. This further improves the robustness of mutual verification between the audio and video watermarks during extraction.
Since video files usually have a certain duration, in order to prevent theft by simply clipping part of the content (for example, selecting the best 30 seconds of a 3-minute video for publication), watermark information can be added multiple times to the video sequence and the audio code stream of a video file to strengthen protection. To this end, embedding the first watermark information into the video data may include: embedding a plurality of pieces of first watermark information into the video data at a first predetermined time interval. Correspondingly, embedding the associated second watermark information into the audio data may include: embedding a plurality of pieces of second watermark information into the audio data at a second predetermined time interval. As described above, in order to improve robustness and facilitate alignment, it is preferable to embed the audio and video watermarks at the same time interval. In addition, although the predetermined time interval may be non-uniform (for example, one embedding every 10 seconds during the first minute and one every 15 seconds during the next minute), it is still preferable to insert the watermarks at a uniform predetermined time interval to improve tamper resistance.
In addition, as described above, since embedding a watermark payload usually requires continuous embedding over a period of time, and since video sequences and audio code streams of the same duration differ in embedding capacity (a video sequence may need only 7 seconds to embed 64 bits, while an audio code stream may need 10 seconds), the synchronization of the audio and video watermarks during embedding and extraction must also be considered when multiple watermarks are embedded repeatedly. To this end, each piece of the plurality of first watermark information may include first ordering data and first watermark data, and correspondingly each piece of the plurality of second watermark information may include second ordering data and second watermark data. In other words, the watermark information actually added may include an ordering code plus the watermark data itself, repeated at a certain time interval.
Fig. 2 shows an example of embedding watermark information into a video sequence and an audio code stream, respectively. As shown in Fig. 2, the same watermark information can be repeatedly inserted into the video sequence and the audio code stream, aligned on the time axis, at intervals of 10 seconds. The watermark information inserted in each interval includes a synchronization code and single-period watermark information.
Here, the "synchronization code" may be a code used within the audio or the video to order and count the added single-period watermark information. For example, the synchronization code added at the start of the first interval (i.e., second 0) may be "000...0000", the synchronization code added at the start of the second interval (i.e., second 10) may be incremented by 1 to "000...0001", the synchronization code added at the start of the third interval (i.e., second 20) may be incremented by 1 again to "000...0010", and so on. Further, the "synchronization" in "synchronization code" may refer to synchronization between the video sequence and the audio code stream. That is, in the subsequent watermark extraction stage, the video sequence and the audio code stream can be aligned in time with the help of the recovered synchronization codes.
Here, the "single-period watermark information" may refer to an identification code that uniquely represents the identity of the video or of the video producer (or video publisher), for example the "48-bit complete watermark information of one period" shown in Fig. 2. It is thereby distinguished from all of the watermark-related information inserted into the entire audio or video, which includes the ever-increasing synchronization codes together with the repeatedly inserted watermark identification code.
As shown in Fig. 2, both the synchronization code and the single-period watermark information are binary data represented by 0s and 1s, the same base as the data handled in actual audio and video processing.
Since video sequences and audio code streams of the same duration differ in embedding capacity (a video sequence of only 7 seconds is enough to embed 64 bits, while an audio code stream of 10 seconds is needed to embed 64 bits), the interval at which watermarks are repeatedly inserted should be no shorter than 10 seconds. Although in this example the watermark is inserted continuously at a predetermined interval of 10 seconds, it should be understood that in other embodiments the watermark may also be inserted at a longer interval, for example once every 20 seconds. Similarly, the synchronization code and the single-period watermark information embedded in each interval may have different numbers of bits (for example, a 32-bit synchronization code and 32-bit watermark information), and the total length of the synchronization code plus the watermark information may also be a length other than 64 bits.
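For illustration only, and not as a limitation of the disclosed method, the following minimal Python sketch shows one possible way to build the per-period payload described above: an incrementing synchronization code followed by a fixed single-period watermark. The 16-bit and 48-bit field widths, the 10-second period and the publisher identifier value are assumptions carried over from the example of Fig. 2 rather than requirements of the method.

```python
def build_period_payload(period_index: int, watermark_id: int,
                         sync_bits: int = 16, wm_bits: int = 48) -> list[int]:
    """Return the bit list embedded in one period: an incrementing
    synchronization code followed by the fixed single-period watermark."""
    sync = [(period_index >> i) & 1 for i in reversed(range(sync_bits))]
    wm = [(watermark_id >> i) & 1 for i in reversed(range(wm_bits))]
    return sync + wm


def payload_schedule(duration_s: float, period_s: float = 10.0):
    """Yield (start_time_s, payload_bits) for every full period in the file."""
    watermark_id = 0xA5A5_5A5A_F00D  # hypothetical 48-bit publisher identifier
    n_periods = int(duration_s // period_s)
    for k in range(n_periods):
        yield k * period_s, build_period_payload(k, watermark_id)


if __name__ == "__main__":
    for start, bits in payload_schedule(35.0):
        print(f"t={start:>5.1f}s  sync+wm = {''.join(map(str, bits))}")
```

In this arrangement the same payload generator can feed both the video branch and the audio branch, so that the two streams carry identical bits for each period.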
After the watermark has been inserted into each of the two streams, in step S140 the watermark-embedded video file can be obtained.
Specifically, the video sequence and the audio data containing the same watermark information can be mixed into one code stream to obtain a video file containing the double watermark. The double-watermarked video file can then be published, and, when needed, the watermark can be extracted and recovered by a predetermined method.
In different embodiments, different audio and video watermark embedding methods can be chosen for inserting the watermark data into the audio and the video respectively, as long as the two embeddings are associated.
In one embodiment, in order to prevent the watermark embedding from degrading the audiovisual quality, suitable regions can be selected for the embedding operation based on the respective properties of the audio and video data. To this end, embedding the first watermark information into the video data may include: adding the first watermark information to non-salient regions of video frames in the video data. Preferably, video key frames can be extracted and content analysis performed to select non-salient regions for adding the watermark information. Alternatively or additionally, embedding the second watermark information into the audio data may include: adding the second watermark information to auditorily insensitive regions of audio frames in the audio data. Preferably, the audio is divided into frames, and a frequency band within each audio frame to which human hearing is insensitive is selected for adding the watermark information.
For the embedding itself, an energy watermark may be chosen. Here, an "energy watermark" refers to a watermarking algorithm that embeds information by adjusting the energy relationship between adjacent regions in a certain transform domain of the media content. When the watermark information contains a relatively large number of bits, the insertion of each complete watermark usually has to be carried out continuously over a series of video frames and audio frames. To this end, embedding the first watermark information into the video data may include: embedding the first watermark information by adjusting the energy relationship between adjacent regions in the transform domain of video frames in the video data. Further, this may include: selecting a series of specific video frames in a video sequence (the video data being a video sequence); and adjusting the energy relationship between adjacent transform-domain regions of the series of specific video frames so as to embed the constituent bits of the first watermark information one by one. Here, selecting a series of specific video frames in the video sequence may include selecting video key frames of the video sequence. For example, one bit of the 64-bit information is embedded in each key frame, so that the synchronization code and the single-period watermark information are finally embedded across 64 consecutive key frames.
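For illustration only, the sketch below shows one way a single payload bit could be embedded into a selected (assumed non-salient) 8x8 block of a key frame by nudging the energies of two adjacent DCT regions. The block size, the choice of the two coefficient regions, the margin value and the use of SciPy's dctn/idctn are assumptions introduced for this sketch; the disclosed method does not prescribe a specific transform or adjustment rule.

```python
import numpy as np
from scipy.fft import dctn, idctn


def embed_bit_in_block(block: np.ndarray, bit: int, margin: float = 5.0) -> np.ndarray:
    """Embed one bit into an 8x8 luma block by enforcing an energy order
    between two adjacent mid-frequency DCT regions (hypothetical rule)."""
    coeffs = dctn(block.astype(np.float64), norm="ortho")
    region_a = coeffs[2:4, 2:4]          # two adjacent mid-frequency regions
    region_b = coeffs[4:6, 2:4]
    e_a, e_b = np.sum(region_a ** 2), np.sum(region_b ** 2)
    # bit == 1: energy of region A should exceed region B by `margin`; bit == 0: the opposite.
    if bit == 1 and e_a < e_b + margin:
        region_a *= np.sqrt((e_b + margin) / max(e_a, 1e-9))
    elif bit == 0 and e_b < e_a + margin:
        region_b *= np.sqrt((e_a + margin) / max(e_b, 1e-9))
    return idctn(coeffs, norm="ortho")   # region_a/region_b are views into coeffs
```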
Correspondingly, embedding the associated second watermark information into the audio data may include: adjusting the energy relationship of adjacent audio frames to embed the second watermark information. Further, this may include: adjusting the energy relationship of adjacent frequency bands across the series of adjacent audio frames, and embedding the constituent bits of the second watermark information one by one. For example, the energy relationship between two adjacent frequency bands can be adjusted to embed one bit of the 64-bit information, so that the synchronization code and the single-period watermark information are finally embedded across 65 consecutive adjacent frames.
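By analogy, a minimal sketch of embedding one bit into a pair of adjacent audio frames by ordering the energies of one shared frequency band might look as follows. The FFT-based band energy, the band indices, the strength factor and the assumption that frames are float arrays of at least about 1024 samples are all choices made only for this sketch.

```python
import numpy as np


def band_energy(frame: np.ndarray, band: slice) -> float:
    """Energy of one frequency band of a single audio frame (magnitude spectrum)."""
    spectrum = np.abs(np.fft.rfft(frame))
    return float(np.sum(spectrum[band] ** 2))


def embed_bit_in_frames(frame_a: np.ndarray, frame_b: np.ndarray, bit: int,
                        band: slice = slice(200, 240), strength: float = 1.05) -> None:
    """Embed one bit by making the chosen band of frame_a louder (bit=1)
    or quieter (bit=0) than the same band of frame_b (in-place, hypothetical rule)."""
    e_a, e_b = band_energy(frame_a, band), band_energy(frame_b, band)
    spec_a = np.fft.rfft(frame_a)
    if bit == 1 and e_a <= e_b:
        spec_a[band] *= strength * np.sqrt((e_b + 1e-9) / (e_a + 1e-9))
    elif bit == 0 and e_a >= e_b:
        spec_a[band] *= np.sqrt((e_b + 1e-9) / (e_a + 1e-9)) / strength
    frame_a[:] = np.fft.irfft(spec_a, n=len(frame_a))  # write the adjusted frame back
```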
The above video processing method of the present application is particularly suitable for being implemented as a joint audio-video watermarking scheme. Fig. 3 shows an example of a joint watermark embedding process according to the present application. For the video file to be watermarked, a video file parsing tool can first be used to extract the video sequence and the audio data separately. Then, as shown in Fig. 3, the process splits into two parallel branches.
For the extracted video sequence, video key frames are extracted and content analysis is performed to select non-salient regions. In the selected non-salient regions, the synchronization code and the watermark bits can be embedded based on the energy relationship of adjacent blocks, yielding a video sequence carrying the watermark information. Referring again to the example of Fig. 2, the video key frames in the 10-to-17-second period can be extracted, content analysis can be performed to select non-salient regions, and a transform such as the DCT (discrete cosine transform) can then be applied to obtain the transform domain. The watermark information is embedded by adjusting the energy relationship between adjacent transform-domain regions; for example, one bit is embedded in a selected non-salient region of each key frame, so that 64 consecutive key frames carry the 64-bit synchronization code and single-period watermark information. This makes it convenient, in the subsequent extraction process, to locate the watermark information through key frame extraction, non-salient region selection and searching of the energy relationship between adjacent regions.
On the other hand, the audio can be divided into frames, and within each audio frame a frequency band to which human hearing is insensitive can be selected. In the selected auditorily insensitive regions, the synchronization code and the watermark bits are embedded based on the energy relationship of adjacent frequency bands, yielding audio carrying the watermark information. Referring again to the example of Fig. 2, the audio frames in the 10-to-20-second period can be extracted, an auditorily insensitive frequency band can be selected, and a transform such as the DCT (discrete cosine transform) can then be applied to obtain the transform domain. The watermark information is embedded by adjusting the energy relationship between adjacent transform-domain frequency bands; for example, one bit is embedded in a selected auditorily insensitive region between two adjacent frames, so that the 64-bit synchronization code and single-period watermark information are carried across 65 consecutive frames. This makes it convenient, in the subsequent extraction process, to locate the watermark information through audio framing, selection of auditorily insensitive regions and searching of the energy relationship.
After the watermark has been added to both the video and the audio, the video sequence and the audio data containing the same watermark information can be mixed into one code stream to obtain a video file containing the double watermark.
Subsequently, the video file carrying the above audio-video double watermark can be published for viewing and use. After publication, the video file may be tampered with and re-published as a pseudo-original video. In that case, a watermark extraction operation can be performed on such videos to establish the identity of the original publisher of the video.
To this end, the present application can also be implemented as a video watermark extraction method. Fig. 4 shows a schematic flowchart of a video watermark extraction method according to an embodiment of the present application.
In step S410, a watermark-embedded video file is obtained. Here, the watermark-embedded video file may be a video file processed as described above in conjunction with Figs. 1 to 3, that is, a video file embedded with the audio-video double watermark. Besides verifying the identity of a video that has not been tampered with, in many cases the embedded watermark must also remain extractable after the video has been attacked and tampered with. Therefore, what is obtained in step S410 may be a tampered watermark-embedded video file.
Then, in step S420, video data and audio data are extracted from the watermark-embedded video file. Similarly, for the video file from which the watermark is to be extracted, existing tools can be used to separate the audio and video. For example, a video file parsing tool can be used to process the video file so as to extract the video data (usually a video sequence) and the audio data (usually an audio code stream), respectively.
In step S430, the first watermark information embedded in the video data is extracted.
In step S440, the second watermark information embedded in the audio data is extracted. It should be understood that the video watermark and the audio watermark may also be extracted at the same time, or the audio watermark may be extracted first. The extracted watermarks can then be verified against each other using the association established when the watermarks were embedded, thereby improving the tamper resistance of the watermark.
As described above, the association may be a temporal association and/or a content association, and preferably the two coincide. To improve robustness, the video file to be watermarked can be processed in segments along the time axis, with the video sequence and the audio data at the same time point embedded with the same watermark information, which facilitates mutual verification of the video watermark information and the audio watermark information extracted at the same time point.
To this end, the watermark extraction method of the present application may further include generating an extracted watermark of the video file from the extracted first watermark information and second watermark information. The final extracted watermark can be generated based on the previously known correlation between the audio and video watermarks. Specifically, when it is determined that the first watermark information and the second watermark information contain the same watermark data, a weighted sum of the watermark data contained in the first watermark information and in the second watermark information can be computed to generate the extracted watermark of the video file. In particular, the weights of the watermark data contained in the first watermark information and in the second watermark information can be adjusted according to their confidence levels.
As described above, in order to protect the entire duration of the video, the added first watermark information may include multiple groups of watermark data each containing first ordering data and first watermark data, and the added second watermark information may include multiple groups of watermark data each containing second ordering data and second watermark data. Accordingly, extracting the first watermark information embedded in the video data may include: determining the subsequent first watermark data based on the extracted first ordering data; and extracting the second watermark information embedded in the audio data may include: determining the subsequent second watermark data based on the extracted second ordering data. In other words, the presence of the watermark data can be located by first locating the ordering data, which is easier to identify. For example, the synchronization code "000...0001" shown in Fig. 2 can be found in order to locate the single-period watermark information that immediately follows it.
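For illustration only, a minimal sketch of locating the payloads in a recovered bit stream by searching for self-consistent synchronization codes might look like the following. The 16-bit synchronization field, the 48-bit watermark field and the rule that two consecutive periods must carry incrementing codes are assumptions carried over from the earlier example.

```python
def locate_payloads(bits, sync_bits=16, wm_bits=48):
    """Slide over an extracted bit stream and treat a field as a sync code when
    the field one full period later decodes to its value + 1 (self-verifying)."""
    period = sync_bits + wm_bits
    found = []
    i = 0
    while i + 2 * period <= len(bits):
        sync_here = int("".join(map(str, bits[i:i + sync_bits])), 2)
        sync_next = int("".join(map(str, bits[i + period:i + period + sync_bits])), 2)
        if sync_next == sync_here + 1:   # consecutive periods confirm the alignment
            found.append((sync_here, bits[i + sync_bits:i + period]))
            i += period
        else:
            i += 1                       # keep sliding until a sync code is found
    return found
```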
Specifically, the watermark extraction can be performed on particular portions of the audio and video data. To this end, extracting the first watermark information embedded in the video data includes: determining the video frames and/or video regions in the video data that contain the first watermark information; and extracting the second watermark information embedded in the audio data includes: determining the audio frames and/or audio regions in the audio data that contain the second watermark information. These frames and regions can be determined by reversing the region selection used during embedding, for example selecting the non-salient regions of key video frames and/or selecting the auditorily insensitive frequency bands of adjacent audio frames.
The watermark bits can likewise be extracted by reversing the embedding algorithm. When the embedded watermark is an "energy watermark", the constituent bits of the first watermark information can be extracted from the determined series of video frames and/or video regions according to the predetermined energy relationship, and the extracted constituent bits are combined into the first watermark information. Similarly, for the audio watermark, the constituent bits of the second watermark information can be extracted from the determined series of audio frames and/or audio regions according to the predetermined energy relationship, and the extracted constituent bits are combined into the second watermark information.
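For illustration, the inverse of the earlier embedding sketch simply compares the two region energies and reads the bit from their order; again the block geometry and the use of SciPy's dctn are assumptions of this sketch rather than part of the disclosed method.

```python
import numpy as np
from scipy.fft import dctn


def extract_bit_from_block(block: np.ndarray) -> int:
    """Read one bit from an 8x8 block: 1 if region A carries more DCT energy
    than region B, 0 otherwise (inverse of the hypothetical embedding rule)."""
    coeffs = dctn(block.astype(np.float64), norm="ortho")
    e_a = np.sum(coeffs[2:4, 2:4] ** 2)
    e_b = np.sum(coeffs[4:6, 2:4] ** 2)
    return 1 if e_a > e_b else 0


def bits_to_int(bits) -> int:
    """Combine extracted constituent bits (MSB first) into an integer payload."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value
```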
Fig. 5 shows an example of a joint watermark extraction process according to the present application. The watermark extraction process of Fig. 5 can be regarded as the counterpart of the watermark embedding process of Fig. 3.
For the video file from which the watermark is to be extracted, a video file parsing tool can first be used to extract the video sequence and the audio data separately. Then, as shown in Fig. 5, the process splits into two parallel branches.
In the video branch, video key frames can be extracted and content analysis performed to select non-salient regions. The synchronization code and the watermark bits are then extracted from the video sequence based on the energy relationship of adjacent blocks, yielding watermark information wm1. Here, wm1 can be regarded as the single-period watermark information added to the video sequence in the example of Fig. 2.
In the audio branch, the audio can be divided into frames, and within each audio frame a frequency band to which human hearing is insensitive can be selected. The synchronization code and the watermark bits are then extracted from the audio based on the energy relationship of adjacent frequency bands, yielding watermark information wm2. Here, wm2 can be regarded as the single-period watermark information added to the audio code stream in the example of Fig. 2.
The synchronization codes can be used to align the audio watermark information and the video watermark information, after which the watermark information from the two sources is combined by weighted addition. The weights a1 and a2 are adaptively adjusted according to whether the watermark can be successfully extracted at the current position on the time axis: a1 is decreased when the attack on the video watermark is relatively strong, and a2 is decreased when the attack on the audio watermark is relatively strong. Except when the audio and video watermarks both fail to be extracted, a1 + a2 = 1 is maintained and wm = wm1*a1 + wm2*a2, yielding the final watermark information wm.
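A minimal sketch of this adaptive fusion is shown below. How the per-source confidences are obtained (for example, from the fraction of periods whose synchronization codes were recovered) and the 0.5 decision threshold are assumptions of the sketch, not requirements of the disclosed scheme.

```python
import numpy as np


def fuse_watermarks(wm1: np.ndarray, wm2: np.ndarray, conf1: float, conf2: float):
    """Weighted fusion of per-bit video (wm1) and audio (wm2) watermark estimates.
    conf1/conf2 in [0, 1] reflect how reliably each source was extracted."""
    if conf1 == 0.0 and conf2 == 0.0:
        return None                      # both extractions failed
    a1 = conf1 / (conf1 + conf2)         # keep a1 + a2 = 1
    a2 = 1.0 - a1
    soft = a1 * wm1 + a2 * wm2           # wm = wm1*a1 + wm2*a2
    return (soft >= 0.5).astype(int)     # harden the fused bits


# Example: video heavily attacked, audio mostly intact.
video_bits = np.array([1, 0, 0, 1, 1, 0, 1, 0])
audio_bits = np.array([1, 0, 1, 1, 0, 0, 1, 0])
print(fuse_watermarks(video_bits, audio_bits, conf1=0.2, conf2=0.9))
```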
Fig. 6 shows a schematic overview of the watermark embedding and extraction operations of the present application. As shown in the figure, the video file to be protected is processed in segments along the time axis, and the video sequence and the audio data at the same time point are embedded with the same watermark information. To simplify the description of the principle, the figure shows the watermark information as a 0 or 1 embedded at the start of each segment. In a more practical implementation, since video sequences and audio code streams of the same duration differ in embedding capacity (a video sequence may need only 7 seconds to embed 64 bits, while an audio code stream may need 10 seconds), the synchronization of the audio and video watermarks during embedding and extraction must be taken into account. To this end, as shown in Fig. 2, a longer synchronization code and watermark payload can be embedded, according to the watermarking algorithm, over sustained periods that share the same start times.
After the video file has suffered a malicious editing attack, the watermark extraction operation can be performed when the copyright information of the video needs to be proven. As shown on the right side of Fig. 6, the previously embedded video watermark wm1 and audio watermark wm2 can be extracted based on the same time-axis segmentation, and the final extracted watermark information can be obtained by the weighted sum wm = wm1*a1 + wm2*a2.
Watermark embedding and associated extraction will now be described with reference to a specific embodiment. A is a video file (for example, a film or television work) into which a watermark is to be embedded. To protect the copyright of A, a joint watermark can be embedded into it. First, a video file parsing tool is used to extract the video sequence and the audio data of the video file separately. Then, in the video branch, video key frames can be extracted, and the embedding regions are selected based on a key and on content analysis. A video watermarking algorithm is selected to embed the synchronization code and the watermark bits into the video sequence (that is, into the embedding regions selected in the previous step), yielding a video sequence carrying the watermark information. In the audio branch, the audio can be divided into frames, and the embedding region within each audio frame is selected based on the key. An audio watermarking algorithm is selected to embed the synchronization code and the watermark bits into the audio, yielding audio carrying the watermark information. Finally, the video sequence and the audio data containing the same watermark information are mixed into one code stream to obtain a video file containing the double watermark.
When the copyright of A subsequently needs to be verified, the joint watermark extraction process may include operations corresponding to the embedding process above. First, the video file parsing tool is likewise used to extract the video sequence and the audio data of the watermarked video file separately. Then, in the video branch, video key frames can be extracted, and the extraction regions are selected based on the key and on content analysis. The corresponding video watermarking algorithm is used to extract the synchronization code and the watermark bits wm1 from the video sequence. In the audio branch, the audio can be divided into frames, and the extraction region of each audio frame is selected based on the key. The corresponding audio watermarking algorithm is used to extract the synchronization code and the watermark bits wm2 from the audio. Finally, the synchronization codes can be used to align the audio watermark information and the video watermark information, and the watermark information from the two sources is combined by weighted addition, with the weights a1 and a2 adaptively adjusted according to the confidence that the current time-axis position contains the audio and video watermarks.
In addition, when the audio and video watermarks cannot both be extracted (for example, when the audio or the video has been replaced), copyright verification can also be performed based on the watermark extracted from the audio alone or from the video alone.
A video file carrying the joint audio-video watermark generated according to the embodiments described with reference to Figs. 1 to 3 of the present application can be published to a video website directly (or after being tampered with by a third party). Once published on a video website, the video file can be obtained by visitors to the web page in the form of streaming media. Here, streaming media refers to the technology and process of sending a series of multimedia data (usually compressed) in segments over the Internet, so that audio and video are transmitted in real time for viewing. Streaming media data accordingly refers to multimedia data sent in segments over the Internet. This technology allows data packets to be sent and watched continuously, like flowing water, without downloading the entire media file before use.
During watermark extraction, the watermark can be extracted from a complete video file, or from a video stream in streaming media form. To this end, the present application can also be implemented as a streaming media watermark extraction method, including: obtaining watermark-embedded streaming media data generated from a watermark-embedded video file as described above; extracting video data and audio data from the watermark-embedded streaming media data; extracting the first watermark information embedded in the video data; and extracting the second watermark information embedded in the audio data.
In some embodiments, streaming media data of a certain duration (for example, 30 seconds or 1 minute) can be accumulated, and the audio/video data extraction and the extraction of the first and second watermark information can then be performed on the streaming media data within that duration (which can be regarded as a video data segment). In other embodiments, streaming extraction of the audio and video data and of the first and second watermark information can be performed on streaming media data acquired in real time. The present application imposes no limitation in this respect.
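For illustration only, the following sketch shows one way incoming stream chunks could be buffered until roughly 30 seconds of data are available before being handed to an extraction routine. The chunk format, the window size and the `extract_joint_watermark` callback are assumptions introduced for this sketch, not part of the disclosed method.

```python
from collections import deque


class StreamWatermarkBuffer:
    """Accumulate timed stream chunks and trigger joint extraction per window."""

    def __init__(self, extract_joint_watermark, window_s: float = 30.0):
        self.extract = extract_joint_watermark   # callback: list of chunks -> watermark
        self.window_s = window_s
        self.chunks: deque = deque()
        self.buffered_s = 0.0

    def push(self, chunk: bytes, duration_s: float):
        """Add one demuxed chunk; run extraction once a full window is buffered."""
        self.chunks.append((duration_s, chunk))
        self.buffered_s += duration_s
        if self.buffered_s >= self.window_s:
            payload = [c for _, c in self.chunks]
            self.chunks.clear()
            self.buffered_s = 0.0
            return self.extract(payload)          # e.g. the joint extraction of Fig. 5
        return None
```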
Further, in a broader embodiment, the watermark can also be embedded into a video file in streaming media form. To this end, the present application can also be implemented as a streaming media data processing method, including: acquiring video data and audio data of the streaming media data; embedding first watermark information into the video data; embedding second watermark information into the audio data in association therewith; and obtaining watermark-embedded streaming media data. Similarly, these operations can be performed on accumulated streaming media data of a certain duration (which can be regarded as a video data segment), or streaming embedding can be performed on streaming media data acquired in real time; the present application imposes no limitation in this respect.
Traditional video watermarking schemes show very unsatisfactory robustness against downsampling and recompression, because downsampling and recompression cause drastic changes in video key frames, blocks and local content, so that the video watermark information at the corresponding positions cannot be extracted. By embedding the same watermark information at the same time-axis positions in the audio, the present scheme forms a complement, and in the watermark extraction stage the video watermark and the audio watermark are combined by weighted addition, thereby avoiding the drawback that traditional video watermarks easily fail after downsampling and recompression.
In addition, the present application can also be implemented as a video file processing device, including: a video parsing unit configured to acquire the video data and audio data in the video file; a video watermark embedding unit configured to embed first watermark information into the video data; an audio watermark embedding unit configured to embed second watermark information into the audio data in association therewith; and a video mixing unit configured to mix the video data embedded with the first watermark information and the audio data embedded with the second watermark information, to obtain a watermark-embedded video file. In one embodiment, the video parsing unit may parse video files in streaming media form, for example in real time.
Correspondingly, the present application can also be implemented as a video watermark extraction device, including: a video parsing unit configured to acquire the video data and audio data in the watermark-embedded video file described above; a video watermark extraction unit configured to extract the first watermark information from the video data; and an audio watermark extraction unit configured to extract the embedded second watermark information from the audio data. The device may further include a watermark information generating unit configured to perform a weighted summation of the watermark data contained in the first watermark information and in the second watermark information, to generate the extracted watermark of the video file. Similarly, in one embodiment, the video parsing unit may parse video files in streaming media form, for example in real time.
Fig. 7 shows a schematic structural diagram of a computing device that can be used to implement the above video processing and watermark extraction methods according to an embodiment of the present application.
Referring to Fig. 7, the computing device 700 includes a memory 710 and a processor 720.
The processor 720 may be a multi-core processor or may include multiple processors. In some embodiments, the processor 720 may include a general-purpose main processor and one or more special co-processors, such as a graphics processing unit (GPU) or a digital signal processor (DSP). In some embodiments, the processor 720 may be implemented using customized circuitry, such as an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA).
The memory 710 may include various types of storage units, such as system memory, read-only memory (ROM), and a permanent storage device. The ROM may store static data or instructions required by the processor 720 or other modules of the computer. The permanent storage device may be a readable and writable storage device, i.e., a non-volatile storage device that does not lose the stored instructions and data even after the computer is powered off. In some embodiments, the permanent storage device is a mass storage device (for example, a magnetic or optical disk, or flash memory). In other embodiments, the permanent storage device may be a removable storage device (for example, a floppy disk or an optical drive). The system memory may be a readable and writable storage device or a volatile readable and writable storage device, such as dynamic random access memory, and may store some or all of the instructions and data needed by the processor at runtime. In addition, the memory 710 may include any combination of computer-readable storage media, including various types of semiconductor memory chips (DRAM, SRAM, SDRAM, flash memory, programmable read-only memory); magnetic disks and/or optical disks may also be used. In some embodiments, the memory 710 may include a readable and/or writable removable storage device, such as a compact disc (CD), a read-only digital versatile disc (for example, DVD-ROM or dual-layer DVD-ROM), a read-only Blu-ray disc, an ultra-density disc, a flash memory card (for example, an SD card, a mini SD card, or a Micro-SD card), or a magnetic floppy disk. The computer-readable storage media do not include carrier waves or transient electronic signals transmitted wirelessly or over wires.
The memory 710 stores executable code which, when processed by the processor 720, causes the processor 720 to perform the video processing and watermark extraction methods described above.
The video processing and watermark extraction methods and devices according to the present application have been described in detail above with reference to the accompanying drawings. The present application embeds an audio watermark and a video watermark into a video file at the same time; the two watermarks do not interfere with each other and complement each other, and during extraction the information recovered from the audio and video watermarks can be adaptively fused, which greatly improves the robustness of the video file watermark, in particular against malicious editing attacks on the video content. Specifically, based on the adaptive idea of fusing the audio and video double watermarks, the weights of the audio watermark and the video watermark can be adjusted dynamically according to their respective reliabilities, ensuring the reliability of the fused watermark. In addition, when the audio watermark and the video watermark of the same content fall out of synchronization, the segmented synchronization codes can be used to synchronize the double watermark.
In addition, the method according to the present application may also be implemented as a computer program or computer program product that includes computer program code instructions for performing the steps defined in the above method of the present application.
Alternatively, the present application may also be implemented as a non-transitory machine-readable storage medium (or computer-readable storage medium, or machine-readable storage medium) on which executable code (or a computer program, or computer instruction code) is stored; when the executable code (or computer program, or computer instruction code) is executed by a processor of an electronic device (or computing device, server, etc.), the processor is caused to perform the steps of the above method according to the present application.
Those skilled in the art will also appreciate that the various exemplary logic blocks, modules, circuits and algorithm steps described in conjunction with the disclosure herein can be implemented as electronic hardware, computer software, or a combination of both.
The flowcharts and block diagrams in the accompanying drawings show possible architectures, functions and operations of systems and methods according to multiple embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical functions. It should also be noted that in some alternative implementations the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should further be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The embodiments of the present application have been described above. The above description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or improvements over technologies in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
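Before turning to the claims, the segmented synchronization code mentioned at the end of the description can likewise be illustrated with a short sketch. The 4-bit synchronization pattern, the segment length, the 4-bit segment index, and the helper names are assumptions chosen for the example rather than values specified by the application.

    SYNC = [1, 0, 1, 1]    # assumed short synchronization pattern
    SEG_LEN = 8            # assumed number of payload bits per segment

    def make_segments(payload_bits):
        """Split a watermark payload into segments, each prefixed with a sync code
        and a segment index, so the audio and video copies can be realigned later."""
        segments = []
        for start in range(0, len(payload_bits), SEG_LEN):
            chunk = payload_bits[start:start + SEG_LEN]
            index_bits = [int(b) for b in format(start // SEG_LEN, "04b")]
            segments.append(SYNC + index_bits + chunk)
        return segments

    def find_segments(bit_stream):
        """Scan an extracted bit stream for sync codes and return (index, payload) pairs."""
        hits, i, head = [], 0, len(SYNC) + 4
        while i + head + SEG_LEN <= len(bit_stream):
            if bit_stream[i:i + len(SYNC)] == SYNC:
                index = int("".join(map(str, bit_stream[i + len(SYNC):i + head])), 2)
                hits.append((index, bit_stream[i + head:i + head + SEG_LEN]))
                i += head + SEG_LEN
            else:
                i += 1
        return hits

Because each segment carries its own index, an extractor can pair an audio segment with the video segment that has the same index even when one of the two streams has been trimmed or shifted, which is the out-of-synchronization case the description addresses.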

Claims (28)

  1. A video file processing method, comprising:
    acquiring video data and audio data of the video file;
    embedding first watermark information into the video data;
    embedding associated second watermark information into the audio data; and
    obtaining a watermark-embedded video file.
  2. The method of claim 1, wherein embedding associated second watermark information into the audio data comprises at least one of the following:
    the time at which the second watermark information is embedded into the audio data is associated with the time at which the first watermark information is embedded into the video data; and
    the content of the second watermark information embedded into the audio data is associated with the content of the first watermark information embedded into the video data.
  3. The method of claim 2, wherein embedding the associated second watermark information into the audio data comprises at least one of the following:
    the start time of embedding the second watermark information into the audio data is the same as the start time of embedding the first watermark information into the video data; and
    the content of the second watermark information embedded into the audio data is the same as the content of the first watermark information embedded into the video data.
  4. The method of claim 1, wherein embedding first watermark information into the video data comprises:
    embedding a plurality of pieces of first watermark information into the video data at a first predetermined time interval, and
    embedding the associated second watermark information into the audio data comprises:
    embedding a plurality of pieces of second watermark information into the audio data at a second predetermined time interval.
  5. The method of claim 4, wherein each piece of first watermark information among the plurality of pieces of first watermark information comprises:
    first ordering data and first watermark data, and
    each piece of second watermark information among the plurality of pieces of second watermark information comprises:
    second ordering data and second watermark data.
  6. The method of claim 1, wherein embedding first watermark information into the video data comprises:
    adding the first watermark information to a non-salient region of a video frame in the video data.
  7. The method of claim 1, wherein embedding second watermark information into the audio data comprises:
    adding the second watermark information to an auditorily insensitive region of an audio frame in the audio data.
  8. The method of claim 1, wherein embedding first watermark information into the video data comprises:
    embedding the first watermark information by adjusting the energy relationship between adjacent regions in the transform domain of video frames in the video data.
  9. The method of claim 8, wherein embedding the first watermark information by adjusting the energy relationship between adjacent regions in the transform domain of video frames in the video data comprises:
    selecting a series of specific video frames in a video sequence, wherein the video data is the video sequence; and
    adjusting the energy relationship between adjacent transform-domain regions of the series of specific video frames, and embedding the constituent bit information of the first watermark information bit by bit.
  10. The method of claim 9, wherein selecting a series of specific video frames in the video sequence comprises:
    selecting video key frames in the video sequence.
  11. The method of claim 1, wherein embedding the associated second watermark information into the audio data comprises:
    embedding the second watermark information by adjusting the energy relationship between adjacent audio frames.
  12. The method of claim 11, wherein adjusting the energy relationship between adjacent audio frames and embedding the second watermark information comprises:
    adjusting the energy relationship between adjacent frequency bands of a series of adjacent audio frames, and embedding the constituent bit information of the second watermark information bit by bit.
  13. The method of claim 1, wherein embedding first watermark information into the video data comprises:
    selecting, based on a first password, a first addition region for adding the first watermark information in a video frame of the video data, and/or
    embedding the associated second watermark information into the audio data comprises:
    selecting, based on a second password, a second addition region for adding the second watermark information in an audio frame of the audio data.
  14. A video watermark extraction method, comprising:
    acquiring the watermark-embedded video file according to any one of claims 1-13;
    extracting video data and audio data from the watermark-embedded video file;
    extracting the first watermark information embedded in the video data; and
    extracting the second watermark information embedded in the audio data.
  15. The method of claim 14, further comprising:
    generating an extracted watermark of the video file according to the extracted first watermark information and second watermark information.
  16. The method of claim 15, wherein generating the extracted watermark of the video file according to the extracted first watermark information and second watermark information comprises:
    determining that the first watermark information and the second watermark information include the same watermark data; and
    performing a weighted summation of the watermark data included in each of the first watermark information and the second watermark information to generate the extracted watermark of the video file.
  17. The method of claim 16, wherein generating the extracted watermark of the video file according to the extracted first watermark information and second watermark information further comprises:
    adjusting, according to confidence levels, the weights of the watermark data included in each of the first watermark information and the second watermark information.
  18. The method of claim 16, wherein the first watermark information includes a plurality of sets of watermark data each containing first ordering data and first watermark data, and the second watermark information includes a plurality of sets of watermark data each containing second ordering data and second watermark data,
    wherein extracting the first watermark information embedded in the video data comprises:
    determining the subsequent first watermark data based on the extracted first ordering data, and
    extracting the second watermark information embedded in the audio data comprises:
    determining the subsequent second watermark data based on the extracted second ordering data.
  19. The method of claim 14, wherein extracting the first watermark information embedded in the video data comprises:
    determining the video frames and/or video regions in the video data that contain the first watermark information, and
    extracting the second watermark information embedded in the audio data comprises:
    determining the audio frames and/or audio regions in the audio data that contain the second watermark information.
  20. The method of claim 19, wherein extracting the first watermark information embedded in the video data further comprises:
    extracting, from the determined series of video frames and/or video regions, constituent bit information of the first watermark information that conforms to a predetermined energy relationship; and
    combining the extracted constituent bit information into the first watermark information,
    and extracting the second watermark information embedded in the audio data further comprises:
    extracting, from the determined series of audio frames and/or audio regions, constituent bit information of the second watermark information that conforms to a predetermined energy relationship; and
    combining the extracted constituent bit information into the second watermark information.
  21. The method of claim 20, wherein the video frames and/or video regions are determined based on at least one of the following:
    the content of the video frames and/or video regions;
    a selection password for the video frames and/or video regions,
    and the audio frames and/or audio regions are determined based on at least one of the following:
    the spectral content of the audio frames and/or audio regions;
    a selection password for the audio frames and/or audio regions.
  22. A streaming media watermark extraction method, comprising:
    acquiring watermark-embedded streaming media data, the streaming media data being generated from the watermark-embedded video file according to any one of claims 1-13;
    extracting video data and audio data from the watermark-embedded streaming media data;
    extracting the first watermark information embedded in the video data; and
    extracting the second watermark information embedded in the audio data.
  23. A streaming media data processing method, comprising:
    acquiring video data and audio data of the streaming media data;
    embedding first watermark information into the video data;
    embedding associated second watermark information into the audio data; and
    obtaining watermark-embedded streaming media data.
  24. A video file processing device, comprising:
    a video parsing unit configured to acquire video data and audio data in the video file;
    a video watermark embedding unit configured to embed first watermark information into the video data;
    an audio watermark embedding unit configured to embed associated second watermark information into the audio data; and
    a video mixing unit configured to mix the video data embedded with the first watermark information and the audio data embedded with the second watermark information to obtain a watermark-embedded video file.
  25. A video watermark extraction device, comprising:
    a video parsing unit configured to acquire video data and audio data in the watermark-embedded video file according to any one of claims 1-13;
    a video watermark extraction unit configured to extract the first watermark information from the video data; and
    an audio watermark extraction unit configured to extract the embedded second watermark information from the audio data.
  26. The device of claim 25, further comprising:
    a watermark information generating unit configured to perform a weighted summation of the watermark data included in each of the first watermark information and the second watermark information to generate an extracted watermark of the video file.
  27. A computing device, comprising:
    a processor; and
    a memory having executable code stored thereon, wherein the executable code, when executed by the processor, causes the processor to perform the method of any one of claims 1-23.
  28. A non-transitory machine-readable storage medium having executable code stored thereon, wherein the executable code, when executed by a processor of an electronic device, causes the processor to perform the method of any one of claims 1-23.
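As a companion to claims 9 and 12 above, which recite embedding watermark bits by adjusting energy relationships between adjacent transform-domain regions of video frames or adjacent frequency bands of audio frames, the following sketch illustrates the video case of that general technique. The 8x8 block DCT, the particular coefficient regions, and the margin value are illustrative assumptions and are not specified by the claims.

    import numpy as np
    from scipy.fft import dctn, idctn   # assumed available (SciPy >= 1.4)

    MARGIN = 2.0   # assumed minimum energy gap enforced between the two regions

    def embed_bit(block: np.ndarray, bit: int) -> np.ndarray:
        """Embed one bit into an 8x8 luminance block by forcing an energy
        ordering between two adjacent mid-frequency coefficient regions."""
        coeffs = dctn(block.astype(float), norm="ortho")
        region_a = coeffs[2:4, 2:4]      # two adjacent regions in the transform domain
        region_b = coeffs[4:6, 2:4]
        e_a, e_b = np.sum(region_a ** 2), np.sum(region_b ** 2)
        # bit 1 -> region A must dominate; bit 0 -> region B must dominate
        if bit == 1 and e_a < e_b + MARGIN:
            region_a *= np.sqrt((e_b + MARGIN) / (e_a + 1e-9))   # views: edits coeffs in place
        elif bit == 0 and e_b < e_a + MARGIN:
            region_b *= np.sqrt((e_a + MARGIN) / (e_b + 1e-9))
        return idctn(coeffs, norm="ortho")

    def extract_bit(block: np.ndarray) -> int:
        """Recover the bit by comparing the energies of the same two regions."""
        coeffs = dctn(block.astype(float), norm="ortho")
        e_a = np.sum(coeffs[2:4, 2:4] ** 2)
        e_b = np.sum(coeffs[4:6, 2:4] ** 2)
        return 1 if e_a >= e_b else 0

The audio case of claim 12 follows the same pattern, except that the two energies being compared come from adjacent frequency bands of neighbouring audio frames rather than from regions of a spatial transform.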
PCT/CN2021/081259 2020-03-24 2021-03-17 Video file processing method and device, and watermark extraction method and device WO2021190372A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010215301.1 2020-03-24
CN202010215301.1A CN113453039B (en) 2020-03-24 2020-03-24 Method and device for processing video file and extracting watermark

Publications (1)

Publication Number Publication Date
WO2021190372A1 (en) 2021-09-30

Family

ID=77807451

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/081259 WO2021190372A1 (en) 2020-03-24 2021-03-17 Video file processing method and device, and watermark extraction method and device

Country Status (2)

Country Link
CN (1) CN113453039B (en)
WO (1) WO2021190372A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114826744A (en) * 2022-04-28 2022-07-29 中国银行股份有限公司 Information processing method, device, equipment and storage medium
TWI814427B (en) * 2022-06-07 2023-09-01 宏正自動科技股份有限公司 Method for synchronizing audio and video

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101261865A (en) * 2007-04-20 2008-09-10 炬力集成电路设计有限公司 Making method, device, playing device and method for media electronic file
CN104581202A (en) * 2013-10-25 2015-04-29 腾讯科技(北京)有限公司 Audio and video synchronization method and system, encoding device and decoding device
WO2017026714A1 (en) * 2015-08-07 2017-02-16 엘지전자 주식회사 Broadcast signal transmission device, broadcast signal reception device, broadcast signal transmission method, and broadcast signal reception method
CN106878827A (en) * 2017-03-22 2017-06-20 河海大学 A kind of high robust audio frequency and video intersect watermarking algorithm
CN109151157A (en) * 2017-06-28 2019-01-04 成都宇飞信息工程有限责任公司 A kind of multimedia digital watermark evidence obtaining mobile phone

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5246804B2 (en) * 2006-07-18 2013-07-24 トムソン ライセンシング Method and system for time synchronization
WO2014014252A1 (en) * 2012-07-16 2014-01-23 Lg Electronics Inc. Method and apparatus for processing digital service signals
WO2015138798A1 (en) * 2014-03-13 2015-09-17 Verance Corporation Interactive content acquisition using embedded codes
CN107749990B (en) * 2017-09-27 2021-02-19 深圳大学 Video copyright protection method and device based on digital watermark

Also Published As

Publication number Publication date
CN113453039B (en) 2023-04-18
CN113453039A (en) 2021-09-28

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 21774961
    Country of ref document: EP
    Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 Ep: pct application non-entry in european phase
    Ref document number: 21774961
    Country of ref document: EP
    Kind code of ref document: A1