CN100341330C - Audio-embedded video frequency in audio-video mixed signal synchronous compression and method of extraction - Google Patents


Info

Publication number
CN100341330C
CN100341330C · CNB2005100165895A · CN200510016589A
Authority
CN
China
Prior art keywords
audio
sub-block
video
bit
Prior art date
Legal status
Expired - Fee Related
Application number
CNB2005100165895A
Other languages
Chinese (zh)
Other versions
CN1655616A (en)
Inventor
陈贺新
赵岩
齐丽风
Current Assignee
Jilin University
Original Assignee
Jilin University
Priority date
Filing date
Publication date
Application filed by Jilin University filed Critical Jilin University
Priority to CNB2005100165895A priority Critical patent/CN100341330C/en
Publication of CN1655616A publication Critical patent/CN1655616A/en
Application granted granted Critical
Publication of CN100341330C publication Critical patent/CN100341330C/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The present invention relates to signal embedding and extraction, and in particular to a method for embedding audio into video during the synchronous compression of mixed audio-video signals, together with the corresponding extraction method. The method comprises the steps of 4×4 sub-block division, embedding of the digital audio signal into the video, 4×4 sub-block edge matching-degree detection, and image data recovery. The corresponding apparatus comprises a 4×4 sub-block division unit, an audio-into-video embedding unit, a 4×4 sub-block edge matching-degree detection unit, and an image data recovery unit. The method embeds audio information bits into the 4×4 sub-blocks of each grayscale frame of the video, so that many bits can be embedded while the effect of the embedded information on the video is kept as small as possible; edge matching is used to detect and extract the audio data nearly without distortion, providing key technical support for a synchronous compression system for mixed audio-video signals.

Description

Method for embedding audio into video during synchronous compression of mixed audio-video signals, and method for extracting it
Technical field:
The present invention relates to techniques for embedding and extracting signals, and in particular to a technique for embedding audio into video, and extracting it again, used in the synchronous compression of mixed audio-video signals.
Background technology:
At present, signal embedding and extraction techniques are mainly used for watermarking in information security, where watermark information must be embedded into multimedia data. In video watermarking, several current embedding and extraction techniques are as follows:
A video watermarking scheme using the direct-sequence spread-spectrum model. A video segment consists of a number of frames, and each frame can be viewed as composed of several bit planes, so along the time axis the segment can be regarded as a one-dimensional sequence whose unit is the bit plane. A {0, 1} m-sequence is applied to this sequence: most bit planes remain unchanged, while changing a few bit planes does not affect the visual quality, and those positions can be used to embed the watermark.
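The bit-plane selection idea above can be sketched as follows. This is a toy numpy sketch under stated assumptions: the scheme only says that a {0, 1} m-sequence selects a few bit planes to modify, so writing the watermark bit into the least significant bit plane of the selected frames is an illustrative choice, and all names and sizes here are hypothetical.

```python
import numpy as np

def embed_bitplane(frames, wm_bits, select):
    """Toy sketch: `select` is a {0,1} m-sequence-like vector choosing
    which frames carry information; in a selected frame the least
    significant bit plane is replaced by the current watermark bit."""
    out = frames.copy()
    k = 0
    for t in range(len(frames)):
        if select[t] == 1 and k < len(wm_bits):
            # clear the LSB plane, then set it to the watermark bit
            out[t] = (out[t] & ~np.uint8(1)) | np.uint8(wm_bits[k])
            k += 1
    return out, k

frames = np.full((6, 8, 8), 77, dtype=np.uint8)   # 77 is odd, so LSB = 1
select = np.array([1, 0, 1, 1, 0, 1])             # stands in for the m-sequence
wm = [0, 1, 0, 1]
marked, used = embed_bitplane(frames, wm, select)
```

Only the frames flagged by the selection sequence change, and only in their least significant bit plane, which is why the visual effect stays negligible.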
Watermarking in motion vectors: the watermark is embedded in motion vectors with large magnitude and small phase-angle change. In the MPEG compression algorithm, motion-compensated prediction is used to reduce inter-frame temporal redundancy, and only the prediction error is encoded. In an MPEG video sequence most frames are motion-compensated predictive frames, so hiding watermark information in the motion vectors makes more efficient use of the video bit stream. Watermark information can be hidden by making slight modifications to the motion-vector data sequence, and watermark detection is then very easy.
A scheme embedding a visible watermark in MPEG-1 and MPEG-2 compressed video streams. Arena et al. proposed embedding the watermark directly in the MPEG-2 bit domain, avoiding the heavy computation of decoding and re-encoding the video stream that pixel-domain embedding requires. In this scheme the watermark is embedded only in the I frames of the video; P and B frames are not modified, both to reduce algorithmic complexity and for robustness against frame skipping and frame deletion (since I frames cannot be skipped or deleted). For easier handling of the MPEG-2 syntactic structure, the scheme chooses the macroblock rather than the pixel as the operating unit of the bit stream, and each bit of watermark information is spread over a number of macroblocks.
Watermarking schemes in the DCT domain. Considering compatibility with standards such as MPEG, many researchers have proposed embedding watermark information in the coefficients obtained after the discrete cosine transform; taking human visual characteristics into account, the watermark is embedded by modifying coefficients at specific positions. For example, one DCT-domain watermarking scheme works as follows: (1) convert each frame I of the original video stream V from the RGB to the YUV color representation and apply the DCT to the Y component, obtaining coefficients F = {f_1, f_2, …, f_Len}, where Len = Height × Width, Width is the width of the original image and Height is its height; for ease of description we take Width = Height = 2^n, with n a natural number; (2) reorder the transform coefficients F by Zig-Zag scanning, and denote the reordered DCT coefficients F′; (3) since F′ can be regarded as approximately ordered from low to high frequency, to balance watermark robustness against transparency we skip the first L DCT coefficients and add the watermark from coefficient L+1 on, according to formula ①:
F″[I] = F′[I] + α·|F′[I]|·X[I]    ①
where α is the watermark-strength parameter and X is a real pseudo-random sequence with N(0, 1) distribution generated from the keys P and K. For a given α, to improve the robustness of the watermark we bound the amount of watermark information added by formula ①, requiring
α·|F′[I]|·X[I] > T    ②
where T is a given threshold: if the currently processed DCT coefficient F′[I] and the watermark-sequence value X[I] satisfy formula ②, the watermark bit is embedded; otherwise the current F′[I] and X[I] are left unprocessed and processing jumps to the next coefficient. This is repeated until the entire watermark sequence of length M has been embedded into the DCT coefficients; (4) apply inverse Zig-Zag ordering to the watermarked DCT coefficients; (5) apply the inverse DCT to the reordered coefficients, obtaining the watermarked Y component, and finally convert the frame from the YUV color representation back to RGB, obtaining the watermarked video frame I′; (6) perform the above watermarking operation on every frame of the video stream, obtaining the watermarked video stream V′.
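The threshold rule of formulas ① and ② can be sketched as follows. This is a minimal numpy illustration of step (3) only; the DCT and Zig-Zag scan are omitted, and the coefficient vector, α, L, T, M and the seed standing in for the keys (P, K) are placeholder values, not taken from the scheme.

```python
import numpy as np

def embed_watermark(F_prime, X, alpha, L, T, M):
    """Embed up to M watermark bits into the zigzag-ordered DCT
    coefficients F_prime using formulas (1) and (2): skip the first
    L coefficients, and modify a coefficient only where the added
    amount alpha*|F'[I]|*X[I] exceeds the threshold T."""
    F = F_prime.copy()
    embedded = 0
    for i in range(L, len(F)):
        if embedded >= M:
            break
        delta = alpha * abs(F[i]) * X[i]
        if delta > T:              # formula (2): strong-enough position
            F[i] = F[i] + delta    # formula (1)
            embedded += 1
        # otherwise skip this coefficient and move to the next
    return F, embedded

rng = np.random.default_rng(seed=7)        # stands in for the keys (P, K)
F_prime = rng.uniform(-50.0, 50.0, 256)    # mock zigzag-ordered coefficients
X = rng.standard_normal(256)               # N(0,1) pseudo-random sequence
F_pp, n = embed_watermark(F_prime, X, alpha=0.3, L=16, T=1.0, M=32)
```

The threshold keeps every modification large relative to T, which is exactly the robustness/transparency balance the scheme describes: weak positions are skipped rather than carrying a fragile watermark bit.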
Summary of the invention:
The object of the present invention is to provide a technique for embedding audio into video, and extracting it again, for use in the synchronous compression of mixed audio-video signals. This technique has a purpose different from that of embedding and extraction in watermarking. Watermarking requires the embedding to be secure, robust, imperceptible and attack-resistant; its purpose is only to detect whether a watermark exists, it does not require extracting the original watermark information without distortion (or nearly so), and the number of watermark bits to be embedded is usually small. The audio embedding technique used for synchronous compression of mixed audio-video signals, by contrast, does not require security, robustness, imperceptibility or attack resistance; its purpose is to extract the embedded audio information without distortion, or nearly so, and many more bits must be embedded. Considering this application purpose, the embedding technique of the present invention therefore adopts a method based on edge matching.
The method for embedding audio into video during synchronous compression of mixed audio-video signals, and for extracting it, is as follows. The audio signal to be embedded is an uncompressed original digital audio signal; the video signal is a color digital video signal, each color image consisting of red, green and blue grayscale frames. The first step embeds the audio into the video:
A. divide each grayscale frame into m × m sub-blocks, m being a positive integer;
B. embed the audio sub-block by sub-block: the top m rows and the left m columns of each grayscale frame receive no audio, and the bits of the digital audio signal are embedded in order into each remaining m × m sub-block of the red, green and blue grayscale frames; if the current audio bit is 1, the gray values of all pixels in the corresponding m × m sub-block of the video signal are increased by a constant value, and if the current audio bit is 0, the gray values of all pixels in the corresponding sub-block remain unchanged;
C. the mixed audio-video signal with the audio embedded as above is compression-encoded by a video compression method based on the four-matrix discrete cosine transform.
The second step extracts the audio signal from the mixed audio-video signal as follows:
A. for the decoded mixed audio-video signal, using the red, green and blue monochrome frame data of every frame, for each m × m sub-block outside the top m rows and left m columns, measure the edge matching degree between the original sub-block and its upper and left neighboring sub-blocks, and then the edge matching degree between the sub-block with the constant subtracted and the same neighbors; the sub-block with the constant subtracted is obtained by subtracting from every pixel of the original sub-block the same constant value that was added during embedding. If the sub-block without the constant subtracted matches its neighbors' edges better than the sub-block with the constant subtracted, the extracted audio bit is 0; otherwise the extracted audio bit is 1;
B. if the extracted audio bit is 1, the sub-block with the constant subtracted replaces the original m × m sub-block and constitutes the recovered image data; if the extracted audio bit is 0, the original sub-block itself is the recovered image data.
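Embedding step B above can be sketched as follows. This is a minimal numpy sketch, not the patent's implementation; the frame size, m = 4 and the constant C = 20 are illustrative choices, and only one grayscale frame is processed.

```python
import numpy as np

def embed_bits(frame, bits, m=4, C=20):
    """Embed audio bits into the m x m sub-blocks of one grayscale
    frame, skipping the top m rows and the left m columns.
    Returns the marked frame (a wider int type, since adding C may
    push bright pixels above 255) and the number of bits consumed."""
    out = frame.astype(np.int32).copy()
    h, w = frame.shape
    k = 0
    for r in range(m, h - h % m, m):        # skip the top m rows
        for c in range(m, w - w % m, m):    # skip the left m columns
            if k >= len(bits):
                return out, k
            if bits[k] == 1:                # bit 1: add the constant C
                out[r:r+m, c:c+m] += C
            # bit 0: sub-block left unchanged
            k += 1
    return out, k

frame = np.full((16, 16), 100, dtype=np.uint8)
bits = [1, 0, 1, 0, 1, 0, 1, 0, 1]
marked, used = embed_bits(frame, bits)
```

A 16 × 16 frame with m = 4 has a 3 × 3 grid of embeddable sub-blocks, so nine audio bits fit in this toy frame; real capacity scales with the frame area.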
The present invention is further described below with reference to the embodiments illustrated in the accompanying drawings.
Description of drawings:
Fig. 1 is a flow chart of embedding the digital audio signal into the digital video according to the present invention;
Fig. 2 is a flow chart of extracting the audio signal from the mixed audio-video signal according to the present invention;
Fig. 3 is a schematic diagram of the audio-into-video embedding unit of the present invention;
Fig. 4 is a schematic diagram of the 4 × 4 sub-block edge matching-degree detection unit of the present invention.
Embodiment: the core of the present invention is the technique for embedding audio into video, and extracting it, used in the synchronous compression of mixed audio-video signals. Existing embedding and extraction techniques are mainly used for watermarking in information security: the amount of information to be embedded is small, the purpose is only to detect whether the watermark exists, and the embedding is required to be secure, robust, imperceptible and attack-resistant. The embedding and extraction technique of the present invention is intended for a synchronous compression system for mixed audio-video signals; therefore many information bits are embedded, and each embedded bit (0 or 1) must be extracted nearly without distortion.
In order to embed as many bits as possible while keeping the effect on the video to a minimum, the present invention is described below using as an example the method of embedding the audio information bits into the 4 × 4 sub-blocks of each grayscale frame.
In the summary above, the concrete method of the step (unit) that embeds the digital audio signal into the video is: embed one audio bit into each 4 × 4 sub-block outside the top 4 rows and the left 4 columns; if the current audio bit is 1, the gray values of all pixels in the corresponding 4 × 4 sub-block of the digital video signal are increased by a constant (e.g. 20); if the current audio bit is 0, the gray values of all pixels in the corresponding sub-block remain unchanged. Let P(i, j) be the image data of a 4 × 4 sub-block before the audio bit is embedded and P′(i, j) the image data after embedding, with i = 0, 1, 2, 3 and j = 0, 1, 2, 3; the embedding process can be written as:
P′(i, j) = P(i, j) + C·X
where C is a constant (e.g. 10 or 20) and X is the embedded audio bit.
The 4 × 4 sub-block edge matching-degree detection step (unit) can exploit the fact that image data are limited to the range [0, 255] and use a simplified detection method: for the decoded mixed audio-video signal, using the red, green and blue monochrome frame data of every frame, examine each 4 × 4 sub-block outside the top 4 rows and left 4 columns. If the data range of the sub-block itself exceeds [0, 255], the extracted audio bit is 1 and no edge matching detection is needed; if the data range of the sub-block after subtracting the constant value (the same as above, e.g. 20) from all its pixels exceeds [0, 255], the extracted audio bit is 0 and likewise no edge matching detection is needed.
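The simplified range check can be sketched with a small hypothetical helper (C = 20 as in the example above): it returns the bit whenever the [0, 255] range alone decides it, and None when edge matching is still required.

```python
import numpy as np

def range_shortcut(block, C=20):
    """Decide the embedded bit from the [0, 255] range alone.
    `block` is a received m x m sub-block; it may hold values
    above 255 when the constant was added to bright pixels."""
    if block.max() > 255 or block.min() < 0:
        return 1          # only an added constant pushes data out of range
    shifted = block - C
    if shifted.min() < 0 or shifted.max() > 255:
        return 0          # subtracting C leaves the range: C was never added
    return None           # ambiguous: fall back to edge matching

bright = np.full((4, 4), 250, dtype=np.int32) + 20   # bit 1 on a bright block
dark = np.full((4, 4), 10, dtype=np.int32)           # bit 0 on a dark block
mid = np.full((4, 4), 100, dtype=np.int32)           # undecidable by range
```

Only the ambiguous middle case pays for the edge-matching computation, which is the point of the shortcut.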
The concrete implementation steps are:
4 × 4 sub-block division step: divide the red, green and blue monochrome frame images of each frame of the color digital video into 4 × 4 sub-blocks.
Digital audio signal embedding step: embed one audio bit into each 4 × 4 sub-block outside the top 4 rows and the left 4 columns; if the current audio bit is 1, the gray values of all pixels in the corresponding 4 × 4 sub-block of the digital video signal are increased by a constant value (e.g. 20); if the current audio bit is 0, the gray values of all pixels in the corresponding sub-block remain unchanged.
4 × 4 sub-block edge matching-degree detection step: for the decoded mixed audio-video signal, using the red, green and blue monochrome frame data of every frame, for each 4 × 4 sub-block outside the top 4 rows and left 4 columns, measure the edge matching degree, against the upper and left neighboring sub-blocks, of both the sub-block itself and the sub-block with the constant value (the same as above, e.g. 20) subtracted from all its pixels; if the sub-block without the constant subtracted matches the edges better than the sub-block with the constant subtracted, the extracted audio bit is 0, otherwise the extracted audio bit is 1.
Image data recovery step: if the extracted audio bit is 1, the sub-block with the constant subtracted replaces the original 4 × 4 sub-block and is the recovered image data of that sub-block; if the extracted audio bit is 0, the 4 × 4 sub-block itself is the recovered image data.
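The edge-matching decision and recovery of the last two steps can be sketched as follows. The patent does not fix a particular matching measure, so using the sum of absolute differences along the shared edges with the upper and left neighbors is an assumption; a lower sum means better matching.

```python
import numpy as np

def edge_mismatch(frame, r, c, m=4, shift=0):
    """Sum of absolute differences between the sub-block at (r, c),
    optionally lowered by `shift`, and the adjacent pixel row of its
    upper neighbor and pixel column of its left neighbor."""
    block = frame[r:r+m, c:c+m].astype(np.int32) - shift
    top = np.abs(block[0, :] - frame[r-1, c:c+m]).sum()
    left = np.abs(block[:, 0] - frame[r:r+m, c-1]).sum()
    return top + left

def extract_block(frame, r, c, m=4, C=20):
    """Extract the audio bit of one sub-block and recover its data."""
    if edge_mismatch(frame, r, c, m) <= edge_mismatch(frame, r, c, m, shift=C):
        return 0, frame[r:r+m, c:c+m]          # original matches better: bit 0
    return 1, frame[r:r+m, c:c+m] - C          # shifted matches better: bit 1

# smooth toy frame with one embedded '1' sub-block at (4, 4)
frame = np.full((12, 12), 100, dtype=np.int32)
frame[4:8, 4:8] += 20
bit, recovered = extract_block(frame, 4, 4)
```

On smooth image regions the raised sub-block breaks edge continuity with its neighbors, so subtracting C restores the match and the bit is read as 1; recovery is simply the better-matching version of the sub-block.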

Claims (1)

1. A method for embedding audio into video during synchronous compression of mixed audio-video signals, and for extracting it, characterized in that: the audio signal to be embedded is an uncompressed original digital audio signal; the video signal is a color digital video signal, each color image consisting of red, green and blue grayscale frames; and the first step embeds the audio into the video:
A. divide each grayscale frame into m × m sub-blocks, m being a positive integer;
B. embed the audio sub-block by sub-block: the top m rows and the left m columns of each grayscale frame receive no audio, and the bits of the digital audio signal are embedded in order into each remaining m × m sub-block of the red, green and blue grayscale frames; if the current audio bit is 1, the gray values of all pixels in the corresponding m × m sub-block of the video signal are increased by a constant value, and if the current audio bit is 0, the gray values of all pixels in the corresponding sub-block remain unchanged;
C. the mixed audio-video signal with the audio embedded as above is compression-encoded by a video compression method based on the four-matrix discrete cosine transform;
and the second step extracts the audio signal from the mixed audio-video signal as follows:
A. for the decoded mixed audio-video signal, using the red, green and blue monochrome frame data of every frame, for each m × m sub-block outside the top m rows and left m columns, measure the edge matching degree between the original sub-block and its upper and left neighboring sub-blocks, and then the edge matching degree between the sub-block with the constant subtracted and the same neighbors; the sub-block with the constant subtracted is obtained by subtracting from every pixel of the original sub-block the same constant value that was added during embedding; if the original m × m sub-block matches the edges better than the sub-block with the constant subtracted, the extracted audio bit is 0, otherwise the extracted audio bit is 1;
B. if the extracted audio bit is 1, the sub-block with the constant subtracted replaces the original m × m sub-block and is the recovered image data; if the extracted audio bit is 0, the original m × m sub-block is the recovered image data.
CNB2005100165895A 2005-02-25 2005-02-25 Audio-embedded video frequency in audio-video mixed signal synchronous compression and method of extraction Expired - Fee Related CN100341330C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2005100165895A CN100341330C (en) 2005-02-25 2005-02-25 Audio-embedded video frequency in audio-video mixed signal synchronous compression and method of extraction

Publications (2)

Publication Number Publication Date
CN1655616A CN1655616A (en) 2005-08-17
CN100341330C true CN100341330C (en) 2007-10-03

Family

ID=34894215

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2005100165895A Expired - Fee Related CN100341330C (en) 2005-02-25 2005-02-25 Audio-embedded video frequency in audio-video mixed signal synchronous compression and method of extraction

Country Status (1)

Country Link
CN (1) CN100341330C (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100413341C (en) * 2006-07-18 2008-08-20 吉林大学 Audio and video frequency signal synchronizing method
US9972357B2 (en) 2014-01-08 2018-05-15 Adobe Systems Incorporated Audio and video synchronizing perceptual model
CN105185397B (en) * 2014-06-17 2018-09-14 北京司响无限文化传媒有限公司 Video marker method and apparatus
CN104079974B (en) * 2014-06-19 2017-08-25 广东威创视讯科技股份有限公司 Audio/video processing method and system
CN111405349B (en) * 2019-01-02 2022-05-13 百度在线网络技术(北京)有限公司 Information implantation method and device based on video content and storage medium
CN109729361A (en) * 2019-01-28 2019-05-07 北京晶品特装科技有限责任公司 A kind of terminal hardware implementation method with audiovisual compression

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1276681A (en) * 1999-06-08 2000-12-13 索尼公司 Image processing device and method, and memory medium
JP2000350010A (en) * 1999-06-08 2000-12-15 Sony Corp Image processing unit, image processing method and medium
US6687383B1 (en) * 1999-11-09 2004-02-03 International Business Machines Corporation System and method for coding audio information in images

Also Published As

Publication number Publication date
CN1655616A (en) 2005-08-17

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20071003

Termination date: 20100225