CN106409302A

CN106409302A - Audio frequency watermark method and system based on embedding area selection

Info

Publication number: CN106409302A
Application number: CN201610458412.9A
Authority: CN
Inventors: 陈怡�; 高戈; 张康; 吕冰; 刘影
Original assignee: Huazhong Normal University
Current assignee: Huazhong Normal University
Priority date: 2016-06-22
Filing date: 2016-06-22
Publication date: 2017-02-15
Anticipated expiration: 2036-06-22
Also published as: CN106409302B

Abstract

The present invention provides an audio watermarking method and system based on embedding-region selection. The embedding process comprises: reading an audio file, judging whether each frame signal can serve as an embedding region, and selecting the frequency band in which the audio watermark is embedded; then performing the discrete Fourier transform, generating a binary pseudo-random spreading sequence, embedding the watermark, and converting the watermarked frequency-domain signal back to the time domain. The detection process comprises: reading the audio file to be detected, judging whether each frame signal can serve as an embedding region, calculating the frequency-domain start point and end point of the detection range, performing the discrete Fourier transform, generating the binary pseudo-random spreading sequence, calculating the sufficient statistic for detection, and obtaining the detected watermark bits.

Description

Audio watermarking method and system based on embedding-region selection

Technical field

The present invention relates to the field of digital audio watermarking, and more particularly to an audio watermarking method and system based on embedding-region selection.

Background technology

Digital audio watermarking is a signal-processing operation that adds digital information to an audio signal for purposes such as file authentication, copyright protection and information hiding. Embedding-region selection refers to choosing suitable audio regions for the watermark before it is embedded into the audio signal. Conventional audio watermarking techniques do not take the characteristics of the audio signal into account and embed the watermark throughout the whole audio file, which leads to the following problems: 1) when the watermark is embedded in a region where the audio signal amplitude is low, the amplitude exceeds the masking threshold and produces noise, destroying perceptual transparency; 2) for violently changing transient signals in the audio, the variance of the audio signal in that region is very large, so the watermark bit error rate is very high when the embedded watermark is detected; 3) when the watermark is embedded in the frequency domain, if a region that is not significant to auditory perception is selected, part of the watermark is lost after signal processing or lossy audio compression, leading to a high watermark detection bit error rate.

Content of the invention

The object of the present invention is to provide an audio watermarking technique with embedding-region selection, so that the watermark is embedded in suitable audio regions, avoiding unnecessary noise and reducing bit errors.

To achieve the above object, the present invention provides an audio watermarking method based on embedding-region selection, comprising an embedding process and a detection process.

The embedding process comprises the following steps.

Step A1: read the audio file and obtain the sampling rate fs1 and, after framing, the n-th time-domain audio frame x_n, with frame length N.

First, judge whether each frame signal x_n can serve as an embedding region.

Then, for each frame signal x_n that can serve as an embedding region, select the frequency band in which the audio watermark is embedded. Let the preset embedding start frequency, chosen according to the frequency range to which human hearing is sensitive, be FWMIN and the end frequency be FWMAX; the embedding start point freqmin1 and embedding end point freqmax1 of a frame are then obtained as follows:

freqmin1 = floor((FWMIN × 2.0 / fs1) × N)

freqmax1 = floor((FWMAX × 2.0 / fs1) × N)

where floor is the round-down function;

Step A2: apply the discrete Fourier transform (DFT) to each frame signal x_n in which a watermark can be embedded, obtaining the frequency-domain signal X_n;

Step A3: using the key as the random-number seed, generate a binary pseudo-random spreading sequence u of length freqmax1 - freqmin1 + 1;

Step A4: embed the watermark according to the spreading sequence u, the frequency-domain signal X_n and the watermark bit b, obtaining the watermarked frequency-domain signal, calculated as follows:

|X′_n| = |X_n| + bαu

where α is a constant that controls the embedding strength of the watermark, and |X_n| and |X′_n| denote the frequency-domain magnitude before and after the watermark is embedded, respectively. The watermarked frequency-domain signal is then obtained through Euler's formula:

X_{n}^{'} = | X_{n}^{'} | e^{j \angle X_{n}}

where ∠X_n denotes the phase of the frequency-domain signal, X′_n denotes the watermarked frequency-domain signal, and e is the base of the natural exponential;

Step A5: transform the watermarked frequency-domain signal X′_n to the time domain and generate the watermarked audio file;

The detection process comprises the following steps.

Step B1: read the audio file to be detected and obtain the sampling rate fs2 and, after time-domain framing, the n-th frame signal z_n.

First, judge whether each frame signal z_n can serve as an embedding region;

Each frame signal z_n that can serve as an embedding region is taken as a signal to be detected, and the frequency-domain start point freqmin2 and end point freqmax2 of the detection range are calculated:

freqmin2 = floor((FWMIN × 2.0 / fs2) × N)

freqmax2 = floor((FWMAX × 2.0 / fs2) × N)

Step B2: apply the discrete Fourier transform (DFT) to obtain the frequency-domain signal Z_n of the signal to be detected; the corresponding frequency-domain magnitude is denoted |Z_n|;

Step B3: using the key as the random-number seed, generate a binary pseudo-random spreading sequence u of length freqmax2 - freqmin2 + 1;

Step B4: from the spreading sequence u and the frequency-domain magnitude |Z_n| of the signal to be detected, calculate the sufficient statistic r_n for detection as follows:

r_{n} = \frac{< u, | Z_{n} | >}{< u, u >}

If the sufficient statistic r_n ≥ 0, the detected watermark bit is b = 1; otherwise, the detected watermark bit is b = 0.

Further, in step A1 and step B1, the judgment of whether each frame signal x_n can serve as an embedding region is implemented as follows:

1) Determine whether the average energy of signal x_n exceeds the corresponding preset threshold τ₁; if it does not, the frame is a silent region and embedding the watermark is not allowed;

2) If signal x_n contains a transient signal, embedding the watermark is not allowed.

Further, whether signal x_n contains a transient signal is judged as follows:

A frame signal is divided into S blocks and the energy of each of the S blocks is calculated; the energy ratio rate of the maximum-energy block to the minimum-energy block is compared with the corresponding preset threshold τ₂, and if rate is greater than τ₂ the frame signal is considered to contain a transient signal.

Correspondingly, the present invention provides an audio watermarking system based on embedding-region selection, comprising an audio watermark embedding subsystem and a watermark detection subsystem.

The audio watermark embedding subsystem comprises the following modules.

Suitable-region selection embedding module, for reading the audio file and obtaining the sampling rate fs1 and, after framing, the n-th time-domain audio frame x_n, with frame length N,

freqmin1 = floor((FWMIN × 2.0 / fs1) × N)

freqmax1 = floor((FWMAX × 2.0 / fs1) × N)

where floor is the round-down function;

First time-frequency conversion module, for applying the discrete Fourier transform (DFT) to each frame signal x_n in which a watermark can be embedded, obtaining the frequency-domain signal X_n;

First spreading-sequence generation module, for using the key as the random-number seed to generate a binary pseudo-random spreading sequence u of length freqmax1 - freqmin1 + 1;

Watermark embedding module, for embedding the watermark according to the spreading sequence u, the frequency-domain signal X_n and the watermark bit b, obtaining the watermarked frequency-domain signal, calculated as follows:

|X′_n| = |X_n| + bαu

X_{n}^{'} = | X_{n}^{'} | e^{j \angle X_{n}}

Inverse time-frequency transform module, for transforming the watermarked frequency-domain signal X′_n to the time domain and generating the watermarked audio file;

The watermark detection subsystem comprises the following modules.

Suitable-region selection detection module, for reading the audio file to be detected and obtaining the sampling rate fs2 and, after time-domain framing, the n-th frame signal z_n,

freqmin2 = floor((FWMIN × 2.0 / fs2) × N)

freqmax2 = floor((FWMAX × 2.0 / fs2) × N)

Second time-frequency conversion module, for applying the discrete Fourier transform (DFT) to obtain the frequency-domain signal Z_n of the signal to be detected, the corresponding frequency-domain magnitude being denoted |Z_n|;

Second spreading-sequence generation module, for using the key as the random-number seed to generate a binary pseudo-random spreading sequence u of length freqmax2 - freqmin2 + 1;

Correlation detection module, for calculating the sufficient statistic r_n for detection from the spreading sequence u and the frequency-domain magnitude |Z_n| of the signal to be detected, as follows:

r_{n} = \frac{< u, | Z_{n} | >}{< u, u >}

Further, in the suitable-region selection embedding module and the suitable-region selection detection module, the judgment of whether each frame signal x_n can serve as an embedding region is implemented in the same way as in steps A1 and B1 of the method.

The present invention filters out transient signals using the ratio of the maximum to the minimum energy within a frame, which improves the accuracy of watermark detection; it embeds the watermark in the frequency range that is significant to auditory perception, which improves the robustness of the watermark; and it further uses the average energy to filter out silent regions, which improves perceptual transparency. The technical solution of the present invention has significant market value.

Brief description of the drawings

Fig. 1 is a structural block diagram of the embedding subsystem of an embodiment of the present invention.

Fig. 2 is a structural block diagram of the detection subsystem of an embodiment of the present invention.

Fig. 3 is a flow chart of the embedding process of an embodiment of the present invention.

Fig. 4 is a flow chart of the detection process of an embodiment of the present invention.

Specific embodiment

The technical solution is further described below with reference to the drawings and specific embodiments.

The embodiment of the present invention provides an audio watermarking system based on embedding-region selection, comprising an audio watermark embedding subsystem and a watermark detection subsystem.

Referring to Fig. 1, the embedding subsystem of the embedding-region-selection audio watermarking technique provided by the embodiment comprises a suitable-region selection embedding module 1, a first time-frequency conversion module 2, a first spreading-sequence generation module 3, a watermark embedding module 4 and an inverse time-frequency transform module 5. In a specific implementation, each module may be realized by committing the software to firmware.

The suitable-region selection embedding module 1 judges the time-domain audio frames that are read. In a specific implementation, it can judge frame by frame whether the condition for embedding a watermark is satisfied: a frame that does not satisfy the condition is skipped and judgment continues with the next frame; a frame that satisfies it is output to the first time-frequency conversion module 2. According to the sampling rate of the time-domain audio signal that was read and the frequency range to which the human ear is more sensitive, the module calculates the range of the frequency-domain signal in which the watermark is embedded, outputs the frequency-domain signal within the embeddable range to the watermark embedding module 4, and outputs the maximum and minimum of the embedding range to the first spreading-sequence generation module 3;

The first time-frequency conversion module 2 converts the time-domain audio signal that was read into a frequency-domain signal and outputs it to the watermark embedding module 4;

The first spreading-sequence generation module 3 generates, from the random-number seed and the maximum and minimum of the embedding range supplied by the suitable-region selection embedding module 1, a uniformly distributed random sequence whose length equals that of the embedding range and whose elements have amplitude 1 or -1, and outputs this random sequence to the watermark embedding module 4;

The watermark embedding module 4 embeds the watermark in the amplitude spectrum of the frequency-domain signal, generates the frequency-domain audio signal carrying the watermark information, and outputs it to the inverse time-frequency transform module 5;

The inverse time-frequency transform module 5 converts the frequency-domain audio signal carrying the watermark information, supplied by the watermark embedding module 4, into a time-domain audio signal carrying the watermark information, and generates an audio file from this time-domain signal, which yields the audio file carrying the watermark information.

Referring to Fig. 2, the adaptive audio watermark detection subsystem based on phase coding provided by the embodiment comprises a suitable-region selection detection module 6, a second time-frequency conversion module 7, a second spreading-sequence generation module 8 and a correlation detection module 9. In a specific implementation, each module may be realized by committing the software to firmware.

The function of the suitable-region selection detection module 6 is essentially the same as that of the suitable-region selection embedding module 1. A region that does not satisfy the watermark embedding condition generally contains no watermark either and can be ignored during detection. In a specific implementation, frames can be judged one by one: a frame that does not satisfy the detection condition is skipped without detection and judgment continues with the next frame; an audio signal that satisfies the detection condition is output to the second time-frequency conversion module 7, and the maximum and minimum of the frequency detection region are likewise output to the second time-frequency conversion module 7 and the second spreading-sequence generation module 8;

The second time-frequency conversion module 7 converts the time-domain audio signal that was read into a frequency-domain signal and outputs it to the correlation detection module 9;

The function of the second spreading-sequence generation module 8 is essentially the same as that of the first spreading-sequence generation module 3; it outputs the sequence it produces to the correlation detection module 9;

The correlation detection module 9 computes, over the detection range, the correlation between the input frequency-domain magnitude signal to be detected and the spreading sequence supplied by the second spreading-sequence generation module 8, and decides the watermark bit according to the sign of the correlation value.

For the specific implementation of each module, refer to the corresponding method steps; the details are not repeated here. The audio watermarking method based on embedding-region selection provided by the embodiment comprises an embedding process and a detection process.

Referring to Fig. 3, the audio watermark embedding process based on region selection provided by the embodiment can be run automatically by means of computer software, and specifically comprises the following steps:

Step A1: read the audio file and first divide the time-domain audio signal x into frames, obtaining the sampling rate fs1 and the n-th time-domain audio frame x_n (frame length N). The judgment of whether each frame signal x_n can serve as an embedding region comprises two parts:

1) Judge whether the average energy of x_n exceeds the set threshold, in order to decide whether the current frame x_n is a silent region. A silent region is not allowed to carry a watermark; a frame whose average energy exceeds the threshold is not a silent region and may be embedded. The average energy of the n-th frame is calculated by the following formula:

\bar{E}_{n} = \frac{1}{N} \sum_{i=0}^{N-1} x_{n}^{2}(i) (1)

where \bar{E}_{n} is the average energy of the n-th frame; N is the frame length, i.e. the number of samples in a frame; i is the sample index within the frame, ranging from 0 to N-1; x_n²(i) is the energy of the i-th sample of the n-th time-domain frame x_n; and τ₁ is the decision threshold for the average energy, which those skilled in the art can preset in a specific implementation, for example empirically. If the average energy exceeds the threshold, condition 1) is satisfied and condition 2) below is judged.

2) A transient signal within a frame changes sharply and causes a large variance; during detection, a larger signal variance makes watermark detection errors more likely, so a watermark should not be embedded in this case either. The frame is divided into S blocks and the energy of each of the S blocks is calculated; the energy ratio rate of the maximum-energy block to the minimum-energy block is compared with the threshold τ₂. If rate is greater than τ₂, the frame signal is considered to contain a transient signal and no watermark is embedded; otherwise a watermark can be embedded. In a specific implementation, those skilled in the art can preset the value of S.

The specific implementation is as follows:

First divide the frame signal x_n into S blocks; the number of samples M in each block is

M = N/S (2)

The energy E_i of each block is calculated as follows:

E_{i} = \sum_{j=iM}^{(i+1)M-1} x_{n}^{2}(j), i ∈ [0, S-1] (3)

where i is the block index within the frame, j is the sample index within the frame, and x_n²(j) is the energy of the j-th sample of the n-th time-domain frame x_n.

Find the maximum energy E_Max and the minimum energy E_Min among the block energies:

E_Max = MAX{E_i}, E_Min = MIN{E_i}, i ∈ [0, S-1] (4)

where MAX and MIN denote the maximum function and the minimum function, respectively.

The ratio rate of the maximum energy to the minimum energy is calculated as follows:

rate = E_Max / E_Min (5)

If rate > τ₂, the signal frame x_n is considered to contain a transient signal and no watermark is embedded in this frame; otherwise, a watermark can be embedded. Here τ₂ is the detection threshold for transient signals, which those skilled in the art can preset in a specific implementation, for example empirically.
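The two frame checks above (average energy against τ₁, then block energy ratio against τ₂) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the example thresholds and the handling of a zero-energy block are hypothetical and are not specified by the text.

```python
def average_energy(frame):
    """Average energy of a frame, as in equation (1)."""
    return sum(s * s for s in frame) / len(frame)


def block_energy_ratio(frame, num_blocks):
    """Ratio of the largest to the smallest block energy, equations (2)-(5)."""
    m = len(frame) // num_blocks  # M = N / S samples per block
    energies = [
        sum(s * s for s in frame[i * m:(i + 1) * m])
        for i in range(num_blocks)
    ]
    e_max, e_min = max(energies), min(energies)
    # Assumption: a block with zero energy is treated as an infinite ratio;
    # the text does not say how this case is handled.
    return float("inf") if e_min == 0 else e_max / e_min


def is_embeddable(frame, tau1, tau2, num_blocks):
    """True if the frame is neither silent nor transient."""
    if average_energy(frame) <= tau1:  # silent region: skip
        return False
    return block_energy_ratio(frame, num_blocks) <= tau2  # transient: skip


# Hypothetical 16-sample frames with S = 4 blocks.
steady = [0.5, -0.5] * 8
silent = [0.001] * 16
transient = [0.1] * 12 + [1.0] * 4
results = [is_embeddable(f, 0.01, 10.0, 4) for f in (steady, silent, transient)]
```

With these illustrative thresholds only the steady frame passes both checks; real values of τ₁, τ₂ and S would be chosen empirically, as the text notes.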

Then, for each frame signal x_n that can serve as an embedding region, the frequency band in which the audio watermark is embedded should be the region that is more significant to human auditory perception; those skilled in the art can preset it according to auditory perception characteristics, for example 1000-7000 Hz. Signals in these regions are not removed by attacks such as filtering or audio compression, so a watermark embedded in a perceptually significant region is not erased by such signal attacks and can still be detected. Let the preset embedding start frequency, chosen according to the frequency range to which human hearing is sensitive, be FWMIN and the end frequency be FWMAX; the corresponding embedding start point freqmin1 and embedding end point freqmax1 of a frame are obtained as follows:

freqmin1 = floor((FWMIN × 2.0 / fs1) × N) (6)

freqmax1 = floor((FWMAX × 2.0 / fs1) × N) (7)

where floor is the round-down function.

According to the embedding start point freqmin1 and the embedding end point freqmax1, the frequency-domain audio signal within this range is selected.
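As a quick numeric check of equations (6) and (7), the following Python sketch evaluates the band limits for one set of values. The sampling rate of 44100 Hz and the frame length of 1024 are assumptions chosen for illustration; only the 1000-7000 Hz band comes from the text, and the function name is hypothetical.

```python
import math


def embedding_band(f_min_hz, f_max_hz, fs, frame_len):
    """Return (freqmin, freqmax) exactly as written in equations (6) and (7)."""
    freqmin = math.floor((f_min_hz * 2.0 / fs) * frame_len)
    freqmax = math.floor((f_max_hz * 2.0 / fs) * frame_len)
    return freqmin, freqmax


# Assumed example: FWMIN = 1000 Hz, FWMAX = 7000 Hz, fs1 = 44100 Hz, N = 1024.
freqmin1, freqmax1 = embedding_band(1000.0, 7000.0, 44100.0, 1024)
seq_len = freqmax1 - freqmin1 + 1  # length of the spreading sequence u
print(freqmin1, freqmax1, seq_len)  # prints: 46 325 280
```

The same function applies unchanged to the detection side with fs2 in place of fs1.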

In a specific implementation, frames can be judged one by one: a frame that does not satisfy the conditions is skipped and the next frame is judged.

Step A2: apply the FFT (fast Fourier transform) to each signal frame x_n in which a watermark can be embedded, obtaining the frequency-domain signal X_n.

Step A3: using the key as the random-number seed, generate a binary pseudo-random spreading sequence u of length freqmax1 - freqmin1 + 1.

The specific procedure of the embodiment in MATLAB is as follows:

First, using the key, the RandStream function (random-seed function) is called to initialize the rand function (random-number generation function), and rand is then called to generate random numbers. Because rand produces numbers between 0 and 1, these numbers are rounded to give a binary pseudo-random sequence of 0s and 1s, and this unipolar pseudo-random sequence is then converted into a bipolar pseudo-random sequence u containing only +1 and -1.
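The same three stages (seed from the key, round uniform numbers to 0/1, map to ±1) can be sketched in Python. This is an analogue rather than a port: Python's `random.Random` is a different generator from MATLAB's RandStream, so the two will not produce the same sequence for the same key, and the function name is hypothetical. What matters for the scheme is that embedder and detector use the same generator and key.

```python
import random


def spreading_sequence(key, length):
    """Key-seeded bipolar pseudo-random sequence of +1/-1 values."""
    rng = random.Random(key)  # the key acts as the random-number seed
    # Round uniform numbers in [0, 1) to obtain a unipolar 0/1 sequence.
    unipolar = [round(rng.random()) for _ in range(length)]
    # Map 0 -> -1 and 1 -> +1 to obtain the bipolar sequence u.
    return [2 * bit - 1 for bit in unipolar]


# Assumed example key and length (280 matches a 46..325 embedding band).
u_embed = spreading_sequence(12345, 280)
u_detect = spreading_sequence(12345, 280)  # the detector regenerates u from the key
```

Because the sequence is fully determined by the key, the detector can rebuild u without access to the original audio.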

Step A4: embed the watermark according to the spreading sequence u, the frequency-domain signal X_n and the watermark bit b using formula (8) below, obtaining the watermarked frequency-domain signal:

|X′_n| = |X_n| + bαu (8)

where α is a constant that controls the embedding strength of the watermark and can be preset by those skilled in the art in a specific implementation; |X_n| and |X′_n| denote the frequency-domain magnitude before and after the watermark is embedded, respectively. The watermarked frequency-domain signal is then obtained through Euler's formula:

X_{n}^{'} = | X_{n}^{'} | e^{j \angle X_{n}} (9)

where ∠X_n denotes the phase of the frequency-domain signal, X′_n denotes the watermarked frequency-domain signal, and e is the base of the natural exponential.
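A minimal Python sketch of equations (8) and (9) follows: the magnitude of each bin in the embedding band is shifted by bαu while the phase is kept. The function name and the toy spectrum are hypothetical. The sketch passes b straight into equation (8) and does not decide how a 0 bit is represented, nor does it handle the case where the shifted magnitude would become negative or the mirror bins of a real signal's spectrum, since the text does not describe these.

```python
import cmath


def embed_bit(spectrum, u, bit, alpha, freqmin):
    """Return a copy of spectrum with bit embedded in bins freqmin..freqmin+len(u)-1."""
    out = list(spectrum)
    for k, chip in enumerate(u):
        idx = freqmin + k
        magnitude = abs(spectrum[idx]) + bit * alpha * chip  # equation (8)
        phase = cmath.phase(spectrum[idx])                   # phase is unchanged
        out[idx] = cmath.rect(magnitude, phase)              # equation (9)
    return out


# Toy 4-bin spectrum; bins 1 and 2 form the assumed embedding band.
X = [complex(3, 4), complex(0, 2), complex(-1, 0), complex(1, 1)]
X_marked = embed_bit(X, u=[1, -1], bit=1, alpha=0.5, freqmin=1)
```

In this toy case the magnitude of bin 1 rises from 2 to 2.5 and that of bin 2 falls from 1 to 0.5, while bins outside the band are left untouched.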

Step A5: transform the watermarked frequency-domain signal X′_n to the time domain and finally generate the audio file, which yields the watermarked audio file.
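The forward transform of step A2 and the inverse transform of step A5 can be illustrated with a direct DFT pair in Python. This is a teaching sketch with hypothetical function names and O(N²) cost; the embodiment uses an FFT, and a practical implementation would call an FFT routine instead.

```python
import cmath


def dft(x):
    """Direct discrete Fourier transform of a sequence x."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse discrete Fourier transform; returns complex time-domain samples."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


# Round trip on a short, made-up frame: time -> frequency -> time.
frame = [1.0, 2.0, 0.0, -1.0]
recovered = [value.real for value in idft(dft(frame))]
```

Without any watermark applied, the round trip returns the original samples up to floating-point error, which is a useful sanity check before inserting the embedding step between the two transforms.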

Referring to Fig. 4, the audio watermark detection process based on region-selective embedding provided by the embodiment can be run automatically by means of computer software, and specifically comprises the following steps:

Step B1: read the audio file to be detected and obtain the sampling rate fs2 and, after time-domain framing, the n-th frame signal z_n. Each time-domain frame z_n is judged by the same method as in step A1, considering the same two conditions: a frame signal that is neither a silent region nor contains a transient signal may carry an embedded watermark and needs to be detected.

Each frame signal z_n that can serve as an embedding region is taken as a signal to be detected, and the frequency-domain start point freqmin2 and end point freqmax2 of the detection range are calculated:

freqmin2 = floor((FWMIN × 2.0 / fs2) × N) (10)

freqmax2 = floor((FWMAX × 2.0 / fs2) × N) (11)

Step B2: for each signal z_n that satisfies the detection conditions, apply the discrete Fourier transform (DFT) to obtain the frequency-domain signal Z_n of the signal to be detected; the corresponding frequency-domain magnitude is denoted |Z_n|.

Step B3: using the key, generate the binary spreading sequence u in the same way as in the embedding method above, i.e. use the key as the random-number seed to generate a binary pseudo-random spreading sequence u of length freqmax2 - freqmin2 + 1.

Step B4: from the spreading sequence u and the frequency-domain magnitude |Z_n| of the signal to be detected, calculate the sufficient statistic r_n for detection as the correlation between u and |Z_n|:

r_{n} = \frac{< u, | Z_{n} | >}{< u, u >} (12)

where <·> denotes the inner product of signals.
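Equation (12) together with the decision rule stated for step B4 earlier in the document (b = 1 if r_n ≥ 0, otherwise b = 0) can be sketched as follows. The function name and the toy magnitudes are hypothetical; the magnitudes are constructed as a constant level of 5 plus or minus 0.5·u so that the expected statistic is easy to verify by hand.

```python
def detect_bit(magnitudes, u, freqmin):
    """Return (bit, r) for one frame from its magnitude spectrum |Z_n|."""
    band = magnitudes[freqmin:freqmin + len(u)]
    # r_n = <u, |Z_n|> / <u, u>, restricted to the detection band.
    r = sum(c * m for c, m in zip(u, band)) / sum(c * c for c in u)
    return (1 if r >= 0 else 0), r


u = [1, -1, 1, -1]
# Band values are 5 + 0.5*u and 5 - 0.5*u respectively; bins 0 and 5 lie outside the band.
bit_pos, r_pos = detect_bit([9.0, 5.5, 4.5, 5.5, 4.5, 9.0], u, freqmin=1)
bit_neg, r_neg = detect_bit([9.0, 4.5, 5.5, 4.5, 5.5, 9.0], u, freqmin=1)
```

Because this toy u is balanced (its elements sum to zero), the constant level cancels in the inner product and r_n equals the embedded offset, +0.5 or -0.5. With a real host signal the term <u, |X_n|> does not vanish exactly, which is why the frame selection steps aim to keep its variance small.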

The specific embodiments described herein merely illustrate the spirit of the present invention. Those skilled in the art to which the invention belongs may make various modifications or additions to the described embodiments, or substitute them in a similar manner, without departing from the spirit of the invention or exceeding the scope defined by the appended claims.

Claims

1. An audio watermarking method based on embedding-region selection, characterized in that it comprises an embedding process and a detection process, the embedding process comprising the following steps,

Step A1: read the audio file and obtain the sampling rate fs1 and, after framing, the n-th time-domain audio frame x_n, with frame length N; first judge whether each frame signal x_n can serve as an embedding region,

freqmin1 = floor((FWMIN × 2.0 / fs1) × N)

freqmax1 = floor((FWMAX × 2.0 / fs1) × N)

where floor is the round-down function;

|X′_n| = |X_n| + bαu

X_{n}^{'} = | X_{n}^{'} | e^{j \angle X_{n}}

the detection process comprising the following steps,

freqmin2 = floor((FWMIN × 2.0 / fs2) × N)

freqmax2 = floor((FWMAX × 2.0 / fs2) × N)

r_{n} = \frac{< u, | Z_{n} | >}{< u, u >}

2. The audio watermarking method based on embedding-region selection according to claim 1, characterized in that: in step A1 and step B1, the judgment of whether each frame signal x_n can serve as an embedding region is implemented as follows,

3. The audio watermarking method based on embedding-region selection according to claim 2, characterized in that: whether signal x_n contains a transient signal is judged as follows,

4. An audio watermarking system based on embedding-region selection, characterized in that it comprises an audio watermark embedding subsystem and a watermark detection subsystem,

freqmin1 = floor((FWMIN × 2.0 / fs1) × N)

freqmax1 = floor((FWMAX × 2.0 / fs1) × N)

where floor is the round-down function;

|X′_n| = |X_n| + bαu

X_{n}^{'} = | X_{n}^{'} | e^{j \angle X_{n}}

the watermark detection subsystem comprising the following modules,

freqmin2 = floor((FWMIN × 2.0 / fs2) × N)

freqmax2 = floor((FWMAX × 2.0 / fs2) × N)

r_{n} = \frac{< u, | Z_{n} | >}{< u, u >}

5. The audio watermarking system based on embedding-region selection according to claim 4, characterized in that: in the suitable-region selection embedding module and the suitable-region selection detection module, the judgment of whether each frame signal x_n can serve as an embedding region is implemented as follows,

6. The audio watermarking system based on embedding-region selection according to claim 5, characterized in that: whether signal x_n contains a transient signal is judged as follows,