CN102543086A - Device and method for expanding speech bandwidth based on audio watermarking - Google Patents
Publication number: CN102543086A · Authority: CN · Legal status: Granted
Abstract
The invention discloses a device and a method for expanding speech bandwidth based on audio watermarking. They work as follows: at the sending end, the speech produced by the speaker is a wideband signal; before the speech is transmitted over the telephone line, high-frequency parameters are embedded into the narrowband code stream, and the narrowband speech signal is transmitted over the telephone line; at the receiving end, A-law decoding is performed and the high-frequency parameters are extracted; the high-frequency part of the wideband speech is recovered from these parameters; finally, the high-frequency speech and the low-frequency speech are synthesized into wideband speech. The device and method exploit the characteristics of audio watermarking to build a hidden channel inside the narrowband speech and use this channel to transmit the parameters of the high-frequency speech, achieving bandwidth extension of the speech signal without changing the original network protocol.
Description
Technical field
The present invention relates to speech processing technology, and in particular to a device and method for speech bandwidth extension based on audio watermarking.
Background technology
The energy of the human speech signal is concentrated mainly in 0.3–3.4 kHz, and a 4 kHz bandwidth is enough to guarantee adequate intelligibility. Therefore, the public switched telephone network (PSTN) coding standard G.711 (A-law and μ-law) formulated by the International Telecommunication Union (ITU-T) uses a sampling frequency of 8 kHz, and it remains in use to this day.
Narrowband speech reduces the demand on communication bandwidth while guaranteeing a certain intelligibility, but at the cost of sacrificing the naturalness of the voice: narrowband speech loses the high-frequency components of the original speech, so it does not sound natural enough. To improve speech quality, ITU-T proposed G.722, the first wideband speech codec, intended for remote telephone conferencing. Wideband voice communication could be realized by redesigning the transmission links, but for the huge PSTN fixed telephone network, redesigning the transmission links is far too expensive.
A traditional watermark is a mark visible when paper is held up to the light, generally used to detect the authenticity of important bills. Digital watermarking technology, by contrast, exploits the ubiquitous redundancy and randomness of multimedia digital works to embed numerical information into the work, realizing hidden transmission of information. Digital watermarking is mainly used to protect the copyright and integrity of a work. Because human hearing is sharper than human vision, embedding a watermark into audio is more difficult than embedding one into an image.
Audio watermarking based on the least significant bit (LSB): speech bandwidth extension based on LSB embeds the high-frequency parameters into the lowest bit of the encoded code stream. This method embeds a large amount of watermark data with a simple algorithm, and suits communication channels with a low bit error rate.
Audio watermarking based on time-domain echo hiding: this technique exploits the temporal masking effect of human hearing: even after one sound has ended, it still influences the perception of another sound. The amount of watermark data this method can embed is small, and embedding the watermark has some influence on the original sound.
Audio watermarking based on the frequency-domain discrete Fourier transform: this method first applies the DFT to the audio, then embeds the watermark in the DFT coefficients of the 2.4–6.4 kHz band, replacing the corresponding DFT coefficients with spectral components that represent the watermark sequence. Although this method has good robustness, when the embedded watermark differs too much from the original DFT coefficients, the influence on the original speech is large.
Audio watermarking based on the frequency-domain discrete cosine transform: this method first applies a discrete cosine transform to the time-domain signal, then a modified discrete cosine transform (MDCT) to the sequence, and embeds the watermark by altering the MDCT coefficients. This method has good robustness, but the amount of watermark data it can embed is small.
Shortcomings of the prior art: none of the above methods achieves a good balance among robustness, imperceptibility, and embedding capacity; each has its own drawbacks, and therefore none of them serves speech bandwidth extension well.
Summary of the invention
In view of the various shortcomings and defects of existing audio watermarking for bandwidth extension, the invention provides a device and a method for speech bandwidth extension based on audio watermarking.
To achieve the above object, the method for speech bandwidth extension based on audio watermarking provided by the invention comprises the following steps:
Step A. Use the QMF analysis filter bank module to divide the wideband speech into two parts: the narrowband component of 0–8000 Hz and the high-frequency component of 8000–16000 Hz; reduce the sampling frequency of both output signals to 8 kHz, obtaining the low-frequency signal s_L(n) and the high-frequency signal s_H(n).
Step B. Extract 30 high-frequency parameters through the high-frequency parameter extraction module: 16 temporal envelope parameters, 12 frequency-domain envelope parameters, the average temporal envelope parameter, and the average frequency-domain envelope parameter. This part follows the approach of the document "Research and implementation of the DTX/CNG algorithm based on a layered wideband speech coding/decoding system". The concrete extraction method for each parameter is as follows:
Step B1. Extract the 16 temporal envelope parameters and the average temporal envelope parameter: the high-frequency component s_H(n) of each 20 ms frame is divided into 16 segments, each containing 10 samples, from which the 16 temporal envelope parameters T(i) are computed. The average temporal envelope is then calculated, and each T(i) is differenced with the mean and normalized.
Step B2. Extract the 12 frequency-domain envelope parameters and the average frequency-domain envelope parameter: the 160 samples of the current frame of the high-frequency component s_H(n), together with the last 48 samples of the previous frame, are passed through a windowing process using the window function Window(n), whose length is 208 samples (N = 208). The windowed signal is zero-padded to 256 points, and a 256-point FFT yields S_F(k), where L = 256. The frequency domain is divided into 12 even intervals; the frequency-domain envelope parameter of each interval is computed and converted into a logarithmically weighted sub-band energy parameter. The average frequency-domain envelope is calculated, and each frequency-domain envelope parameter F(i) is differenced with the mean and normalized.
Step C. Through the G.711 coding/decoding module, the narrowband speech signal s_L(n) is encoded by the A-law encoder, yielding a code stream of 8 bits per sample; the watermark information is embedded into the code stream, which is sent into the network over the telephone line. The receiving end extracts the watermark information from the code stream and decodes it with the A-law decoder to obtain the narrowband speech signal.
Step D. Embedding the watermark into the code stream through the watermark embedding module comprises the following two modes:
D1. Embed the watermark into the code stream uniformly through the watermark embedding module: since one frame of the signal has 160 samples and the number of watermark bits to embed is 66, one bit of information is embedded at every other sample.
Or D2. Selectively embed the watermark information into the samples of small amplitude through the watermark embedding module. Let C0–C7 denote the bits of the encoded code stream from the lowest to the highest. According to the G.711 standard, the most significant bit C7 is the sign bit of the sample, C6–C4 form the segment code, and C3–C0 form the intra-segment code; the smaller the segment code, the smaller the amplitude of the sample value the code stream represents. This method uses bit C6 to divide the signal into large samples (C6 = 1) and small samples (C6 = 0), and embeds the watermark when C6 is 0. If one frame offers fewer than 66 embedding positions, the remaining watermark bits are embedded at other positions.
Step E. Extracting the watermark through the watermark extraction module corresponds to step D and comprises two modes:
E1. The watermark extraction module extracts the watermark according to the positions where it was embedded.
Or E2. Judge from the characteristics of the code stream whether a watermark has been embedded: examining the frame from its start, if C6 is 0 the watermark bit is extracted from the lowest bit, and if C6 is 1 no bit is extracted; if fewer than 66 watermark bits have been extracted when the end of the frame is reached, return to the start of the frame and extract from the positions where C6 is 1, until 66 watermark bits have been extracted.
Step F. Recover the high-frequency speech from white noise through the high-frequency speech recovery module: first, the generated white noise sequence is passed through an AR model constructed from the low-frequency speech; then the extracted high-frequency parameters are used to apply temporal envelope shaping and frequency-domain envelope shaping, which yields the high-frequency speech signal.
Step F1. Use white noise to recover the high-frequency speech: because the high-frequency and low-frequency speech have a certain correlation, an AR model is constructed from the decoded low-frequency speech. A white noise sequence is generated at the decoding end and shaped by the constructed AR model, so that the noise takes on the characteristics of the high-frequency speech.
Step F2. Local adjustment of the temporal envelope, following the approach of the document "Research and implementation of the DTX/CNG algorithm based on a layered wideband speech coding/decoding system": the temporal envelope parameters of the high-frequency signal are computed from the normalized temporal envelope parameters recovered from the watermark and the average temporal envelope; the local time-domain gain factors are computed from the temporal envelope parameters of the noise and of the high-frequency signal; the temporal envelope of the noise is adjusted with these local gain factors; and the gain between two adjacent segments is handled by linear interpolation.
Step F3. Local adjustment of the frequency-domain envelope, following the approach of the same document: the time-domain-adjusted signal is processed as in the extraction of the 12 frequency-domain envelope parameters and the average frequency-domain envelope parameter, yielding the logarithmically weighted sub-band energy parameters and the average frequency-domain envelope of the noise. The frequency-domain envelope of the noise is then locally adjusted by the same method used for the local adjustment of its temporal envelope.
Step F4. Global adjustment of the frequency-domain envelope: the global frequency-domain gain factor of each frame is computed from the average frequency-domain envelopes of the noise and of the high-frequency signal, and the frequency-domain envelope of each frame is globally adjusted with this factor. The adjusted spectrum is transformed by the IFFT, and the resulting time-domain signal is windowed with the window function and stored in a buffer of length 208, where L = 256 and n = 0, 1, …, 207. The last 48 samples of the previous frame's buffer are added to the first 48 samples of the current frame's buffer; together with the values at n = 48–159 of the current buffer, these constitute the time-domain signal recovered for the current frame.
Step F5. Global adjustment of the temporal envelope: the temporal envelope is globally adjusted following the same steps as the global adjustment of the frequency-domain envelope, and the adjusted signal is the high-frequency signal estimated from the noise.
Step G. Through the QMF synthesis filter bank module, the sampling frequency of the low-frequency signal at 8 kHz and of the estimated high-frequency signal is raised to 16 kHz; the two signals then pass through low-pass and high-pass FIR filters respectively, whose coefficients are identical to those of the QMF analysis filters. Adding the two filtered signals yields the final wideband signal with a 16 kHz sampling frequency.
The invention further provides a device for speech bandwidth extension based on audio watermarking. The device comprises: a QMF analysis filter bank module, a high-frequency parameter extraction module, a G.711 coding/decoding module, a watermark embedding module, a watermark extraction module, a high-frequency speech recovery module, and a QMF synthesis filter bank module.
Said QMF analysis filter bank module divides the wideband speech into two parts: the narrowband component of 0–8000 Hz and the high-frequency component of 8000–16000 Hz; the sampling frequency of both output signals is reduced to 8 kHz, obtaining the low-frequency signal s_L(n) and the high-frequency signal s_H(n).
Said high-frequency parameter extraction module extracts 30 high-frequency parameters: 16 temporal envelope parameters, 12 frequency-domain envelope parameters, the average temporal envelope parameter, and the average frequency-domain envelope parameter, following the approach of the document "Research and implementation of the DTX/CNG algorithm based on a layered wideband speech coding/decoding system". To extract the 16 temporal envelope parameters and the average temporal envelope parameter, the high-frequency component s_H(n) of each 20 ms frame is divided into 16 segments of 10 samples each, from which the 16 temporal envelope parameters T(i) are computed; the average temporal envelope is calculated, and each T(i) is differenced with the mean and normalized.
To extract the 12 frequency-domain envelope parameters and the average frequency-domain envelope parameter, the 160 samples of the current frame of the high-frequency component s_H(n), together with the last 48 samples of the previous frame, are passed through a windowing process using the window function Window(n), whose length is 208 samples (N = 208). The windowed signal is zero-padded to 256 points, and a 256-point FFT yields S_F(k), where L = 256. The frequency domain is divided into 12 even intervals; the frequency-domain envelope parameter of each interval is computed and converted into a logarithmically weighted sub-band energy parameter. The average frequency-domain envelope is calculated, and each F(i) is differenced with the mean and normalized.
Said G.711 coding/decoding module encodes the narrowband speech signal s_L(n) with the A-law encoder, yielding a code stream of 8 bits per sample; the watermark information is embedded into the code stream, which is sent into the network over the telephone line. The receiving end extracts the watermark information from the code stream and decodes it with the A-law decoder to obtain the narrowband speech signal.
Said watermark embedding module embeds the watermark into the code stream in either of two modes:
Mode one: embed the watermark into the code stream uniformly: since one frame of the signal has 160 samples and the number of watermark bits to embed is 66, one bit of information is embedded at every other sample.
Mode two: selectively embed the watermark information into the samples of small amplitude. Let C0–C7 denote the bits of the encoded code stream from the lowest to the highest. According to the G.711 standard, the most significant bit C7 is the sign bit of the sample, C6–C4 form the segment code, and C3–C0 form the intra-segment code; the smaller the segment code, the smaller the amplitude of the sample value the code stream represents. This method uses bit C6 to divide the signal into large samples (C6 = 1) and small samples (C6 = 0), and embeds the watermark when C6 is 0. If one frame offers fewer than 66 embedding positions, the remaining watermark bits are embedded at other positions.
Said watermark extraction module extracts the watermark in a mode corresponding to the watermark embedding module:
Mode one: extract the watermark according to the positions where it was embedded.
Mode two: judge from the characteristics of the code stream whether a watermark has been embedded: examining the frame from its start, if C6 is 0 the watermark bit is extracted from the lowest bit, and if C6 is 1 no bit is extracted; if fewer than 66 watermark bits have been extracted when the end of the frame is reached, return to the start of the frame and extract from the positions where C6 is 1, until 66 watermark bits have been extracted.
Said high-frequency speech recovery module uses white noise to recover the high-frequency speech: first, the generated white noise sequence is passed through an AR model constructed from the low-frequency speech; then the extracted high-frequency parameters are used to apply temporal envelope shaping and frequency-domain envelope shaping, which yields the high-frequency speech signal. Because the high-frequency and low-frequency speech have a certain correlation, the AR model is constructed from the decoded low-frequency speech; the white noise sequence generated at the decoding end is shaped by this model so that the noise takes on the characteristics of the high-frequency speech.
Local adjustment of the temporal envelope, following the approach of the document "Research and implementation of the DTX/CNG algorithm based on a layered wideband speech coding/decoding system": the temporal envelope parameters of the high-frequency signal are computed from the normalized temporal envelope parameters recovered from the watermark and the average temporal envelope; the local time-domain gain factors are computed from the temporal envelope parameters of the noise and of the high-frequency signal; the temporal envelope of the noise is adjusted with these local gain factors; and the gain between two adjacent segments is handled by linear interpolation.
Local adjustment of the frequency-domain envelope, following the approach of the same document: the time-domain-adjusted signal is processed as in the extraction of the 12 frequency-domain envelope parameters and the average frequency-domain envelope parameter, yielding the logarithmically weighted sub-band energy parameters and the average frequency-domain envelope of the noise. The frequency-domain envelope of the noise is then locally adjusted by the same method used for the local adjustment of its temporal envelope.
Global adjustment of the frequency-domain envelope: the global frequency-domain gain factor of each frame is computed from the average frequency-domain envelopes of the noise and of the high-frequency signal, and the frequency-domain envelope of each frame is globally adjusted with this factor. The adjusted spectrum is transformed by the IFFT, and the resulting time-domain signal is windowed with the window function and stored in a buffer of length 208, where L = 256 and n = 0, 1, …, 207. The last 48 samples of the previous frame's buffer are added to the first 48 samples of the current frame's buffer; together with the values at n = 48–159 of the current buffer, these constitute the time-domain signal recovered for the current frame.
Global adjustment of the temporal envelope: the temporal envelope is globally adjusted following the same steps as the global adjustment of the frequency-domain envelope, and the adjusted signal is the high-frequency signal estimated from the noise.
Said QMF synthesis filter bank module raises the sampling frequency of the low-frequency signal at 8 kHz and of the estimated high-frequency signal to 16 kHz; the two signals then pass through low-pass and high-pass FIR filters respectively, whose coefficients are identical to those of the QMF analysis filters. Adding the two filtered signals yields the final wideband signal with a 16 kHz sampling frequency.
Beneficial effects: the invention provides a method of improving speech quality based on audio watermarking. The method exploits the characteristics of audio watermarking to set up a hidden channel inside the narrowband speech and uses this channel to transmit the parameters of the high-frequency speech, thereby realizing bandwidth extension of the speech signal without changing the legacy network protocol. The invention uses an adaptive audio watermark to realize speech bandwidth extension; its influence on the original speech is small, the amount of embedded high-frequency information is large, its robustness is good, it suits various types of speech, and the recovered wideband speech sounds better than narrowband speech.
Description of drawings
Fig. 1 is a block diagram of the principle of the invention.
Fig. 2 shows the window function of the invention.
Fig. 3 shows the G.711 encoded code stream format of the invention.
Fig. 4 is a block diagram of the high-frequency speech recovery of the invention.
Embodiment
The invention is described in detail below with reference to the accompanying drawings and embodiments.
Fig. 1 gives the complete block diagram of the invention. At the sending end, the speech produced by the speaker is a wideband signal; before transmission over the telephone line, the high-frequency parameters are embedded into the narrowband code stream, and the narrowband speech signal is transmitted over the telephone line. At the receiving end, A-law decoding is performed, the high-frequency parameter extraction module extracts the high-frequency parameters, the high-frequency parameter synthesis module recovers the high-frequency part of the wideband speech, and finally the high-frequency speech and the low-frequency speech are synthesized into wideband speech.
The modules in the block diagram of the invention are introduced as follows:
1. QMF analysis filter bank module
The speech produced at the sending end is wideband, while the telephone line carries narrowband speech, so the invention uses the QMF analysis filter bank to divide the wideband speech into two parts: the narrowband component of 0–8000 Hz and the high-frequency component of 8000–16000 Hz. The QMF analysis filters of the invention are 64-order FIR filters; the coefficients of the low-pass FIR filter h_L(n) are given in the appendix. The high-pass filter h_H(n) is obtained from the low-pass filter h_L(n) by frequency shifting, that is, by modulation with a complex sinusoidal sequence. Passing the wideband signal through the QMF analysis filter bank and reducing the sampling frequency of the two output signals to 8 kHz yields the low-frequency signal s_L(n) and the high-frequency signal s_H(n).
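As a concrete illustration, the subband split described above can be sketched in Python with NumPy. This is a minimal sketch: the 2-tap filter pair below is only a stand-in for the patent's 64-order FIR filters, whose actual coefficients are given in the appendix.

```python
import numpy as np

def qmf_analysis(x, h_lo):
    """Split a wideband signal into low/high subbands with a QMF pair,
    then keep every other sample, halving the sampling rate."""
    n = np.arange(len(h_lo))
    h_hi = h_lo * (-1.0) ** n       # high-pass obtained by modulating the low-pass
    s_l = np.convolve(x, h_lo)[::2]
    s_h = np.convolve(x, h_hi)[::2]
    return s_l, s_h

# Toy 2-tap pair standing in for the patent's 64-order filters
h_lo = np.array([0.5, 0.5])
x = np.sin(2 * np.pi * 1000 * np.arange(160) / 16000.0)  # 1 kHz tone at 16 kHz
s_L, s_H = qmf_analysis(x, h_lo)
```

For a 1 kHz tone nearly all of the energy lands in the low band, as expected of the analysis split.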
2. High-frequency parameter extraction module
The invention extracts 30 high-frequency parameters: 16 temporal envelope parameters, 12 frequency-domain envelope parameters, the average temporal envelope parameter, and the average frequency-domain envelope parameter. The concrete extraction method for each parameter is given below.
(1) Extract the 16 temporal envelope parameters and the average temporal envelope parameter
The high-frequency component s_H(n) of each 20 ms frame is divided into 16 segments, each containing 10 samples, from which the 16 temporal envelope parameters T(i) are computed. The average temporal envelope is calculated, and each T(i) is differenced with the mean and normalized.
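The segmentation and normalization above can be sketched as follows. The per-segment RMS used for T(i) is an assumption, since the patent's exact formula is not reproduced in the source text.

```python
import numpy as np

def temporal_envelope(frame_h):
    """16 temporal envelope parameters of one 20 ms high-band frame:
    160 samples -> 16 segments of 10. T(i) is taken here as the RMS of
    segment i (an assumed formula), then differenced with the mean and
    normalized."""
    seg = np.asarray(frame_h, dtype=float).reshape(16, 10)
    T = np.sqrt(np.mean(seg ** 2, axis=1))     # envelope per segment
    T_mean = float(np.mean(T))                 # average temporal envelope
    T_norm = (T - T_mean) / (T_mean + 1e-12)   # normalized difference
    return T, T_mean, T_norm
```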
(2) Extract the 12 frequency-domain envelope parameters and the average frequency-domain envelope parameter
The 160 samples of the current frame of the high-frequency component s_H(n), together with the last 48 samples of the previous frame, are passed through a windowing process using the window function window(n), whose length is 208 samples (N = 208); the window function is shown in Fig. 2. The windowed signal is zero-padded to 256 points, and a 256-point FFT yields S_F(k), where L = 256. The frequency domain is divided into 12 even intervals; the frequency-domain envelope parameter of each interval is computed and converted into a logarithmically weighted sub-band energy parameter; the sub-band division and the computation of each band's logarithmically weighted energy F(i) are given in the appendix. The average frequency-domain envelope is calculated, and each F(i) is differenced with the mean and normalized.
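A sketch of this extraction step follows. The Hann window and the exact band edges are assumptions: the patent's window is in its Fig. 2 and its sub-band table is in the appendix.

```python
import numpy as np

def freq_envelope(prev_tail48, cur_frame160):
    """Window 208 samples (48 from the previous frame + 160 current),
    zero-pad to 256, take a 256-point FFT, and compute a log-weighted
    energy per each of 12 even sub-bands."""
    x = np.concatenate([prev_tail48, cur_frame160])   # 208 samples
    X = np.fft.fft(x * np.hanning(len(x)), 256)       # zero-padded 256-pt FFT
    mag2 = np.abs(X[:128]) ** 2                       # one-sided power spectrum
    F = 10 * np.log10(mag2[:120].reshape(12, 10).sum(axis=1) + 1e-12)
    F_mean = float(np.mean(F))
    return F, F_mean, F - F_mean
```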
3. G.711 coding/decoding module
The narrowband speech signal s_L(n) is encoded by the A-law encoder, yielding a code stream of 8 bits per sample; the watermark information is embedded into the code stream, which is sent into the network over the telephone line. The receiving end extracts the watermark information from the code stream and decodes it with the A-law decoder to obtain the narrowband speech signal.
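The byte layout that the embedding algorithm below exploits can be made concrete with a sketch of A-law expansion, adapted from the classic public-domain g711.c routine (the 0x55 even-bit inversion applied on the wire is included here):

```python
def alaw_to_linear(byte):
    """Expand one 8-bit A-law byte to a linear sample. After the 0x55
    inversion, C7 is the sign, C6-C4 the segment code and C3-C0 the
    intra-segment code."""
    a = byte ^ 0x55                  # undo the even-bit inversion
    t = (a & 0x0F) << 4              # intra-segment code C3..C0
    seg = (a & 0x70) >> 4            # segment code C6..C4
    if seg == 0:
        t += 8
    elif seg == 1:
        t += 0x108
    else:
        t = (t + 0x108) << (seg - 1)
    return t if (a & 0x80) else -t
```

Smaller segment codes map to smaller amplitudes, which is exactly the property the selective embedding mode relies on.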
4. Watermark embedding module
The existing least-significant-bit embedding algorithm simply embeds the watermark information in the lowest bit of the narrowband code stream. Considering the characteristics of the transmission protocol and the subjective perception of the human ear, this work proposes two improved least-significant-bit embedding algorithms.
The first method embeds the watermark into the code stream fairly uniformly: since one frame of the signal has 160 samples and the number of watermark bits to embed is 66, one bit of information can be embedded at every other sample. This avoids degrading the listening experience through excessive localized distortion and keeps the overall auditory quality at a high level.
The second method is a selective least-significant-bit embedding algorithm based on the characteristics of the transmission protocol and the auditory properties of the human ear. G.711 uses non-uniform quantization: when the sample value is small, the quantization interval is small; when the sample value is large, the quantization interval is large. Consequently, changing the code of a small sample changes its value only slightly, while changing the code of a large sample changes its value over a wide range; in theory, the resulting signal-to-noise ratio varies very little whether the watermark is embedded in small samples or in large ones. However, according to the temporal masking effect of the human ear, a large signal masks the small signals that follow it, so modifications of small signals are hard to perceive. Exploiting this property, the watermark information can be selectively embedded into the samples of small amplitude, making the watermark better hidden. Let C0–C7 denote the bits of the encoded code stream from the lowest to the highest, as shown in Fig. 3. According to the G.711 standard, the most significant bit C7 is the sign bit of the sample, C6–C4 form the segment code, and C3–C0 form the intra-segment code; the smaller the segment code, the smaller the amplitude of the sample value the code stream represents. This work uses bit C6 to divide the signal into large samples (C6 = 1) and small samples (C6 = 0), and embeds the watermark when C6 is 0. If one frame offers fewer than 66 embedding positions, the remaining watermark bits are embedded at other positions.
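A minimal sketch of the second (selective) embedding mode, operating directly on A-law bytes; the fallback order over the large samples is an assumption, since the patent only says "other positions":

```python
def embed_watermark(stream, bits):
    """Mode two: write watermark bits into the LSB (C0) of bytes whose
    C6 bit is 0 (small-amplitude samples); if the frame has too few
    small samples, fall back to the remaining (C6 = 1) positions."""
    out = bytearray(stream)
    k = 0
    for small in (True, False):          # small samples first, then the rest
        for i, b in enumerate(out):
            if k == len(bits):
                return bytes(out)
            if ((b & 0x40) == 0) == small:
                out[i] = (b & 0xFE) | bits[k]
                k += 1
    return bytes(out)
```

Note that writing C0 never changes C6, so the small/large classification seen by the extractor is unchanged by embedding.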
5. Watermark extraction module
Depending on the embedding algorithm, the corresponding extraction method is used. For the first algorithm, the watermark is extracted according to the positions where it was embedded. For the second, the characteristics of the code stream determine whether a watermark has been embedded: examining the frame from its start, if C6 is 0 the watermark bit is extracted from the lowest bit, and if C6 is 1 no bit is extracted; if fewer than 66 watermark bits have been extracted when the end of the frame is reached, return to the start of the frame and extract from the positions where C6 is 1, until 66 watermark bits have been extracted.
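The matching mode-two extractor can be sketched as:

```python
def extract_watermark(stream, n_bits=66):
    """Mode two extraction: read the LSB of each byte whose C6 bit is 0,
    scanning the frame from its start; if fewer than n_bits are found by
    the end of the frame, return to the start and read the LSBs of the
    C6 = 1 positions until n_bits watermark bits are collected."""
    bits = [b & 1 for b in stream if (b & 0x40) == 0]
    if len(bits) < n_bits:
        bits += [b & 1 for b in stream if b & 0x40]
    return bits[:n_bits]
```

Because it visits the small-amplitude positions first and the large ones second, in the same order as the embedder, the round trip recovers the embedded bits.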
6. High-frequency speech recovery module
Because the characteristics of high-frequency speech are similar to those of noise, this module uses white noise to recover the high-frequency speech. First, the generated white noise sequence is passed through an AR model constructed from the low-frequency speech; then the extracted high-frequency parameters are used to apply temporal envelope shaping and frequency-domain envelope shaping, which yields the high-frequency speech signal. The block diagram of high-frequency speech recovery is shown in Fig. 4.
(1) Use white noise to recover the high-frequency speech
Because the high-frequency and low-frequency speech have a certain correlation, an AR model is constructed from the decoded low-frequency speech. A white noise sequence is generated at the decoding end and shaped by the constructed AR model, so that the noise takes on the characteristics of the high-frequency speech.
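Under the assumption that the AR model is obtained by standard linear prediction (Levinson-Durbin on the autocorrelation of the decoded low-band signal — the patent does not spell out the estimator), the shaping step can be sketched as:

```python
import numpy as np

def lpc(x, order):
    """Levinson-Durbin: AR coefficients a (a[0] = 1) of the model
    x(n) ~ -sum_{k=1..order} a[k] x(n-k), from the autocorrelation of x."""
    x = np.asarray(x, dtype=float)
    r = np.array([x[: len(x) - k] @ x[k:] for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + a[1:i] @ r[1:i][::-1]
        k = -acc / err
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a

def shape_noise(noise, a):
    """All-pole synthesis filter 1/A(z): gives white noise the spectral
    envelope of the signal the AR model was fitted to."""
    y = np.zeros(len(noise))
    for n in range(len(noise)):
        y[n] = noise[n] - sum(a[k] * y[n - k] for k in range(1, min(len(a) - 1, n) + 1))
    return y
```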
(2) the local adjustment of temporal envelope
The temporal envelope parameter of normalization temporal envelope parameter of from watermark, recovering and average temporal envelope calculating high-frequency signal:
The temporal envelope calculation of parameter time domain local gain factor by noise and high-frequency signal:
The local time-domain gain factor is then used to adjust the temporal envelope of the noise:
The gain factor between two adjacent segments is smoothed by linear interpolation:
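A hedged sketch of this local gain adjustment: the exact gain and interpolation formulas were carried by equation images not present in this text, so an RMS-ratio gain per 10-sample segment, linearly interpolated between segment centres, is assumed.

```python
import numpy as np

def apply_local_gains(noise, target_env, seg_len=10):
    """Scale each segment of `noise` toward the target temporal envelope,
    interpolating the gain linearly across segment boundaries.
    target_env: one envelope value per segment (16 per 160-sample frame)."""
    n_seg = len(noise) // seg_len
    segs = noise[: n_seg * seg_len].reshape(n_seg, seg_len)
    env = np.sqrt(np.mean(segs ** 2, axis=1)) + 1e-12   # per-segment RMS
    g = target_env[:n_seg] / env                        # per-segment gain
    # Linear interpolation between segment centres avoids a step
    # discontinuity in gain at each segment boundary.
    centres = (np.arange(n_seg) + 0.5) * seg_len
    gain = np.interp(np.arange(n_seg * seg_len), centres, g)
    return noise[: n_seg * seg_len] * gain
```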
(3) Local frequency-domain-envelope adjustment
The time-domain-adjusted signal is processed using the extracted 12 frequency-domain envelope parameters and the average frequency-domain envelope parameter, yielding the log-weighted subband energy parameters and the average frequency-domain envelope of the noise. The frequency-domain envelope of the noise is then locally adjusted with the same method used for the local temporal-envelope adjustment.
(4) Global frequency-domain-envelope adjustment
The global frequency-domain gain factor of each frame is computed from the average frequency-domain envelopes of the noise and of the high-frequency signal:
The global frequency-domain gain factor is used to globally adjust the frequency-domain envelope of each frame:
The adjusted spectrum is transformed back with an IFFT; the resulting time-domain signal is then windowed with the window function of Fig. 2 and stored in a buffer of length 208:
Wherein L=256 and n=0, 1, ..., 207.
The last 48 points of the previous frame's buffer are added to the first 48 points of the current frame's buffer; the values at n=48~159 of the current frame's buffer then constitute the time-domain signal recovered for the current frame.
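Under one reading of the buffering described above (208-sample windowed buffers, 48-sample overlap, 160-sample output frame: the overlapped 48 samples form the head of the output, followed by samples 48~159 of the current buffer), the reconstruction can be sketched as:

```python
import numpy as np

def overlap_add(prev_buf, cur_buf, frame=160, overlap=48):
    """Recover the current 160-sample frame from two 208-sample
    windowed IFFT buffers (this interpretation of the overlap scheme
    is an assumption; the window itself is the one in Fig. 2)."""
    # Tail of the previous buffer overlaps the head of the current one.
    head = cur_buf[:overlap] + prev_buf[frame:frame + overlap]
    # Samples 48..159 of the current buffer are taken as-is.
    return np.concatenate([head, cur_buf[overlap:frame]])
```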
(5) Global temporal-envelope adjustment
The temporal envelope is globally adjusted following the same steps as the global frequency-domain-envelope adjustment; the adjusted signal is the high-frequency signal estimated from noise.
7. QMF synthesis filterbank module
The 8 kHz-sampled low-frequency signal and the estimated high-frequency signal are upsampled to 16 kHz and then passed through low-pass and high-pass FIR filters, respectively; the filter coefficients are identical to those of the QMF analysis filterbank. Adding the two filtered signals yields the final wideband signal at a 16 kHz sampling rate:
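A generic two-band synthesis sketch of the step above. The patent's QMF coefficients are not given in this text, so a windowed-sinc half-band low-pass (and its spectral mirror for the high band) stands in for them:

```python
import numpy as np

def upsample2(x):
    """Zero-insertion upsampling by 2 (8 kHz -> 16 kHz)."""
    y = np.zeros(2 * len(x))
    y[::2] = x
    return y

def qmf_synthesis(low, high, taps=64):
    """Upsample both bands, low-pass the low band, high-pass the
    high band with the mirrored filter, and sum to a wideband signal."""
    n = np.arange(taps)
    # Ideal half-band low-pass 0.5*sinc(n/2), scaled by 2 to compensate
    # the energy lost in zero insertion, then Hamming-windowed.
    h = np.sinc(0.5 * (n - (taps - 1) / 2)) * np.hamming(taps)
    g = h * (-1.0) ** n   # spectral mirror: half-band high-pass
    return np.convolve(upsample2(low), h) + np.convolve(upsample2(high), g)
```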
Summary: this embodiment proposes two improved least-significant-bit watermarking algorithms. The first improvement embeds 1 bit of information at every other sample point, which avoids degrading the auditory quality through excessive local distortion and keeps the overall auditory quality at a high level. The second improvement is a selectable LSB embedding algorithm based on the characteristics of the transmission protocol and the auditory properties of the human ear. According to the temporal masking effect of the human ear, a large signal masks the small signals that follow it, so modifications to small signals are hard to perceive. Exploiting this property, the watermark can be selectively embedded in small-amplitude sample points, improving its imperceptibility.
Based on the above watermarking algorithms, this system embeds the high-frequency information of the speech signal into the narrowband code stream, transmits it over the wired telephone network, extracts the high-frequency parameters at the receiving end, and synthesizes wideband speech, thereby extending the bandwidth of the speech signal. Because the watermarking algorithms have good masking properties, even a receiving end without the watermark-extraction and wideband-synthesis modules still delivers normal speech quality, while a telephone terminal equipped with these functions hears the bandwidth-extended speech with greatly improved quality.
The above content is a further detailed description of the present invention in combination with preferred technical solutions, and the specific implementation of the invention is not limited to these descriptions. Those of ordinary skill in the technical field of the present invention may make simple deductions and substitutions without departing from the concept of the invention, and all such modifications shall be regarded as falling within the protection scope of the present invention.
Appendix
Subband division of the frequency-domain envelope:
The log-weighted energy F(i) of each subband is computed as follows:
Subband 0:
Subbands 1-10:
Subband 11:
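The per-subband formulas were carried by equation images lost from this text, so the following sketch only illustrates the general idea: split the positive half of a 256-point FFT into 12 bands and take a log energy per band. The uniform band edges and the 10*log10 scaling here are assumptions, and the special-case weighting for subbands 0 and 11 is not reproduced.

```python
import numpy as np

def subband_log_energy(frame, n_fft=256, n_bands=12):
    """Log energy of 12 near-equal bands of the half-spectrum
    (illustrative stand-in for the patent's weighted formulas)."""
    spec = np.fft.fft(frame, n_fft)[: n_fft // 2]
    edges = np.linspace(0, n_fft // 2, n_bands + 1).astype(int)
    return np.array([
        10.0 * np.log10(np.sum(np.abs(spec[a:b]) ** 2) / (b - a) + 1e-12)
        for a, b in zip(edges[:-1], edges[1:])
    ])
```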
Claims (2)
1. A method for speech bandwidth extension based on audio watermarking, comprising the following steps, wherein step B and steps F2 and F3 follow the approach of the document "DTX/CNG Algorithm Research and Implementation Based on a Layered Wideband Speech Coding/Decoding System":
Step A. A QMF analysis filterbank module divides the wideband speech into two parts: the narrowband speech of 0~8000 Hz and the high-frequency component of 8000~16000 Hz; the two output signals pass through a downsampling module that reduces the sampling rate to 8 kHz, yielding the low-frequency signal s_L(n) and the high-frequency signal s_H(n);
Step B. The high-frequency parameter extraction module extracts 30 high-frequency parameters: 16 temporal-envelope parameters, 12 frequency-domain envelope parameters, an average temporal-envelope parameter, and an average frequency-domain envelope parameter; the concrete extraction method for each parameter is as follows:
Step B1. Extract the 16 temporal-envelope parameters and the average temporal-envelope parameter:
The high-frequency component s_H(n) of each 20 ms frame is divided into 16 segments of 10 sample points each; the 16 temporal-envelope parameters are:
The average temporal envelope is computed:
The temporal-envelope parameters T(i) and their mean are differenced and normalized:
Step B2. Extract the 12 frequency-domain envelope parameters and the average frequency-domain envelope parameter:
The 160 sample points of the current frame of the high-frequency component s_H(n), together with the last 48 sample points of the previous frame, are processed by a windowing module using a window function Window(n) of length 208:
Wherein N=208;
The windowed signal is zero-padded to 256 points and a 256-point FFT is applied, yielding S_F(k):
Wherein L=256; the frequency domain is divided into 12 uniform intervals, the frequency-domain envelope parameter of each interval is computed and converted into a log-weighted subband energy parameter;
The average frequency-domain envelope is computed:
The frequency-domain envelope parameters F(i) and their mean are differenced and normalized:
Step D1. The watermark embedding module embeds the watermark uniformly in the code stream: since one frame of the signal has 160 sample points and the watermark is 66 bits, 1 bit of information is embedded at every other sample point;
Or step D2. The watermark embedding module selectively embeds the watermark information in small-amplitude sample points; let C0~C7 denote the bits of the code stream from least significant to most significant; according to the G.711 standard, the most significant bit C7 is the sign bit of the sample, C6~C4 form the segment code, and C3~C0 form the intra-segment code; the smaller the segment code, the smaller the amplitude of the sample the code stream represents; this method uses bit C6 to divide samples into large signals (C6=1) and small signals (C6=0), and embeds the watermark when C6 is 0; if a frame provides fewer than 66 embedding positions, the remaining bits are embedded at other positions;
Step E. The watermark extraction module extracts the watermark correspondingly to step D, in one of two ways:
E1. The watermark extraction module extracts the watermark according to the positions at which it was embedded;
Or E2. The characteristics of the code stream determine whether a watermark bit was embedded: scanning from the start of a frame, if C6 is 0, a watermark bit is extracted from the least significant bit; if C6 is 1, no bit is extracted; if fewer than 66 bits have been extracted when the end of the frame is reached, extraction returns to the start of the frame and continues at the positions where C6 is 1, until all 66 watermark bits have been extracted;
Step F. The high-frequency speech recovery module recovers the high-frequency speech from white noise:
A white noise sequence is first passed through an AR model constructed from the low-frequency speech; the extracted high-frequency parameters are then used to perform temporal-envelope shaping and frequency-domain-envelope shaping, yielding the high-frequency speech signal;
Step F1. Recover the high-frequency speech from white noise:
Since the high-frequency and low-frequency speech are correlated to some degree, an AR model module is constructed from the decoded low-frequency speech; the decoder generates a white noise sequence and shapes it with this AR model module, so that the noise acquires the characteristics of high-frequency speech;
Step F2. Local temporal-envelope adjustment module:
The normalized temporal-envelope parameters recovered from the watermark and the average temporal-envelope parameter are used to compute the temporal envelope of the high-frequency signal:
The local time-domain gain factor is computed from the temporal-envelope parameters of the noise and of the high-frequency signal:
The local time-domain gain module adjusts the temporal envelope of the noise:
The gain factor between two adjacent segments is smoothed by linear interpolation:
Step F3. Local frequency-domain-envelope adjustment module:
The time-domain-adjusted signal is processed using the extracted 12 frequency-domain envelope parameters and the average frequency-domain envelope parameter, yielding the log-weighted subband energy parameters and the average frequency-domain envelope of the noise; the frequency-domain envelope of the noise is then locally adjusted with the same method used for the local temporal-envelope adjustment;
Step F4. Global frequency-domain-envelope adjustment module:
The global frequency-domain gain factor of each frame is computed from the average frequency-domain envelopes of the noise and of the high-frequency signal:
The global frequency-domain gain factor is used to globally adjust the frequency-domain envelope of each frame:
The adjusted spectrum is fed to an IFFT module; the resulting time-domain signal is then windowed by the window function module and stored in a buffer of length 208:
Wherein L=256;
The last 48 points of the previous frame's buffer are added to the first 48 points of the current frame's buffer; the values at n=48~159 of the current frame's buffer then constitute the time-domain signal recovered for the current frame;
Step F5. Global temporal-envelope adjustment module:
The temporal envelope is globally adjusted following the same steps as the global frequency-domain-envelope adjustment; the adjusted signal is the high-frequency signal estimated from noise;
Step G. The QMF synthesis filterbank module raises the sampling rate of the 8 kHz-sampled low-frequency signal and of the estimated high-frequency signal to 16 kHz; the signals then pass through low-pass and high-pass FIR filter modules, respectively, whose coefficients are identical to those of the QMF analysis filterbank; adding the two filtered signals yields the final wideband signal at a 16 kHz sampling rate:
2. A device for speech bandwidth extension based on audio watermarking, characterized in that the device comprises: a QMF analysis filterbank module, a high-frequency parameter extraction module, a G.711 encoding/decoding module, a watermark embedding module, a watermark extraction module, a high-frequency speech recovery module, and a QMF synthesis filterbank module;
Said QMF analysis filterbank module divides the wideband speech into two parts: the narrowband speech of 0~8000 Hz and the high-frequency component of 8000~16000 Hz; the two output signals are fed to a downsampling module that reduces the sampling rate to 8 kHz, yielding the low-frequency signal s_L(n) and the high-frequency signal s_H(n);
Said high-frequency parameter extraction module extracts 30 high-frequency parameters: 16 temporal-envelope parameters, 12 frequency-domain envelope parameters, an average temporal-envelope parameter, and an average frequency-domain envelope parameter; this part follows the approach of the document "DTX/CNG Algorithm Research and Implementation Based on a Layered Wideband Speech Coding/Decoding System"; the concrete extraction method for each parameter is as follows:
The module for extracting the 16 temporal-envelope parameters and the average temporal-envelope parameter:
The high-frequency component s_H(n) of each 20 ms frame is divided into 16 segments of 10 sample points each; the 16 temporal-envelope parameters are:
The average temporal envelope is computed:
The temporal-envelope parameters T(i) and their mean are differenced and normalized:
The module for extracting the 12 frequency-domain envelope parameters and the average frequency-domain envelope parameter:
The 160 sample points of the current frame of the high-frequency component s_H(n), together with the last 48 sample points of the previous frame, are processed by a windowing module using a window function Window(n) of length 208:
Wherein N=208;
The windowed signal is zero-padded to 256 points and a 256-point FFT is applied, yielding S_F(k):
Wherein L=256; the frequency domain is divided into 12 uniform intervals, the frequency-domain envelope parameter of each interval is computed and converted into a log-weighted subband energy parameter;
The average frequency-domain envelope is computed:
The frequency-domain envelope parameters F(i) and their mean are differenced and normalized:
Said G.711 encoding/decoding module encodes the narrowband speech signal s_L(n) with an A-law encoder module, producing a code stream of 8 bits per sample; the watermark information is embedded in this code stream, which is sent over the telephone line into the network; the receiving end extracts the watermark information from the code stream and decodes it with an A-law decoder, obtaining the narrowband speech signal;
Said watermark embedding module embeds the watermark in the code stream in one of two ways:
Mode one: the watermark embedding module embeds the watermark uniformly in the code stream: since one frame of the signal has 160 sample points and the watermark is 66 bits, 1 bit of information is embedded at every other sample point;
Mode two: the watermark embedding module selectively embeds the watermark information in small-amplitude sample points; let C0~C7 denote the bits of the code stream from least significant to most significant; according to the G.711 standard, the most significant bit C7 is the sign bit of the sample, C6~C4 form the segment code, and C3~C0 form the intra-segment code; the smaller the segment code, the smaller the amplitude of the sample the code stream represents; this method uses bit C6 to divide samples into large signals (C6=1) and small signals (C6=0), and embeds the watermark when C6 is 0; if a frame provides fewer than 66 embedding positions, the remaining bits are embedded at other positions;
Said watermark extraction module extracts the watermark correspondingly to the watermark embedding module, in one of two ways:
Mode one: the watermark extraction module extracts the watermark according to the positions at which it was embedded;
Mode two: the characteristics of the code stream determine whether a watermark bit was embedded: scanning from the start of a frame, if C6 is 0, a watermark bit is extracted from the least significant bit; if C6 is 1, no bit is extracted; if fewer than 66 bits have been extracted when the end of the frame is reached, extraction returns to the start of the frame and continues at the positions where C6 is 1, until all 66 watermark bits have been extracted;
Said high-frequency speech recovery module recovers the high-frequency speech from white noise:
A white noise sequence is first passed through an AR model constructed from the low-frequency speech; the extracted high-frequency parameters are then used to perform temporal-envelope shaping and frequency-domain-envelope shaping, yielding the high-frequency speech signal;
The module for recovering the high-frequency speech from white noise:
Since the high-frequency and low-frequency speech are correlated to some degree, an AR model module is constructed from the decoded low-frequency speech; the decoder generates a white noise sequence and shapes it with this AR model module, so that the noise acquires the characteristics of high-frequency speech;
The local temporal-envelope adjustment module, which follows the approach of the document "DTX/CNG Algorithm Research and Implementation Based on a Layered Wideband Speech Coding/Decoding System":
The normalized temporal-envelope parameters recovered from the watermark and the average temporal-envelope parameter are used to compute the temporal envelope of the high-frequency signal:
The local time-domain gain factor is computed from the temporal-envelope parameters of the noise and of the high-frequency signal:
The local time-domain gain factor is used to adjust the temporal envelope of the noise:
The gain factor between two adjacent segments is smoothed by linear interpolation:
The local frequency-domain-envelope adjustment module, which follows the approach of the document "DTX/CNG Algorithm Research and Implementation Based on a Layered Wideband Speech Coding/Decoding System":
The time-domain-adjusted signal is processed using the extracted 12 frequency-domain envelope parameters and the average frequency-domain envelope parameter, yielding the log-weighted subband energy parameters and the average frequency-domain envelope of the noise; the frequency-domain envelope of the noise is then locally adjusted with the same method used for the local temporal-envelope adjustment;
Global frequency-domain-envelope adjustment:
The global frequency-domain gain factor of each frame is computed from the average frequency-domain envelopes of the noise and of the high-frequency signal:
The global frequency-domain gain factor is used to globally adjust the frequency-domain envelope of each frame:
The adjusted spectrum is fed through an IFFT module; the resulting time-domain signal is then windowed by the window function module and stored in a buffer of length 208:
Wherein L=256 and n=0, 1, ..., 207;
The last 48 points of the previous frame's buffer are added to the first 48 points of the current frame's buffer; the values at n=48~159 of the current frame's buffer then constitute the time-domain signal recovered for the current frame;
Global temporal-envelope adjustment module:
The temporal envelope is globally adjusted following the same steps as the global frequency-domain-envelope adjustment; the adjusted signal is the high-frequency signal estimated from noise;
Said QMF synthesis filterbank module raises the sampling rate of the 8 kHz-sampled low-frequency signal and of the estimated high-frequency signal to 16 kHz; the signals then pass through low-pass and high-pass FIR filter modules, respectively, whose coefficients are identical to those of the QMF analysis filterbank; adding the two filtered signals yields the final wideband signal at a 16 kHz sampling rate.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2011104223927A CN102543086B (en) | 2011-12-16 | 2011-12-16 | Device and method for expanding speech bandwidth based on audio watermarking |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2011104223927A CN102543086B (en) | 2011-12-16 | 2011-12-16 | Device and method for expanding speech bandwidth based on audio watermarking |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102543086A true CN102543086A (en) | 2012-07-04 |
CN102543086B CN102543086B (en) | 2013-08-14 |
Family
ID=46349824
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2011104223927A Expired - Fee Related CN102543086B (en) | 2011-12-16 | 2011-12-16 | Device and method for expanding speech bandwidth based on audio watermarking |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102543086B (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103001915A (en) * | 2012-11-30 | 2013-03-27 | 南京邮电大学 | Time domain reshaping method of asymmetric limiting light orthogonal frequency division multiplexing (OFDM) communication system |
CN103258543A (en) * | 2013-04-12 | 2013-08-21 | 大连理工大学 | Method for expanding artificial voice bandwidth |
CN103474079A (en) * | 2012-08-06 | 2013-12-25 | 苏州沃通信息科技有限公司 | Voice encoding method |
CN104269173A (en) * | 2014-09-30 | 2015-01-07 | 武汉大学深圳研究院 | Voice frequency bandwidth extension device and method achieved in switching mode |
CN105264599A (en) * | 2013-01-29 | 2016-01-20 | 弗劳恩霍夫应用研究促进协会 | Audio encoder, audio decoder, method for providing encoded audio information and decoded audio information, computer program and encoded representation using a signal-adaptive bandwidth extension |
CN105659321A (en) * | 2014-02-28 | 2016-06-08 | 松下电器(美国)知识产权公司 | Decoding device, encoding device, decoding method, encoding method, terminal device, and base station device |
CN105723458A (en) * | 2013-09-12 | 2016-06-29 | 沙特阿拉伯石油公司 | Dynamic threshold methods, systems, computer readable media, and program code for filtering noise and restoring attenuated high-frequency components of acoustic signals |
CN106612168A (en) * | 2016-12-23 | 2017-05-03 | 中国电子科技集团公司第三十研究所 | Voice out-of-synchronism detection method based on PCM coding characteristics |
CN108074578A (en) * | 2016-11-17 | 2018-05-25 | 中国科学院声学研究所 | A kind of transmission of audio frequency watermark and the system and method for information exchange |
CN108269585A (en) * | 2013-04-05 | 2018-07-10 | 杜比实验室特许公司 | The companding device and method of quantizing noise are reduced using advanced spectrum continuation |
CN110544472A (en) * | 2019-09-29 | 2019-12-06 | 上海依图信息技术有限公司 | Method for improving performance of voice task using CNN network structure |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4667340A (en) * | 1983-04-13 | 1987-05-19 | Texas Instruments Incorporated | Voice messaging system with pitch-congruent baseband coding |
CN101140759A (en) * | 2006-09-08 | 2008-03-12 | 华为技术有限公司 | Band-width spreading method and system for voice or audio signal |
CN101521014A (en) * | 2009-04-08 | 2009-09-02 | 武汉大学 | Audio bandwidth expansion coding and decoding devices |
CN102105931A (en) * | 2008-07-11 | 2011-06-22 | 弗朗霍夫应用科学研究促进协会 | Apparatus and method for generating a bandwidth extended signal |
2011-12-16: CN CN2011104223927A, granted as patent CN102543086B (en); status: not active (Expired - Fee Related)
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4667340A (en) * | 1983-04-13 | 1987-05-19 | Texas Instruments Incorporated | Voice messaging system with pitch-congruent baseband coding |
CN101140759A (en) * | 2006-09-08 | 2008-03-12 | 华为技术有限公司 | Band-width spreading method and system for voice or audio signal |
CN102105931A (en) * | 2008-07-11 | 2011-06-22 | 弗朗霍夫应用科学研究促进协会 | Apparatus and method for generating a bandwidth extended signal |
CN101521014A (en) * | 2009-04-08 | 2009-09-02 | 武汉大学 | Audio bandwidth expansion coding and decoding devices |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103474079A (en) * | 2012-08-06 | 2013-12-25 | 苏州沃通信息科技有限公司 | Voice encoding method |
CN103001915B (en) * | 2012-11-30 | 2015-01-28 | 南京邮电大学 | Time domain reshaping method of asymmetric limiting light orthogonal frequency division multiplexing (OFDM) communication system |
CN103001915A (en) * | 2012-11-30 | 2013-03-27 | 南京邮电大学 | Time domain reshaping method of asymmetric limiting light orthogonal frequency division multiplexing (OFDM) communication system |
CN105264599B (en) * | 2013-01-29 | 2019-05-10 | 弗劳恩霍夫应用研究促进协会 | Audio coder, provides the method for codes audio information at audio decoder |
CN105264599A (en) * | 2013-01-29 | 2016-01-20 | 弗劳恩霍夫应用研究促进协会 | Audio encoder, audio decoder, method for providing encoded audio information and decoded audio information, computer program and encoded representation using a signal-adaptive bandwidth extension |
US11423923B2 (en) | 2013-04-05 | 2022-08-23 | Dolby Laboratories Licensing Corporation | Companding system and method to reduce quantization noise using advanced spectral extension |
CN108269585A (en) * | 2013-04-05 | 2018-07-10 | 杜比实验室特许公司 | The companding device and method of quantizing noise are reduced using advanced spectrum continuation |
CN103258543A (en) * | 2013-04-12 | 2013-08-21 | 大连理工大学 | Method for expanding artificial voice bandwidth |
CN105723458A (en) * | 2013-09-12 | 2016-06-29 | 沙特阿拉伯石油公司 | Dynamic threshold methods, systems, computer readable media, and program code for filtering noise and restoring attenuated high-frequency components of acoustic signals |
CN105723458B (en) * | 2013-09-12 | 2019-09-24 | 沙特阿拉伯石油公司 | For filtering out noise and going back dynamic threshold method, the system, computer-readable medium of the high fdrequency component that acoustic signal is decayed |
US10672409B2 (en) | 2014-02-28 | 2020-06-02 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Decoding device, encoding device, decoding method, and encoding method |
CN105659321A (en) * | 2014-02-28 | 2016-06-08 | 松下电器(美国)知识产权公司 | Decoding device, encoding device, decoding method, encoding method, terminal device, and base station device |
US11257506B2 (en) | 2014-02-28 | 2022-02-22 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Decoding device, encoding device, decoding method, and encoding method |
CN104269173B (en) * | 2014-09-30 | 2018-03-13 | 武汉大学深圳研究院 | The audio bandwidth expansion apparatus and method of switch mode |
CN104269173A (en) * | 2014-09-30 | 2015-01-07 | 武汉大学深圳研究院 | Voice frequency bandwidth extension device and method achieved in switching mode |
CN108074578A (en) * | 2016-11-17 | 2018-05-25 | 中国科学院声学研究所 | A kind of transmission of audio frequency watermark and the system and method for information exchange |
CN106612168B (en) * | 2016-12-23 | 2019-07-16 | 中国电子科技集团公司第三十研究所 | A kind of voice step failing out detecting method based on pcm encoder feature |
CN106612168A (en) * | 2016-12-23 | 2017-05-03 | 中国电子科技集团公司第三十研究所 | Voice out-of-synchronism detection method based on PCM coding characteristics |
CN110544472A (en) * | 2019-09-29 | 2019-12-06 | 上海依图信息技术有限公司 | Method for improving performance of voice task using CNN network structure |
Also Published As
Publication number | Publication date |
---|---|
CN102543086B (en) | 2013-08-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102543086B (en) | Device and method for expanding speech bandwidth based on audio watermarking | |
JP7330934B2 (en) | Apparatus and method for bandwidth extension of acoustic signals | |
CN101512639B (en) | Method and equipment for voice/audio transmitter and receiver | |
CN1808568B (en) | Audio encoding/decoding apparatus having watermark insertion/abstraction function and method using the same | |
CN102522092B (en) | Device and method for expanding speech bandwidth based on G.711.1 | |
CN101202043B (en) | Method and system for encoding and decoding audio signal | |
CN105280190B (en) | Bandwidth extension encoding and decoding method and device | |
CN101206860A (en) | Method and apparatus for encoding and decoding layered audio | |
KR100921867B1 (en) | Apparatus And Method For Coding/Decoding Of Wideband Audio Signals | |
CN101779236A (en) | Temporal masking in audio coding based on spectral dynamics in frequency sub-bands | |
CN101662288A (en) | Method, device and system for encoding and decoding audios | |
CN102194458B (en) | Spectral band replication method and device and audio decoding method and system | |
Chen et al. | An audio watermark-based speech bandwidth extension method | |
CN102142255A (en) | Method for embedding and extracting digital watermark in audio signal | |
Bao et al. | MP3-resistant music steganography based on dynamic range transform | |
CN105957533B (en) | Voice compression method, voice decompression method, audio encoder and audio decoder | |
CN101604527A (en) | Under the VoIP environment based on the method for the hidden transferring of wideband voice of G.711 encoding | |
CN114974270A (en) | Audio information self-adaptive hiding method | |
Xu et al. | Performance analysis of data hiding in MPEG-4 AAC audio | |
Dymarski et al. | Robust audio watermarks in frequency domain | |
Nishimura | Data hiding for audio signals that are robust with respect to air transmission and a speech codec | |
CN114863942B (en) | Model training method for voice quality conversion, method and device for improving voice quality | |
CN103474079A (en) | Voice encoding method | |
Koduri | Discrete cosine transform-based data hiding for speech bandwidth extension | |
CN101833953B (en) | Method and device for lowering redundancy rate of multi-description coding and decoding |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20130814; Termination date: 20151216 |
EXPY | Termination of patent right or utility model | |