CN1437169A - Method, apparatus, and system for embedding data in and extracting data from encoded voice code - Google Patents

Method, apparatus, and system for embedding data in and extracting data from encoded voice code

Info

Publication number
CN1437169A
CN1437169A CN03102322.3A CN 1437169 A CN 03102322 A
Authority
CN
China
Prior art keywords
data
code
embed
mentioned
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN03102322.3A
Other languages
Chinese (zh)
Other versions
CN100514394C (en)
Inventor
大田恭士
鈴木政直
土永义照
田中正清
佐佐木繁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd
Publication of CN1437169A
Application granted
Publication of CN100514394C
Anticipated expiration
Expired - Fee Related

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/018Audio watermarking, i.e. embedding inaudible data in the audio signal
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Editing Of Facsimile Originals (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a method, apparatus, and system for embedding data in, and extracting data from, encoded voice code. When a voice encoding apparatus embeds data in encoded voice code, it determines whether a data embedding condition is satisfied using a first element code from among the element codes constituting the encoded voice code and a threshold value. If the condition is satisfied, the apparatus embeds optional data in the encoded voice code by replacing a second element code with that data. When a voice decoding apparatus extracts data that has been embedded in encoded voice code, it likewise determines whether the data embedding condition is satisfied using the first element code and the threshold value. If the condition is satisfied, the apparatus determines that optional data has been embedded in the second element code portion of the encoded voice code and extracts the embedded data.

Description

Method, apparatus, and system for embedding data in and extracting data from encoded voice code
Technical field
The present invention relates to digital speech signal processing techniques suitable for application fields such as packet voice (audio) communication and digital speech storage. More particularly, it relates to a data embedding technique that embeds arbitrary data in voice code by replacing part of the voice code (digital code), produced by compressing speech with a speech coding technique, with that data, while conforming to the data format standard and without degrading voice quality.
Background art
Data embedding techniques, combined with the speech coding techniques used in digital mobile radio systems, packet voice transmission systems typified by VoIP, digital speech storage, and the like, are growing daily in need and importance, both as digital watermark techniques that improve call confidentiality by embedding copyright or ID information in the transmitted bit sequence without affecting it, and as techniques for extending functionality.
Against the background of the explosive spread of the Internet, the need for Internet telephony, which conveys voice data in IP packets, is increasing day by day. Conveying voice data in packets has the advantage that different media such as commands and image data can be transmitted in a unified manner, whereas until now such media have been transmitted over separate channels. Services that reduce the user's call charges are also being carried out, but only by inserting an advertisement at the beginning of the call. Moreover, although packetized voice makes unified transmission of commands, image data, and other media possible, problems arise in confidentiality because the transmission format is publicly known. Against this background, "digital watermark" techniques that embed copyright information and the like in compressed speech data (code) have been proposed.
On the other hand, speech coding techniques that compress speech efficiently are adopted for the purpose of improving transmission efficiency. In the VoIP field in particular, speech coding techniques such as G.729, standardized by the ITU-T (International Telecommunication Union Telecommunication Standardization Sector), are becoming mainstream. In the mobile communication field, G.729 and speech coding techniques such as AMR (Adaptive Multi-Rate) defined by the 3GPP (Third Generation Partnership Project) are also adopted. Their common point is an algorithm called CELP (Code Excited Linear Prediction). The G.729 encoding and decoding schemes are as follows.
Configuration and operation of the encoder
Figure 41 is a block diagram of an encoder of the G.729 scheme recommended by the ITU-T. In Figure 41, the input signal (speech signal) X is input to the LPC analysis unit in units of frames, each frame consisting of a predetermined number N of samples. With a sampling rate of 8 kHz and a frame duration of 10 ms, one frame is 80 samples. LPC analysis unit 1 models the human vocal tract as the all-pole filter expressed by the following formula,
H(z) = 1 / [1 + Σ αi·z^(-i)]   (i = 1, ..., p)    (1)
and obtains the filter coefficients αi (i = 1, ..., p). Here, p is the filter order; for telephone-band speech, a value of 10 to 20 is generally used for p. LPC (linear prediction) analysis unit 1 performs LPC analysis on a total of 240 samples, namely the 80 samples of the input signal, 40 look-ahead samples, and 120 samples of the past signal, and obtains the LPC coefficients.
Parameter conversion unit 2 converts the LPC coefficients into LSP (line spectrum pair) parameters. The LSP parameters are frequency-domain parameters mutually convertible with the LPC coefficients, and since their quantization characteristics are superior to those of the LPC coefficients, quantization is performed in the LSP domain. LSP quantization unit 3 quantizes the converted LSP parameters and obtains the LSP code and the LSP dequantized values. LSP interpolation unit 4 obtains LSP interpolated values from the LSP dequantized values obtained for the current frame and those obtained for the previous frame. More specifically, one frame is divided into first and second subframes of 5 ms each, and LPC analysis unit 1 determines the LPC coefficients of the second subframe but not those of the first. LSP interpolation unit 4 therefore predicts the LSP dequantized values of the first subframe by interpolation, using the LSP dequantized values obtained for the current frame and those obtained for the previous frame.
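As a rough illustration of this interpolation step, the following sketch (Python; the function and variable names are illustrative, not taken from G.729 reference code) forms the first-subframe LSPs as the average of the previous and current dequantized LSP vectors:

    # Minimal sketch of the LSP interpolation described above, assuming
    # lsp_prev and lsp_curr are the dequantized LSP vectors of the
    # previous and current frame (names are illustrative).
    def interpolate_lsp(lsp_prev, lsp_curr):
        # Subframe 1: average of previous and current dequantized LSPs.
        lsp_sub1 = [0.5 * p + 0.5 * c for p, c in zip(lsp_prev, lsp_curr)]
        # Subframe 2: current dequantized LSPs used directly.
        lsp_sub2 = list(lsp_curr)
        return lsp_sub1, lsp_sub2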
Parameter inverse conversion unit 5 converts the LSP dequantized values and the LSP interpolated values into LPC coefficients and sets them in LPC synthesis filter 6. In this case, the LPC coefficients converted from the interpolated values are used as the filter coefficients of LPC synthesis filter 6 in the first subframe of the frame, and the LPC coefficients converted from the dequantized values are used in the second subframe. In notation such as lspi and li(n) below, the character l is the lowercase letter L.
After the LSP parameters lspi (i = 1, ..., p) are quantized by vector quantization in LSP quantization unit 3, the quantization index (LSP code) is transmitted to the decoder side.
Next, excitation and gain search processing is performed, in units of subframes. First, the excitation signal is divided into a pitch-period component and a noise component; adaptive codebook 7, which stores past excitation signal sequences, is used to quantize the pitch-period component, and an algebraic codebook or noise codebook 8 is used to quantize the noise component. A speech coding scheme using adaptive codebook 7 and noise codebook 8 as excitation codebooks is described below.
Adaptive codebook 7 outputs, for indices 1 through L, excitation signals (periodic signals) of N samples, each successively delayed by one sample, where N is the number of samples in one subframe (N = 40); it has a buffer that stores the pitch-period component of the latest (L + 39) samples. Index 1 specifies the periodic signal of samples 1 through 40, index 2 that of samples 2 through 41, and so on up to index L, which specifies the periodic signal of samples L through L + 39. In the initial state the contents of adaptive codebook 7 are all signals of amplitude 0, and for each subframe the codebook operates so that the temporally oldest signal is discarded one subframe length at a time and the excitation signal obtained for the current subframe is stored.
The adaptive codebook search identifies the periodic component of the excitation signal using adaptive codebook 7, which stores past excitation signals. That is, while the read start point is shifted one sample at a time, a subframe length (= 40 samples) of the past excitation signal is extracted from adaptive codebook 7 and input to LPC synthesis filter 6 to generate the pitch synthesis signal βAP_L, where P_L is the past pitch-period signal (adaptive code vector) corresponding to delay L extracted from adaptive codebook 7, A is the impulse response of LPC synthesis filter 6, and β is the adaptive codebook gain.
Arithmetic unit 9 obtains the error power E_L between the input speech X and βAP_L by the following formula:
E_L = |X - βAP_L|²    (2)
If AP_L denotes the weighted synthesis output of the adaptive codebook, R_pp the autocorrelation of AP_L, and R_xp the cross-correlation between AP_L and the input signal X, then the adaptive code vector P_L at the pitch lag L_opt that minimizes the error power of formula (2) is expressed by the following formula:
P_L = argmax(R_xp² / R_pp)    (3)
That is, the read start point at which the cross-correlation R_xp between the pitch synthesis signal AP_L and the input signal X, normalized by the autocorrelation R_pp of the pitch synthesis signal, becomes maximum is taken as the optimum start point. Through the above processing, error power evaluation unit 10 obtains the pitch lag L_opt satisfying formula (3). The optimum pitch gain β_opt is then given by the following formula:
β_opt = R_xp / R_pp    (4)
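A minimal sketch of this search follows, under the assumption that adaptive_cb(L) returns the past excitation delayed by L and synth applies the synthesis filter A (both hypothetical helpers; the integer lag range 20 to 143 is assumed here as G.729's nominal pitch range):

    # Sketch of the adaptive codebook (pitch) search of formulas (2)-(4).
    def pitch_search(X, adaptive_cb, synth, L_min=20, L_max=143):
        best_L, best_score = L_min, -1.0
        for L in range(L_min, L_max + 1):
            AP = synth(adaptive_cb(L))                 # pitch synthesis signal AP_L
            Rxp = sum(x * y for x, y in zip(X, AP))    # cross-correlation with X
            Rpp = sum(y * y for y in AP)               # autocorrelation of AP_L
            if Rpp == 0.0:
                continue
            score = Rxp * Rxp / Rpp                    # normalized value of (3)
            if score > best_score:
                best_L, best_score = L, score
        AP = synth(adaptive_cb(best_L))
        beta = (sum(x * y for x, y in zip(X, AP))
                / sum(y * y for y in AP))              # pitch gain, formula (4)
        return best_L, beta                            # L_opt and beta_opt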
Next, the noise component contained in the excitation signal is quantized using noise codebook 8. Noise codebook 8 is made up of a plurality of pulses of amplitude +1 or -1. As an example, Table 1 shows the pulse positions for a subframe length of 40 samples.
[Table 1] G.729 noise codebook

    Pulse group   Pulse positions                          Polarity
    i0: 1         m0: 0, 5, 10, 15, 20, 25, 30, 35         s0: +/-
    i1: 2         m1: 1, 6, 11, 16, 21, 26, 31, 36         s1: +/-
    i2: 3         m2: 2, 7, 12, 17, 22, 27, 32, 37         s2: +/-
    i3: 4         m3: 3, 8, 13, 18, 23, 28, 33, 38,        s3: +/-
                      4, 9, 14, 19, 24, 29, 34, 39
Noise codebook 8 divides the N (= 40) sample points constituting one subframe into pulse groups 1 to 4 and, for every combination obtained by extracting one sample point m0 to m3 from each pulse group, sequentially outputs as the noise component a pulse signal having a +1 or -1 pulse at each of those sample points. In this example, basically four pulses are arranged per subframe.
Figure 42 is an explanatory diagram of the sample points assigned to each of the pulse groups 1 to 4.
(1) Pulse group 1 is assigned the eight sample points 0, 5, 10, 15, 20, 25, 30, 35;
(2) pulse group 2 is assigned the eight sample points 1, 6, 11, 16, 21, 26, 31, 36;
(3) pulse group 3 is assigned the eight sample points 2, 7, 12, 17, 22, 27, 32, 37;
(4) pulse group 4 is assigned the sixteen sample points 3, 4, 8, 9, 13, 14, 18, 19, 23, 24, 28, 29, 33, 34, 38, 39.
Three bits are needed to express a sample point of pulse groups 1 to 3 and one bit to express the sign of the pulse, four bits in total; for pulse group 4, four bits are needed for the sample point and one bit for the sign, five bits in total. Accordingly, 17 bits are needed to specify one excitation signal of the pulse configuration of Table 1 output from noise codebook 8, and 2^17 (= 2^4 × 2^4 × 2^4 × 2^5) kinds of pulsed excitation signals exist.
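This bit budget can be checked numerically; the following lines recompute it from the candidate-position counts of Table 1:

    # Bit budget of the Table 1 pulse configuration: log2(#positions) plus
    # one sign bit per pulse group (8, 8, 8 and 16 candidate positions).
    group_positions = [8, 8, 8, 16]
    bits = sum((n - 1).bit_length() + 1 for n in group_positions)
    assert bits == 17                      # 4 + 4 + 4 + 5
    patterns = 2 ** bits                   # 2^17 = 131072 pulsed excitations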
With the pulse positions of each pulse group restricted as in Table 1, the noise codebook search decides the combination of pulses that minimizes the error power between the reproduced signal and the input speech from among the combinations of pulse positions of the pulse systems. That is, with β_opt as the optimum pitch gain obtained by the adaptive codebook search, the adaptive codebook output P_L is multiplied by the gain β_opt and input to adder 11. At the same time, pulsed excitation signals are input to adder 11 sequentially from noise codebook 8, and the pulsed excitation signal that minimizes the difference between the input signal X and the reproduced signal obtained by inputting the adder output to LPC synthesis filter 6 is determined. Specifically, a target vector X′ for the noise codebook search is first generated by the following formula from the input signal X and the optimum adaptive codebook output P_L and optimum pitch gain β_opt obtained by the adaptive codebook search:
X′ = X - β_opt·AP_L    (5)
In this example, since pulse positions and amplitudes (signs) are expressed with 17 bits as explained above, 2^17 combinations exist. With C_k denoting the k-th noise codebook output vector, the noise codebook search obtains the code vector C_k that minimizes the evaluation-function error power D of the following formula:
D = |X′ - G_c·AC_k|²    (6)
Here G_c is the noise codebook gain. In the noise codebook search, error power evaluation unit 10 searches for the combination of pulse positions and polarities that maximizes the normalized cross-correlation (R_cx × R_cx / R_cc) obtained by normalizing the cross-correlation R_cx between the noise synthesis signal AC_k and the input signal X′ by the autocorrelation R_cc of the noise synthesis signal.
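A naive sketch of this criterion follows; it evaluates every code vector exhaustively, whereas the real G.729 search prunes the combinations, and codebook_vectors and synth are assumed helpers:

    # Sketch of the noise (fixed) codebook search: pick the code vector
    # maximizing the normalized cross-correlation Rcx^2 / Rcc.
    def fixed_cb_search(X_target, codebook_vectors, synth):
        best_k, best_score = 0, -1.0
        for k, C in enumerate(codebook_vectors()):     # the 2^17 pulse patterns
            AC = synth(C)                              # noise synthesis signal AC_k
            Rcx = sum(x * y for x, y in zip(X_target, AC))
            Rcc = sum(y * y for y in AC)
            if Rcc == 0.0:
                continue
            score = Rcx * Rcx / Rcc
            if score > best_score:
                best_k, best_score = k, score
        return best_k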
Gain quantization is described next. In G.729 the noise codebook gain is not quantized directly; the adaptive codebook gain G_a (= β_opt) and a correction factor γ for the noise codebook gain G_c are vector-quantized. Here the relation G_c = g′ × γ holds between the noise codebook gain G_c and the correction factor γ, where g′ is the gain of the current frame predicted from the logarithmic gains of the past four subframes.
A gain quantization table (not shown) of gain quantizer 12 holds 128 (= 2^7) sets of combinations of the adaptive codebook gain and the correction factor γ for the noise codebook gain. The gain code search is carried out by (1) extracting one set of table values from the gain quantization table for the adaptive codebook output vector and the noise codebook output vector and setting it in gain multiplication units 13 and 14, (2) multiplying the respective vectors by the gains G_a and G_c in gain multiplication units 13 and 14 and inputting the results to LPC synthesis filter 6, and (3) selecting, in error power evaluation unit 10, the combination that minimizes the error power with respect to the input signal X.
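A sketch of this table search is given below, assuming gain_table holds the 128 (Ga, γ) pairs, g_pred is the predicted gain g′, and AP and AC are the already-synthesized adaptive and noise contributions (all names illustrative):

    # Gain codebook search per steps (1)-(3): try each (Ga, gamma) pair,
    # form Gc = g' * gamma, and keep the index minimizing the error power.
    def gain_search(X, AP, AC, gain_table, g_pred):
        best_idx, best_err = 0, float("inf")
        for idx, (Ga, gamma) in enumerate(gain_table):
            Gc = g_pred * gamma
            err = sum((x - (Ga * a + Gc * c)) ** 2
                      for x, a, c in zip(X, AP, AC))   # error power vs. input X
            if err < best_err:
                best_idx, best_err = idx, err
        return best_idx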
Through the above processing, line multiplexing unit 15 multiplexes (1) the LSP code as the LSP quantization index, (2) the pitch lag L_opt as the adaptive codebook quantization index, (3) the noise code as the noise codebook index, and (4) the gain code as the gain quantization index, and generates line data. Before actual transmission, line coding and packetization suited to the transmission line are necessary.
Configuration and operation of the decoder
Figure 43 is a block diagram of the G.729 decoder. Line data received from the line is input to line demultiplexing unit 21, which separates it and outputs the LSP code, pitch-lag code, noise code, and gain code. The decoder decodes speech data based on these codes. Since the functions of the decoder are partly contained in the encoder and overlap with it, the operation of the decoder is described only briefly below.
LSP dequantization unit 22 dequantizes the input LSP code and outputs LSP dequantized values. LSP interpolation unit 23 interpolates the LSP of the first subframe of the current frame from the LSP dequantized value of the second subframe of the current frame and that of the second subframe of the previous frame. Parameter inverse conversion unit 24 then converts the LSP interpolated value and the LSP dequantized value into LPC synthesis filter coefficients, respectively. The G.729 LPC synthesis filter 25 uses the LPC coefficients converted from the LSP interpolated value in the first subframe and the LPC coefficients converted from the LSP dequantized value in the following second subframe.
Adaptive codebook 26 outputs a subframe length (= 40 samples) of the pitch-period signal from the read start position indicated by the pitch-lag code, and noise codebook 27 outputs the pulse positions and pulse polarities corresponding to the noise code from the corresponding read positions. Gain dequantization unit 28 calculates the adaptive codebook gain dequantized value and the noise codebook gain dequantized value from the input gain code and sets them in gain multiplication units 29 and 30. Adder 31 generates the excitation signal by adding the signal obtained by multiplying the adaptive codebook output by the adaptive codebook gain dequantized value and the signal obtained by multiplying the noise codebook output by the noise codebook gain dequantized value, and inputs this excitation signal to LPC synthesis filter 25. Reproduced speech is thus obtained from LPC synthesis filter 25.
In the initial state the contents of adaptive codebook 26 on the decoder side are likewise all signals of amplitude 0; for each subframe, the temporally oldest signal is discarded one subframe length at a time, while the excitation signal obtained for the current subframe is stored in adaptive codebook 26. That is, the adaptive codebooks of the encoder and the decoder are always maintained in the same, up-to-date state.
Digital watermark technology
As a digital watermark technique targeting the above-described CELP, a "method of embedding watermark bits during speech encoding" is disclosed in Japanese Laid-Open Patent Publication No. 11-272299. Figure 44 is an explanatory diagram of this digital watermark technique. In Table 1, attention is paid to the fourth pulse group i3. The pulse positions m3 of the fourth pulse group i3, unlike the pulse positions m0 to m2 of the first to third pulse groups i0 to i2, hold adjacent candidates. According to the G.729 standard, there is no problem in selecting an adjacent pulse position among the pulse positions of the fourth pulse group i3. For example, pulse position m3 = 4 of the fourth pulse group i3 can be replaced with pulse position m3′ = 3 without affecting the human sense of hearing even if the speech is reproduced after the replacement. Therefore, an 8-bit key Kp is introduced to label the candidates of m3. For example, as shown in Figure 45, Kp = 00001111 is set so that the bits of Kp correspond to the m3 candidates 3, 8, 13, 18, 23, 28, 33, 38, respectively, and *Kp = 11110000 is set so that the bits of *Kp correspond to the m3 candidates 4, 9, 14, 19, 24, 29, 34, 39, respectively. With this correspondence, labels of "0" and "1" can be attached to all candidates of m3 according to the key Kp. When the watermark bit "0" is to be embedded in the voice code, m3 is selected from among the candidates labeled "0" according to the key Kp; when the watermark bit "1" is to be embedded, m3 is selected from among the candidates labeled "1". In this way binary watermark information can be embedded in the voice code, and by sharing the above key Kp between the transmitting and receiving equipment, embedding and extraction of watermark information become possible. Since one watermark bit can be embedded per 5 ms subframe, 200 bits can be embedded per second.
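The following sketch illustrates this keyed labeling under stated assumptions (the bit-to-candidate correspondence is one plausible reading of Figure 45, and the helper names are hypothetical):

    # Keyed labeling of the m3 candidates with the 8-bit key Kp = 00001111,
    # as in Figure 45; the LSB-first bit ordering is an assumption.
    KP = 0b00001111
    CAND_A = [3, 8, 13, 18, 23, 28, 33, 38]   # labeled by the bits of Kp
    CAND_B = [4, 9, 14, 19, 24, 29, 34, 39]   # labeled by the bits of *Kp (~Kp)

    def label(m3):
        # Watermark label (0/1) attached to pulse position candidate m3.
        if m3 in CAND_A:
            return (KP >> CAND_A.index(m3)) & 1
        return ((~KP) >> CAND_B.index(m3)) & 1

    def candidates_for(bit):
        # To embed watermark bit `bit`, m3 must be chosen among these.
        return [m for m in CAND_A + CAND_B if label(m) == bit]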
However, if watermark information is embedded in all codes using the same key Kp, the possibility of it being decoded by an unauthorized third party increases, so improved confidentiality must be sought. Let C_p be the aggregate value of m0 to m3; this aggregate value is one of the 58 values shown in Figure 45(a). A second 58-bit key K_con is therefore introduced and, as shown in Figure 45(b), the 58 aggregate values C_p are made to correspond to the bits of this key. At the time of speech encoding, the aggregate value of m0 to m3 in the noise code (72 in the figure) is calculated, and the bit value C_pb of the key K_con corresponding to this aggregate value is checked to be "0" or "1". When C_pb = "1", a watermark bit is embedded in the voice code according to Figure 44; when it is "0", no watermark bit is embedded. This makes it difficult for a third party who does not know the key K_con to decode the watermark information.
When other media are transmitted over a channel independent of the voice channel, the terminal equipment at both ends must at least support multiple channels. In that case there is the problem that terminals connected to existing communication networks, such as the currently widespread second-generation mobile telephones, are excluded. Moreover, even if the terminal equipment at both ends supports multiple channels and a plurality of media can be transmitted, in packet switching the paths are dispersed, making synchronization and coordinated operation in intermediate relay equipment difficult. In particular, coordinated operation with data attached to the voice uttered by a specific user raises the problem that complicated control such as path setting and synchronization processing becomes necessary.
Furthermore, the conventional digital watermark technique requires the use of keys, which must therefore be shared between the transmitting and receiving sides. In addition, the data embedding target is limited to the pulse positions of the fourth pulse group of the noise codebook. The possibility that a user learns of the existence of the key is therefore high, and since the embedding location can be determined once the key is known, there is the problem that leakage or falsification of the data may occur.
Moreover, in the conventional digital watermark technique, whether or not embedding is performed is controlled "at random" by the aggregate value of the pulse position candidates, so there is the problem that the influence of sound quality degradation caused by data embedding is likely to become large. What is desired is a data embedding technique that, without being standardized as part of the communication standard, produces no sound quality degradation even when a terminal decodes (reproduces speech from) the code as it is; the prior art cannot meet this requirement because sound quality degradation occurs.
Summary of the invention
An object of the present invention is to enable data to be embedded in voice code on the encoder side and correctly extracted on the decoder side even if the encoder and decoder do not both hold keys.
Another object of the present invention is to ensure that, even when data is embedded in voice code, no sound quality degradation occurs and the listener of the reproduced speech remains unaware of the embedding.
Another object of the present invention is to make leakage and falsification of the embedded data difficult.
Another object of the present invention is to make it possible to embed both data and control codes, so that the decoder side can carry out processing in accordance with the control codes.
Another object of the present invention is to increase the transmission capacity of the embedded data.
Another object of the present invention is to make multimedia transmission of voice, images, personal information, and the like possible using only a voice channel.
Another object of the present invention is to make it possible to provide arbitrary information such as advertising information to the end users who communicate voice data with each other.
Another object of the present invention is to make it possible to embed the sender, the recipient, the time of reception, the call category, and the like in received speech data and store it.
According to a first aspect of the present invention, when arbitrary data is embedded in voice code, whether a data embedding condition is satisfied is judged using a first element code from among the element codes constituting the voice code and a threshold value; if it is satisfied, the data is embedded in the voice code by replacing a second element code with the arbitrary data. Specifically, the first element code is the noise codebook gain code and the second element code is the noise code serving as the index information of the noise codebook; when the dequantized value of the noise codebook gain code is smaller than the threshold value, the data embedding condition is judged to be satisfied, and the data is embedded in the voice code by replacing the noise code with the arbitrary data. As another concrete example, the first element code is the pitch gain code and the second element code is the pitch-lag code serving as the index information of the adaptive codebook; when the dequantized value of the pitch gain code is smaller than the threshold value, the data embedding condition is judged to be satisfied, and the data is embedded in the voice code by replacing the pitch-lag code with the arbitrary data.
Focusing on the two kinds of codewords of the excitation signal, namely the adaptive codeword corresponding to the pitch excitation and the fixed codeword (noise code) corresponding to the noise excitation, the gains can be regarded as factors expressing the degree of contribution of the respective codewords P and C. That is, when a gain is small, the contribution of the corresponding codeword is small. Therefore a gain is defined as the decision parameter, and when it falls to or below a certain threshold value, the contribution of the corresponding excitation codeword is judged to be small and the index of that excitation codeword is replaced with an arbitrary data sequence. Arbitrary data can thus be embedded while keeping the influence of the replacement small, and by controlling the threshold value, the amount of embedded data can be adjusted with consideration of the influence on reproduced sound quality.
According to a second aspect of the present invention, when data embedded in voice code encoded by a predetermined speech coding scheme is extracted, whether the data embedding condition is satisfied is judged using a first element code from among the element codes constituting the voice code and a threshold value; if it is satisfied, it is judged that arbitrary data has been embedded in the second element code portion of the voice code, and the embedded data is extracted. Specifically, the first element code is the noise codebook gain code and the second element code is the noise code serving as the index information of the noise codebook; when the dequantized value of the noise codebook gain code is smaller than the threshold value, the data embedding condition is judged to be satisfied and the embedded data is extracted from the noise code. As another concrete example, the first element code is the pitch gain code and the second element code is the pitch-lag code serving as the index information of the adaptive codebook; when the dequantized value of the pitch gain code is smaller than the threshold value, the data embedding condition is judged to be satisfied and the embedded data is extracted from the pitch-lag code.
With the above processing, data can be embedded in voice code on the encoder side and correctly extracted on the decoder side even if the encoder and decoder do not both hold keys. Moreover, even when data is embedded in voice code, there is no sound quality degradation and the listener of the reproduced speech can be kept unaware of the embedding. In addition, by changing the threshold value, leakage and falsification of the embedded data can be made difficult.
According to a third aspect of the present invention, in a system comprising a speech encoding apparatus and a speech decoding apparatus, the speech encoding apparatus encodes speech by a predetermined speech coding scheme and embeds arbitrary data in the resulting voice code, and the speech decoding apparatus extracts the embedded data from the voice code and reproduces speech from it. In such a system, the first element code and threshold value used to judge whether data is embedded, and the second element code in which data is embedded based on that judgment, are defined in both the speech encoding apparatus and the speech decoding apparatus. When the speech encoding apparatus embeds data, it judges whether the data embedding condition is satisfied using the first element code among the element codes constituting the voice code and the threshold value, and if so, embeds the data in the voice code by replacing the second element code with the arbitrary data. When the speech decoding apparatus performs data extraction, it judges whether the data embedding condition is satisfied using the first element code and the threshold value; if it is satisfied, it judges that arbitrary data has been embedded in the second element code portion of the voice code, extracts the embedded data, and then executes decoding processing on the voice code.
With the above processing, embedding and extraction of data can be carried out without using keys if only the initial value of the threshold is defined in advance on both the transmitting and receiving sides. Furthermore, if a control code is defined in the embedded data, the threshold value can be changed using this control code, and the transmission amount of the embedded data can be adjusted by changing the threshold value. In addition, since the gain value decides whether only a data sequence is embedded, or a data/control-code sequence is embedded in a form whose category (data or control code) can be recognized, data category information need not be included when only a data sequence is embedded, so the transmission capacity can be improved.
According to a fourth aspect of the present invention, a digital voice communication system that encodes and transmits speech by a predetermined speech coding scheme comprises: means for analyzing speech data obtained by encoding the input speech; means for embedding an arbitrary code in a specific portion of part of the speech data according to the analysis result; and means for transmitting the speech data with the embedded data, so that ordinary voice calls and additional information are transmitted simultaneously. The digital voice communication system further comprises: means for analyzing received speech data; and means for extracting the code from the specific portion of part of the speech data according to the analysis result, so that ordinary voice calls and additional information are received and output simultaneously.
By using image information (images of the surroundings of the present location, map images, etc.) or personal information (facial photographs, voiceprints, fingerprints, etc.) as the additional information, multimedia communication becomes possible. By using the serial number of the terminal or a voiceprint as the additional information, authentication of whether the user is a proper user can be improved and the security of the speech data can be raised.
In addition, by providing a server apparatus that relays the speech data, arbitrary information such as advertising information can be provided to the end users who communicate voice data with each other.
In addition, by embedding the sender, the recipient, the time of reception, the call category, and the like in the received speech data and storing it in a storage device, documentation of speech data becomes possible and later utilization becomes easy.
Other features and advantages of the present invention will become more apparent from the following accompanying drawings and the detailed description based on them.
Description of drawings
Fig. 1 is a general block diagram of the encoder side of the present invention.
Fig. 2 is a block diagram of the embedding decision unit.
Fig. 3 is a block diagram of a first embodiment using an encoder that encodes according to coding scheme G.729.
Fig. 4 is a block diagram of the embedding decision unit.
Fig. 5 shows the standard format of the voice code.
Fig. 6 is an explanatory diagram of the transmission code under embedding control.
Fig. 7 is an explanatory diagram of the case where data and control codes are embedded distinguishably.
Fig. 8 is a block diagram of a second embodiment using an encoder that encodes according to coding scheme G.729.
Fig. 9 is a block diagram of the embedding decision unit.
Fig. 10 shows the standard format of the voice code.
Fig. 11 is an explanatory diagram of the transmission code under embedding control.
Fig. 12 is a general block diagram of the decoder side of the present invention.
Fig. 13 is a block diagram of the embedding decision unit.
Fig. 14 is a block diagram of a first embodiment for the case where data is embedded in the noise code.
Fig. 15 is a block diagram of the embedding decision unit for the case where data is embedded in the noise code.
Fig. 16 shows the format of the received voice code.
Fig. 17 is an explanatory diagram of the decision results of the data embedding decision unit.
Fig. 18 is a block diagram of a second embodiment for the case where data is embedded in the pitch-lag code.
Fig. 19 is a block diagram of the embedding decision unit for the case where data is embedded in the pitch-lag code.
Fig. 20 shows the format of the received voice code.
Fig. 21 is an explanatory diagram of the decision results of the data embedding decision unit.
Fig. 22 is a block diagram of an encoder-side embodiment in which threshold values are set in multiple stages.
Fig. 23 is an explanatory diagram of the ranges in which data embedding is possible.
Fig. 24 is a block diagram of the embedding decision unit for the case where threshold values are set in multiple stages.
Fig. 25 is an explanatory diagram of data embedding.
Fig. 26 is a block diagram of a decoder-side embodiment in which threshold values are set in multiple stages.
Fig. 27 is a block diagram of the embedding decision unit.
Fig. 28 is a block diagram of a digital voice communication system that realizes multimedia transmission of voice and images simultaneously by embedding images.
Fig. 29 is a transmission processing flow of the sending-side terminal in the image transmission service.
Fig. 30 is a reception processing flow of the receiving-side terminal in the image transmission service.
Fig. 31 is a block diagram of a digital voice communication system that transmits voice and authentication information simultaneously by embedding authentication information.
Fig. 32 is a transmission processing flow of the sending-side terminal in the authentication transmission service.
Fig. 33 is a reception processing flow of the receiving-side terminal in the authentication transmission service.
Fig. 34 is a block diagram of a digital voice communication system that transmits voice and key information simultaneously by embedding key information.
Fig. 35 is a block diagram of a digital voice communication system that transmits voice and IP telephone address information simultaneously by embedding IP telephone address information.
Fig. 36 is a block diagram of a digital voice communication system that realizes an advertising-information embedding service.
Fig. 37 shows an example configuration of an IP packet in an Internet telephony service.
Fig. 38 is a processing flow for inserting advertising information in the service.
Fig. 39 is a reception processing flow of advertising information at the receiving-side terminal in the advertising-information embedding service.
Fig. 40 is a block diagram of an information storage system cooperating with the digital voice communication system.
Figure 41 is a block diagram of the encoder of the G.729 scheme recommended by the ITU-T.
Figure 42 is an explanatory diagram of the sample points assigned to each pulse group.
Figure 43 is a block diagram of the G.729 decoder.
Figure 44 is an explanatory diagram of a conventional digital watermark technique.
Figure 45 is another explanatory diagram of the conventional digital watermark technique.
Embodiment
[Embodiments of the invention]
(A) Principle of the present invention
In a decoder of the CELP algorithm, an excitation signal is generated from the index of the excitation sequence and the gain information, speech is generated (reproduced) using a synthesis filter constituted by linear prediction coefficients, and the reproduced speech is expressed by the following formula:
S_rp = H·R = H(G_p·P + G_c·C) = H·G_p·P + H·G_c·C
Here, S_rp is the reproduced speech, H is the LPC synthesis filter, G_p is the adaptive codeword gain (pitch gain), P is the adaptive codeword (pitch-lag code), G_c is the noise codeword gain (noise codebook gain), and C is the noise codeword. The first term on the right side is the pitch-period synthesis signal and the second term is the noise synthesis signal.
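Transcribed directly into code, this reproduction step looks as follows (a sketch only; H stands for applying the LPC synthesis filter to the excitation vector and is assumed to be supplied by the decoder):

    # Reproduction of speech from the decoded parameters, per the formula
    # above: excitation R = Gp*P + Gc*C, reproduced speech S_rp = H(R).
    def reproduce(H, Gp, P, Gc, C):
        R = [Gp * p + Gc * c for p, c in zip(P, C)]
        return H(R)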
As described above, the digital code encoded by CELP (the transmitted parameters) corresponds to the characteristic parameters of the speech generation system, and by focusing on this feature the state of each transmitted parameter can be grasped. For example, focusing on the two kinds of codewords of the excitation signal, namely the adaptive codeword corresponding to the pitch excitation and the noise codeword corresponding to the noise excitation, the gains G_p and G_c can be regarded as factors expressing the degree of contribution of the respective codewords P and C. That is, when G_p or G_c is small, the contribution of the corresponding codeword P or C is small. Therefore the gains G_p and G_c are defined as decision parameters, and when one falls to or below a certain threshold value, the contribution of the corresponding excitation codeword P or C is judged to be small and the index of that excitation codeword is replaced with an arbitrary data sequence. Arbitrary data can thus be embedded while keeping the influence of the replacement small, and by controlling the threshold value, the amount of embedded data can be adjusted with consideration of the influence on reproduced sound quality.
With the present technique, if only the initial value of the threshold is defined in advance on both the transmitting and receiving sides, the presence and location of embedded data, and the writing/reading of the embedded data, become possible using only the decision parameters (pitch gain, noise codebook gain) and the embedding target parameters (pitch lag, noise code). That is, transmission of a dedicated key is unnecessary. Moreover, if a control code is defined in the embedded data, the transmission amount of embedded data can be adjusted merely by indicating a change of the threshold value with the control code.
Thus, by applying the present technique, arbitrary data can be embedded without changing the coding format. That is, ID information or information of other media can be embedded in the voice information and transmitted/stored without notifying the user and without sacrificing the compatibility required for communication/storage applications. Furthermore, since the present invention specifies the control method with parameters common to CELP, it is not limited to a specific scheme and can be applied widely, for example to G.729 for VoIP or to AMR for mobile communication.
(B) Encoder-side embodiments
(a) General configuration
Fig. 1 is a general block diagram of the encoder side of the present invention. Voice/audio CODEC (encoder) 51 encodes the input speech according to a predetermined encoding scheme and outputs the resulting voice code (code data). The voice code consists of a plurality of element codes. Embed data generating unit 52 generates the predetermined data to be embedded in the voice code. Data embedding control unit 53, which comprises embedding decision unit 54 and a data embedding unit 55 of selector construction, embeds data in the voice code as appropriate. Embedding decision unit 54 judges whether the data embedding condition is satisfied using a first element code from among the element codes constituting the voice code and a threshold value TH. Data embedding unit 55 embeds the data in the voice code by replacing a second element code with the embed data when the data embedding condition is satisfied, and outputs the second element code unchanged when the condition is not satisfied. Multiplexing unit 56 multiplexes the element codes constituting the voice code.
Fig. 2 is a block diagram of the embedding decision unit. Dequantization unit 54a dequantizes the first element code and outputs a dequantized value G, and threshold generating unit 54b outputs the threshold value TH. Comparison unit 54c compares the dequantized value G with the threshold value TH and inputs the comparison result to data embedding decision unit 54d. Data embedding decision unit 54d judges that data embedding is not possible if, for example, G ≥ TH, and generates a select signal SL for selecting the second element code output from encoder 51; if G < TH, it judges that data embedding is possible and generates a select signal SL for selecting the embed data output from embed data generating unit 52. As a result, data embedding unit 55 selectively outputs either the second element code or the embed data based on the select signal SL.
In Fig. 2 the first element code is dequantized and compared with the threshold value, but in some cases the threshold can be set at the code level so that codes can be compared directly; in such cases dequantization is not necessarily required.
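A minimal sketch of this selector logic follows, assuming a dequantize helper for the gain code (all names illustrative):

    # Encoder-side embedding control of Figs. 1 and 2: if the dequantized
    # first element code (a gain) is below the threshold TH, the second
    # element code is replaced by the embed data; otherwise it passes through.
    def embed_control(first_code, second_code, embed_data, dequantize, TH):
        G = dequantize(first_code)
        if G < TH:                      # data embedding condition satisfied
            return embed_data           # second element code replaced by data
        return second_code              # transmitted unchanged

Because the decoder can evaluate the same condition on the same received gain code, no key has to be exchanged; only the initial threshold value must agree on both sides.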
(b) First embodiment
Fig. 3 is a block diagram of a first embodiment using an encoder that encodes according to coding scheme G.729; parts identical to those in Fig. 1 bear the same reference marks. The differences from Fig. 1 are that the gain code (noise codebook gain) is used as the first element code and the noise code serving as the index of the noise codebook is used as the second element code.
Encoder 51 encodes the input speech according to G.729 and inputs the resulting voice code to data embedding unit 53. As shown in Table 2, the G.729 voice code has, as element codes, an LSP code, an adaptive codebook index (pitch-lag code), a noise codebook index (noise code), and a gain code. The gain code is a code obtained by combining and encoding the pitch gain and the noise codebook gain.
[Table 2] ITU-T Recommendation G.729 parameters

    Bit rate                            8 kbit/s
    Frame length                        10 ms
    Subframe length                     5 ms

    Transmitted parameters and capacity
    LSP                                 18 bit/10 ms
    Adaptive codebook index             13 bit/10 ms
    Noise codebook index                17 bit/5 ms
    Gain (adaptive/noise codebook)      7 bit/5 ms
Embedding decision unit 54 of data embedding unit 53 judges whether the data embedding condition is satisfied using the dequantized value of the gain code and the threshold value TH. Data embedding unit 55 embeds the data in the voice code by replacing the noise code with the predetermined data when the data embedding condition is satisfied, and outputs the noise code unchanged when the condition is not satisfied. Multiplexing unit 56 multiplexes the element codes constituting the voice code.
Embedding decision unit 54 is configured as shown in Fig. 4. Dequantization unit 54a dequantizes the gain code, and comparison unit 54c compares the dequantized value (noise codebook gain) G_c with the threshold value TH. Data embedding decision unit 54d judges that the data embedding condition is satisfied when the dequantized value G_c is smaller than the threshold value TH and generates a select signal SL for selecting the embed data output from embed data generating unit 52; when the dequantized value G_c is larger than the threshold value TH, it judges that the data embedding condition is not satisfied and generates a select signal SL for selecting the noise code output from encoder 51. Data embedding unit 55 selectively outputs either the noise code or the embed data based on the select signal SL.
Fig. 5 shows the standard format of the voice code, and Fig. 6 is an explanatory diagram of the transmission code under embedding control; the voice code consists of five codes (LSP code, adaptive codebook index, adaptive codebook gain, noise codebook index, noise codebook gain). When the noise codebook gain G_c is larger than the threshold value TH, no data is embedded in the voice code, as shown in Fig. 6(1). However, when the noise codebook gain G_c is smaller than the threshold value TH, data is embedded in the noise codebook index portion of the voice code, as shown in Fig. 6(2).
The example of Fig. 6 embeds arbitrary data in all M (= 17) bits used for the noise codebook index (noise code); however, by assigning the most significant bit (MSB) as a data category bit, data and control codes can be embedded distinguishably in the remaining (M - 1) bits, as shown in Fig. 7. By defining a bit that identifies data/control codes in part of the embedded data in this way, changing the threshold value, synchronization control, and the like using control codes become possible.
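The layout of Fig. 7 can be sketched as simple bit packing (M = 17 here; the constant names are illustrative):

    # MSB of the M embedding bits flags data vs. control code; the
    # remaining M-1 bits carry the payload, as in Fig. 7.
    M = 17
    DATA, CONTROL = 0, 1

    def pack(kind, payload):
        assert 0 <= payload < (1 << (M - 1))
        return (kind << (M - 1)) | payload

    def unpack(word):
        return word >> (M - 1), word & ((1 << (M - 1)) - 1)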
Table 3 shows simulation results for the case where, in speech coding scheme G.729, the noise code (17 bits) serving as the noise codebook index is replaced with arbitrary data when the gain value is at or below a certain value. With the arbitrary data taken as randomly generated data, the change in sound quality when this random data is regarded as the noise code and decoded was evaluated by SNR, and the proportion of frames replaced with data was measured. The threshold value in the table is a gain index number; the larger the number used as the threshold value, the larger the gain. SNR is the ratio (dB) of the excitation signal in the voice code when the noise code is not replaced with data to the error signal, i.e., the difference between the excitation signal of the unreplaced case and that of the replaced case. SNRseg is the SNR per frame, and SNRtot is the average SNR over the whole speech interval. The ratio (%) is the proportion of frames in which the gain fell to or below the corresponding threshold value and data was embedded when a standard signal was input as the voice signal.
[Table 3] Threshold value (gain index), influence on sound quality, and proportion of replaced frames

    Threshold   SNRseg [dB]   SNRtot [dB]   Ratio [%]
    0           11.60         13.27         0
    2           11.59         13.27         11.22
    4           11.58         13.24         31.90
    6           11.56         13.24         37.68
    8           11.53         13.25         40.37
    10          11.52         13.26         41.88
    12          11.50         13.24         42.96
    14          11.47         13.22         43.87
    16          11.44         13.20         44.51
    18          11.44         13.21         45.09
    20          11.40         13.20         45.59
    30          11.32         13.21         47.63
    40          11.16         13.22         49.34
    50          11.03         13.18         50.66
    60          10.86         13.13         52.04
    80          10.56         13.10         54.24
    100         10.16         12.96         56.35
As Table 3 shows, by setting the threshold value of the noise codebook gain to 12, for example, 43% of the total transmission capacity of the noise codebook indexes (noise codes) can be replaced with arbitrary data, and even if a decoder decodes the result as it is, the sound quality difference can be kept to only 0.1 dB (= 11.60 - 11.50) compared with the case where no data is embedded (threshold value 0). This means that there is practically no sound quality degradation in G.729, and that arbitrary data can be transmitted at 1462 bits/s (= 0.43 × 17 × (1000/5)). Furthermore, by raising or lowering the threshold value, the transmission capacity (ratio) of the embedded data can be adjusted while considering the influence on sound quality. For example, if a sound quality change of 0.2 dB is allowed, the transmission capacity can be increased up to 46% (1546 bits/s) by setting the threshold value to 20.
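The capacity figure quoted above follows from simple arithmetic, recomputed below:

    # Embedding capacity at threshold index 12: about 43% of the 17-bit
    # noise codes (one per 5 ms subframe, i.e. 200 per second) carry data.
    ratio, bits_per_code, codes_per_sec = 0.43, 17, 1000 // 5
    print(ratio * bits_per_code * codes_per_sec)   # 1462.0 bits/s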
(c) Second embodiment
Fig. 8 is a block diagram of a second embodiment using an encoder that encodes according to coding scheme G.729; parts identical to those in Fig. 1 bear the same reference marks. The differences from Fig. 1 are that the gain code (pitch gain code) is used as the first element code and the pitch-lag code serving as the adaptive codebook index is used as the second element code.
Encoder 51 encodes the input speech according to G.729 and inputs the resulting voice code to data embedding unit 53. Embedding decision unit 54 of data embedding unit 53 judges whether the data embedding condition is satisfied using the dequantized value (pitch gain) of the gain code and the threshold value TH. Data embedding unit 55 embeds the data in the voice code by replacing the pitch-lag code with the predetermined data when the data embedding condition is satisfied, and outputs the pitch-lag code unchanged when the condition is not satisfied. Multiplexing unit 56 multiplexes the element codes constituting the voice code.
Embedding decision unit 54 is configured as shown in Fig. 9. Dequantization unit 54a dequantizes the gain code, and comparison unit 54c compares the dequantized value (pitch gain) G_p with the threshold value TH. Data embedding decision unit 54d judges that the data embedding condition is satisfied when the dequantized value G_p is smaller than the threshold value TH and generates a select signal SL for selecting the embed data output from embed data generating unit 52; when the dequantized value G_p is larger than the threshold value TH, it judges that the data embedding condition is not satisfied and generates a select signal SL for selecting the pitch-lag code output from encoder 51. Data embedding unit 55 selectively outputs either the pitch-lag code or the embed data based on the select signal SL.
Fig. 10 shows the standard format of the voice code, and Fig. 11 is an explanatory diagram of the transmission code under embedding control; the voice code consists of five codes (LSP code, adaptive codebook index, adaptive codebook gain, noise codebook index, noise codebook gain). When the pitch gain G_p is larger than the threshold value TH, no data is embedded in the voice code, as shown in Fig. 11(1). However, when the pitch gain G_p is smaller than the threshold value TH, data is embedded in the adaptive codebook index portion of the voice code, as shown in Fig. 11(2).
Table 4 shows simulation results for the case where, in speech coding scheme G.729, the pitch-lag code (13 bits/10 ms) serving as the adaptive codebook index is replaced with arbitrary data when the gain value is at or below a certain value. With the arbitrary data taken as randomly generated data, Table 4 shows the result of evaluating by SNR the change in sound quality when this random data is read as the pitch-lag code, together with the proportion of replaced frames.
[Table 4] Threshold value targeting the adaptive codebook gain, influence on sound quality, and proportion of replaced frames

    Threshold   SNRseg [dB]   SNRtot [dB]   Ratio [%]
    0.0         11.60         13.27         0
    0.1         11.58         13.22         4.79
    0.2         11.54         13.23         12.66
    0.3         11.51         13.22         23.31
    0.4         11.42         13.15         34.86
    0.5         11.36         13.15         45.00
    0.6         11.22         13.04         52.35
    0.7         10.92         12.69         59.55
    0.8         10.46         12.01         65.70
    0.9         9.51          10.30         73.26
    1.0         8.35          8.70          81.21
    1.1         7.75          7.92          87.16
    1.2         7.43          7.56          90.50
As Table 4 shows, when the threshold value is set to a gain of 0.5, for example, 45% of the total transmission capacity of the pitch lags serving as adaptive codebook indexes can be replaced with arbitrary data, and even if a decoder decodes the result as it is, the sound quality difference can be kept to only 0.24 dB (= 11.60 - 11.36).
(C) demoder one side's embodiment
(a) general formation
Figure 12 is a diagram showing the general configuration on the decoder side of the present invention. On receiving a voice code, demultiplexing unit 61 separates it into its element codes and inputs them to data extraction unit 62. Data extraction unit 62 extracts any embedded data from the separated element codes using the first element code, inputs the data to data processing unit 63, and inputs each received element code, unchanged, to speech CODEC (decoder) 64. Decoder 64 decodes the input voice code and reproduces the output speech.
Data extraction unit 62, which comprises embedding decision unit 65 and distribution unit 66, extracts data from the voice code as appropriate. Embedding decision unit 65 uses the first element code among the element codes constituting the voice code and the threshold TH to decide whether the data embedding condition is satisfied. If it is satisfied, distribution unit 66 treats the second element code as embedded data, extracts it, and sends the embedded data to data processing unit 63. Regardless of whether the condition is satisfied, distribution unit 66 inputs the received second element code, unchanged, to decoder 64.
Figure 13 shows the configuration of the embedding decision unit. Dequantization unit 65a dequantizes the first element code and outputs the dequantized value G, and threshold generating unit 65b outputs the threshold TH. Comparison unit 65c compares G with TH and inputs the comparison result to data embedding decision unit 65d. Unit 65d judges that no data is embedded if G >= TH, judges that data is embedded if G < TH, and generates a distribution signal BL accordingly. Based on BL, when data is embedded, distribution unit 66 extracts the data from the second element code and inputs it to data processing unit 63 while also inputting the second element code, unchanged, to decoder 64; when no data is embedded, it simply inputs the second element code to decoder 64. Although in Figure 13 the first element code is dequantized before comparison with the threshold, in some cases a threshold can be set at the code level so that codes can be compared directly, in which case dequantization is unnecessary.
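The decoder mirrors the encoder-side decision through the distribution signal BL. A minimal sketch of Figure 13 under the same assumptions (the `dequantize` helper is hypothetical):

```python
def extract_decision_decoder(first_code, second_code, th, dequantize):
    """Figure 13: return (embedded_data_or_None, code_passed_to_decoder)."""
    g = dequantize(first_code)        # dequantized value G of the first element code
    if g >= th:
        return None, second_code      # BL: nothing embedded
    return second_code, second_code   # BL: the second element code doubles as data
```

Because both sides apply the same rule to the same transmitted first element code, no side channel or key exchange is needed to stay synchronized.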
(b) First embodiment
Figure 14 shows the configuration of the first embodiment, in which data is embedded in the noise code of G.729; parts identical to those of Figure 12 carry the same reference characters. The difference from Figure 12 is that the gain code (the noise codebook gain) is used as the first element code, and the noise code serving as the index of the noise codebook is used as the second element code.
On receiving a voice code, demultiplexing unit 61 separates it into its element codes and inputs them to data extraction unit 62. With G.729 coding, demultiplexing unit 61 separates the voice code into the LSP code, the pitch-lag code, the noise code, and the gain code and inputs them to data extraction unit 62. The gain code is a code obtained by combining the pitch gain and the noise codebook gain and quantizing (encoding) them using a quantization table.
The embedding decision unit 65 of data extraction unit 62 uses the dequantized value of the gain code and the threshold TH to decide whether the data embedding condition is satisfied. If it is satisfied, distribution unit 66 treats the noise code as embedded data and extracts it, inputting the embedded data to data processing unit 63 while also inputting the noise code, as received, to decoder 64. If the condition is not satisfied, distribution unit 66 simply inputs the noise code, as received, to decoder 64.
The embedding decision unit 65 is configured as shown in Figure 15. Dequantization unit 65a dequantizes the gain code, and comparison unit 65c compares the dequantized value (noise codebook gain) Gc with the threshold TH. Data embedding decision unit 65d judges that data is embedded when Gc is smaller than TH and that no data is embedded when Gc is larger than TH, and generates the distribution signal BL accordingly. Based on BL, distribution unit 66 inputs the data embedded in the noise code to data processing unit 63 and also inputs the noise code to decoder 64.
Figure 16 shows the standard format of the received voice code, and Figure 17 illustrates the decision result of the data embedding decision unit, for the case where the voice code consists of five codes (LSP code, adaptive codebook index, adaptive codebook gain, noise codebook index, noise codebook gain). At the moment of reception it is not known whether data has been embedded in the noise-codebook-index portion (the noise code portion) of the voice code (Figure 16). Whether data is embedded is determined by comparing the noise codebook gain Gc with the threshold TH: if Gc is larger than TH, no data is embedded in the noise-codebook-index portion, as shown in Figure 17(1); if Gc is smaller than TH, data is embedded in the noise-codebook-index portion, as shown in Figure 17(2).
If, as shown in Figure 7, the most significant bit (MSB) is used as a data-category bit and data and control codes, distinguished from each other, are embedded in the remaining (M-1) bits, data processing unit 63 refers to this MSB and, when the payload is a control code, executes the processing that the command specifies, for example changing the threshold or performing synchronization control.
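The category bit itself is plain bit manipulation. A sketch, assuming an M-bit field (M = 17 is the G.729 noise-code width used later in Figure 25); the function names are illustrative, not from the patent:

```python
def pack_payload(is_control, payload, m=17):
    """Set the MSB of an m-bit field as the data-category bit (1 = control code)."""
    assert 0 <= payload < (1 << (m - 1)), "payload must fit in the remaining m-1 bits"
    return (int(is_control) << (m - 1)) | payload

def unpack_payload(field, m=17):
    """Split an m-bit field into (is_control, payload)."""
    return bool(field >> (m - 1)), field & ((1 << (m - 1)) - 1)
```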
(c) Second embodiment
Figure 18 shows the configuration of the second embodiment, in which data is embedded in the pitch-lag code of G.729; parts identical to those of Figure 12 carry the same reference characters. The difference from Figure 12 is that the gain code (the pitch gain code) is used as the first element code, and the pitch-lag code serving as the index of the adaptive codebook is used as the second element code.
On receiving a voice code, demultiplexing unit 61 separates it into its element codes and inputs them to data extraction unit 62. With G.729 coding, demultiplexing unit 61 separates the voice code into the LSP code, the pitch-lag code, the noise code, and the gain code and inputs them to data extraction unit 62. The gain code is a code obtained by combining the pitch gain and the noise codebook gain and quantizing (encoding) them using a quantization table.
The embedding decision unit 65 of data extraction unit 62 uses the dequantized value of the gain code and the threshold TH to decide whether the data embedding condition is satisfied. If it is satisfied, distribution unit 66 treats the pitch-lag code as embedded data and extracts it, inputting the embedded data to data processing unit 63 while also inputting the pitch-lag code, as received, to decoder 64. If the condition is not satisfied, distribution unit 66 simply inputs the pitch-lag code, as received, to decoder 64.
The embedding decision unit 65 is configured as shown in Figure 19. Dequantization unit 65a dequantizes the gain code, and comparison unit 65c compares the dequantized value (pitch gain) Gp with the threshold TH. Data embedding decision unit 65d judges that data is embedded when Gp is smaller than TH and that no data is embedded when Gp is larger than TH, and generates the distribution signal BL accordingly. Based on BL, distribution unit 66 inputs the data embedded in the pitch-lag code to data processing unit 63 and also inputs the pitch-lag code to decoder 64.
Figure 20 shows the standard format of the received voice code, and Figure 21 illustrates the decision result of the data embedding decision unit, for the case where the voice code consists of five codes (LSP code, adaptive codebook index, adaptive codebook gain, noise codebook index, noise codebook gain). At the moment of reception it is not known whether data has been embedded in the adaptive-codebook-index portion (the pitch-lag code portion) of the voice code (Figure 20). Whether data is embedded is determined by comparing the adaptive codebook gain Gp with the threshold TH: if Gp is larger than TH, no data is embedded in the adaptive-codebook-index portion, as shown in Figure 21(1); if Gp is smaller than TH, arbitrary data is embedded in the adaptive-codebook-index portion, as shown in Figure 21(2).
(D) Embodiment with thresholds set in multiple stages
(a) Embodiment on the encoder side
Figure 22 shows the configuration of the encoder-side embodiment in which thresholds are set in multiple stages; parts identical to those of Figure 1 carry the same reference characters. The differences are that (1) two thresholds are set; (2) depending on the magnitude of the dequantized value of the first element code, it is decided whether to embed a data-only sequence or a data/control-code sequence carrying a data-category bit; and (3) data is embedded based on this decision.
Speech CODEC (encoder) 51 encodes the input speech according to a predetermined coding scheme, for example G.729, and outputs the resulting voice code (coded data). The voice code consists of a plurality of element codes. Embedding-data generating unit 52 generates the two kinds of data sequences to be embedded in the voice code. The first is a data sequence formed only of, for example, media data; the second is a data/control-code sequence carrying the data-category bit shown in Figure 7, in which media data and control codes can be mixed, distinguished by the '1'/'0' of the data-category bit.
Data embedding control unit 53, which comprises embedding decision unit 54 and a data embedding unit 55 configured as a selector, embeds media data or control codes in the voice code as appropriate. Embedding decision unit 54 uses the first element code among the element codes constituting the voice code and the thresholds TH1 and TH2 (TH2 > TH1) to decide whether the data embedding condition is satisfied and, when it is, whether the embedding condition for the data sequence formed only of media data is satisfied, or the embedding condition for the data/control-code sequence with the data-category bit of Figure 7 is satisfied. For example, for the dequantized value G of the first element code, as shown in Figure 23: (1) if TH2 < G, the data embedding condition is judged not satisfied; (2) if TH1 ≤ G < TH2, the embedding condition for the data/control-code sequence with the data-category bit is judged satisfied; and (3) if G < TH1, the embedding condition for the data sequence formed only of media data is judged satisfied.
Data embedding unit 55 operates as follows: (1) if TH1 ≤ G < TH2, it embeds data in the voice code by replacing the second element code with the data/control-code sequence (with data-category bit) generated by embedding-data generating unit 52; (2) if G < TH1, it embeds data in the voice code by replacing the second element code with the media-data sequence generated by embedding-data generating unit 52; and (3) if TH2 < G, it outputs the second element code unchanged. Multiplexing unit 56 multiplexes the element codes that constitute the voice code.
Figure 24 shows the configuration of embedding decision unit 54. Dequantization unit 54a dequantizes the first element code and outputs the dequantized value G, and threshold generating unit 54b outputs the first and second thresholds TH1 and TH2. Comparison unit 54c compares G with TH1 and TH2 and inputs the comparison result to data embedding decision unit 54d, which outputs a predetermined selection signal SL according to whether (1) TH2 < G, (2) TH1 ≤ G < TH2, or (3) G < TH1. As a result, based on SL, data embedding unit 55 selects and outputs one of the second element code, the data/control-code sequence with the data-category bit, and the media-data sequence.
When an encoder of the G.729 coding scheme is employed, the value corresponding to the first element code mentioned above is the noise codebook gain or the pitch gain, and the second element code is the noise code or the pitch-lag code.
Figure 25 illustrates data embedding for the case where the noise codebook gain Gc is taken as the dequantized value corresponding to the first element code and the noise code is taken as the second element code. If Gc < TH1, arbitrary data such as media data is embedded across the entire 17-bit noise code portion. If TH1 ≤ Gc < TH2, the most significant bit is set to '1' and a control code is embedded in the remaining 16 bits, or the most significant bit is set to '0' and arbitrary data is embedded in the remaining 16 bits.
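A sketch of this three-stage decision follows. The text does not fix the boundary case G = TH2, so the sketch treats it as "no embedding"; the helper and field names are assumptions, and the same classification drives the selection signal SL at the encoder and the distribution signal BL at the decoder:

```python
def classify(g, th1, th2):
    """Three-way decision of Figure 23 (requires th2 > th1)."""
    if g >= th2:                 # (1) TH2 < G (G == TH2 treated as case (1))
        return "no_embedding"
    if g >= th1:                 # (2) TH1 <= G < TH2
        return "data_or_control"
    return "data_only"           # (3) G < TH1

def embed_multilevel(gc, noise_code, media_bits, ctrl_or_data_field, th1, th2):
    """Choose what the 17-bit noise-code slot carries, per Figure 25."""
    kind = classify(gc, th1, th2)
    if kind == "no_embedding":
        return noise_code             # transmit the real noise code
    if kind == "data_or_control":
        return ctrl_or_data_field     # MSB category bit + 16 payload bits (pack_payload)
    return media_bits                 # all 17 bits carry media data
```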
(b) Embodiment on the decoder side
Figure 26 shows the configuration of the decoder-side embodiment in which thresholds are set in multiple stages; parts identical to those of Figure 12 carry the same reference characters. The differences are that (1) two thresholds are set; (2) depending on the magnitude of the dequantized value of the first element code, it is determined whether only a data sequence is embedded or a data/control-code sequence with a data-category bit is embedded; and (3) the data is distributed based on this determination.
On receiving a voice code, demultiplexing unit 61 separates it into its element codes and inputs them to data extraction unit 62. Data extraction unit 62 uses the first element code among the separated element codes to extract the data sequence or the data/control-code sequence and input it to data processing unit 63, while inputting each received element code, unchanged, to speech CODEC (decoder) 64. Decoder 64 decodes the input voice code and reproduces the output speech.
Data extraction unit 62, which comprises embedding decision unit 65 and distribution unit 66, extracts the data sequence or the data/control-code sequence from the voice code as appropriate. Embedding decision unit 65 uses the value corresponding to the first element code among the element codes constituting the voice code and the thresholds TH1 and TH2 of Figure 23 (TH2 > TH1) to decide whether the data embedding condition is satisfied and, when it is, whether the embedding condition for the data sequence formed only of media data is satisfied, or the embedding condition for the data/control-code sequence with the data-category bit is satisfied. For example, for the dequantized value G of the first element code: (1) if TH2 < G, the data embedding condition is judged not satisfied; (2) if TH1 ≤ G < TH2, the embedding condition for the data/control-code sequence with the data-category bit is judged satisfied; and (3) if G < TH1, the embedding condition for the data sequence formed only of media data is judged satisfied.
Distribution unit 66 operates as follows: (1) if TH1 ≤ G < TH2, it treats the second element code as a data/control-code sequence with a data-category bit and inputs it to data processing unit 63, while also inputting the second element code to decoder 64; (2) if G < TH1, it treats the second element code as a data sequence formed only of media data and inputs it to data processing unit 63, while also inputting the second element code to decoder 64; and (3) if TH2 < G, it regards the second element code as carrying no embedded data and inputs it to decoder 64 only.
Figure 27 shows the configuration of embedding decision unit 65. Dequantization unit 65a dequantizes the first element code and outputs the dequantized value G, and threshold generating unit 65b outputs the first and second thresholds TH1 and TH2. Comparison unit 65c compares G with TH1 and TH2 and inputs the comparison result to data embedding decision unit 65d, which outputs a predetermined distribution signal BL according to whether (1) TH2 < G, (2) TH1 ≤ G < TH2, or (3) G < TH1. As a result, distribution unit 66 performs the distribution described above based on BL.
When a voice code encoded by the G.729 coding scheme is received, the value corresponding to the first element code mentioned above is the noise codebook gain or the pitch gain, and the second element code is the noise code or the pitch-lag code.
The foregoing description has dealt with the case where the present invention is applied to a voice communication system in which a voice code is transmitted from a transmitting apparatus having an encoder to a receiving apparatus having a decoder. However, the present invention is not limited to such voice communication systems; it can also be applied to, for example, a recording/playback system in which a recording apparatus having an encoder encodes speech and records it on a storage medium, and a playback apparatus having a decoder reproduces the speech from that storage medium.
(E) Digital voice communication systems
(a) System providing an image transmission service
Figure 28 shows the configuration of a digital voice communication system that performs multimedia transmission, sending speech and an image simultaneously by embedding the image, for the case where a terminal A100 and a terminal B200 are connected over a public network 300. Terminals A and B have identical configurations. In terminal A100, speech encoding unit 101 speech-encodes the speech input from microphone MIC, for example according to G.729, and inputs the result to embedding unit 103, while image-data creating unit 102 generates the image data to be sent and also inputs it to embedding unit 103. Image-data creating unit 102 compression-encodes, for example, data of a photograph of the surroundings taken with a digital camera (not shown) or a portrait photograph of the user, keeps it in memory, and then encodes this image data, or map image data of the speaker's surroundings, and inputs it to embedding unit 103. Embedding unit 103 corresponds to the data embedding control unit 53 of the embodiment of Figure 3 or Figure 8; it embeds the image data into the voice code data input from speech encoding unit 101 according to the same embedding decision criteria as in those embodiments and outputs the result. Transmission processing unit 104 sends the voice code data with the embedded image data to the remote terminal over public network 300.
Transmission processing unit 204 of the remote terminal B200 receives the voice code data from public network 300 and inputs it to extraction unit 205. Extraction unit 205 corresponds to the data extraction unit 62 of the embodiment of Figure 14 or Figure 18; it extracts the image data according to the same embedding decision criteria as in those embodiments, inputs it to image output unit 206, and inputs the voice code data to speech decoding unit 207. Image output unit 206 decodes the input image data, generates the image, and displays it on a display unit. Speech decoding unit 207 decodes the input voice code data and outputs the speech through loudspeaker SP.
Similarly, terminal B embeds image data into voice code data and sends it to terminal A, and terminal A performs the same image output control.
Figure 29 shows the transmission processing flow of the sending-side terminal in the image transmission service. The input speech is speech-encoded and compressed according to the desired coding scheme, for example G.729 (step 1001); the information in the encoded speech frame is analyzed (step 1002); whether embedding is possible is checked based on the analysis result (step 1003); if embedding is possible, the image data is embedded into the voice code data (step 1004); the voice code data with the embedded image data is transmitted (step 1005); and the above operations are repeated until transmission is complete (step 1006).
Figure 30 shows the reception processing flow of the receiving-side terminal in the image transmission service. When voice code data is received (step 1101), the information in the encoded speech frame is analyzed (step 1102), and whether image data is embedded is checked based on the analysis result (step 1103). If nothing is embedded, the voice code data is decoded and the reproduced speech is output through the loudspeaker (step 1104). If image data is embedded, the image data is extracted in parallel with the speech playback of step 1104 (step 1105), and the image data is decoded, reproduced as an image, and displayed on the display unit (step 1106). Thereafter, the above operations are repeated until playback is complete (step 1107).
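Both flows reduce to a loop around the embed/extract primitives sketched earlier. A schematic rendering, with the codec, channel, and embedding helpers abstracted as assumed callables rather than the patent's actual units:

```python
def send_loop(frames, image_bytes, channel, encode, try_embed):
    """Figure 29: encode each frame; embed image bytes whenever a frame allows it."""
    pending = bytes(image_bytes)
    for pcm in frames:
        code = encode(pcm)                            # step 1001
        if pending:
            code, pending = try_embed(code, pending)  # steps 1002-1004
        channel.send(code)                            # step 1005

def receive_loop(codes, decode, try_extract, play, show_image):
    """Figure 30: always play the speech; gather embedded image bytes in parallel."""
    collected = bytearray()
    for code in codes:                   # step 1101
        chunk = try_extract(code)        # steps 1102, 1103, 1105
        if chunk:
            collected.extend(chunk)
        play(decode(code))               # step 1104
    if collected:
        show_image(bytes(collected))     # step 1106
```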
As described above, with the digital voice communication system of Figure 28, speech and additional information can be transmitted simultaneously over an ordinary voice transport protocol. Moreover, because the additional information is embedded inside the speech data, it is not superimposed acoustically, so there is no audible interference or abnormal sound. Furthermore, by choosing image information (images of the current location, map images, etc.) or personal information (a portrait photograph, a fingerprint, etc.) as the additional information, multimedia communication becomes possible.
(b) System providing an authentication-information transmission service
Figure 31 shows the configuration of a digital voice communication system that transmits speech and authentication information simultaneously by embedding the authentication information; parts identical to those of Figure 28 carry the same reference characters. The differences are that authentication-data generating units 111, 211 are provided in place of image-data creating units 102, 202, and authentication confirmation units 112, 212 are provided in place of image output units 106, 206. Figure 31 shows the case where a voiceprint is embedded as the authentication information. Authentication-data generating unit 111 generates and stores voiceprint information using the speech-coded data, or the original speech data before data embedding, and then embeds this voiceprint information into the speech-coded data for transmission. Authentication confirmation units 112, 212 extract the voiceprint information on the receiving side and authenticate the speaker by comparing it with the user's own voiceprint registered in advance; speech decoding is permitted only if the speaker is the registered user. The authentication information is not limited to a voiceprint; it may also be a code unique to the terminal (a serial number), a code unique to the user, or a combination of these two codes.
Figure 32 shows the transmission processing flow of the sending-side terminal in the authentication transmission service. The input speech is speech-encoded and compressed according to the desired coding scheme, for example G.729 (step 2001); the information in the encoded speech frame is analyzed (step 2002); whether embedding is possible is checked based on the analysis result (step 2003); if embedding is possible, the personal authentication data is embedded into the voice code data (step 2004); the voice code data with the embedded authentication data is transmitted (step 2005); and the above operations are repeated until transmission is complete (step 2006).
Figure 33 shows the reception processing flow of the receiving-side terminal in the authentication transmission service. When voice code data is received (step 2101), the information in the encoded speech frame is analyzed (step 2102), and whether authentication data is embedded is checked based on the analysis result (step 2103). If nothing is embedded, the voice code data is decoded and the reproduced speech is output through the loudspeaker (step 2104). If authentication data is embedded, the authentication data is extracted in parallel with the speech playback of step 2104 (step 2105) and authentication processing is performed (step 2106): for example, the extracted data is compared with the user's authentication information registered in advance, and the authentication is judged OK or NG (step 2107). If the result is NG, that is, if the sender is not the registered user, the decoding (playback, decompression) of the speech-coded data is stopped (step 2108). If the result is OK, that is, if the sender is the registered user, decoding of the voice code data is permitted, and the speech is reproduced and output through the loudspeaker (step 2104). Thereafter, the above operations are repeated until transmission from the other party is complete (step 2109).
As described above, with the digital voice communication system of Figure 31, speech and additional information can be transmitted simultaneously over an ordinary voice transport protocol. Because the additional information is embedded inside the speech data, it is not superimposed acoustically and causes no audible interference or abnormal sound. Moreover, by embedding authentication information as the additional information, it becomes possible to improve the capability of authenticating whether the sender is a legitimate user and to raise the security of the speech data.
(c) System providing a key-information transmission service
Figure 34 shows the configuration of a digital voice communication system that transmits speech and key information simultaneously by embedding the key information; parts identical to those of Figure 28 carry the same reference characters. The differences are that key generating units 121, 221 are provided in place of image-data creating units 102, 202, and key comparison units 122, 222 are provided in place of image output units 106, 206. Key generating unit 121 stores a predefined key in its built-in storage in advance. Embedding unit 103 then embeds the key information input from key generating unit 121 into the voice code data input from speech encoding unit 101 according to the same embedding decision criteria as in the embodiment of Figure 3 or Figure 8 and outputs the result. Transmission processing unit 104 sends the voice code data with the embedded key information to the remote terminal over public network 300.
Transmission processing unit 204 of the remote terminal B200 receives the voice code data from public network 300 and inputs it to extraction unit 205. Extraction unit 205 extracts the key data according to the same embedding decision criteria as in the embodiment of Figure 14 or Figure 18, inputs it to key comparison unit 222, and inputs the voice code data to speech decoding unit 207. Key comparison unit 222 authenticates by comparing the input information with the key registered in advance, permits speech decoding if the key information matches, and prohibits it if it does not. In this way, only speech data from a specific user can be reproduced.
(d) System providing an IP telephone address transmission service
Figure 35 shows the configuration of a digital voice communication system that transmits speech and IP telephone address information simultaneously by embedding the IP telephone address information; parts identical to those of Figure 28 carry the same reference characters. The differences are that IP-telephone-address input units 131, 231 are provided in place of image-data creating units 102, 202, IP-telephone-address storage units 132, 232 are provided in place of image output units 106, 206, and a display/keypad unit DPK is provided.
A predefined IP telephone address is stored in the built-in storage of IP-telephone-address input unit 131. This IP telephone address may be the IP telephone address of terminal A itself, or the telephone number of a facility or site other than terminal A. Embedding unit 103 embeds the IP telephone address input from IP-telephone-address input unit 131 into the voice code data input from speech encoding unit 101 according to the same embedding decision criteria as in the embodiment of Figure 3 or Figure 8 and outputs the result. Transmission processing unit 104 sends the voice code data with the embedded IP telephone address to the remote terminal over public network 300.
Transmission processing unit 204 of the remote terminal B200 receives the voice code data from public network 300 and inputs it to extraction unit 205. Extraction unit 205 extracts the IP telephone address according to the same embedding decision criteria as in the embodiment of Figure 14 or Figure 18, inputs it to IP-telephone-address storage unit 232, and inputs the voice code data to speech decoding unit 207. IP-telephone-address storage unit 232 stores the input IP telephone address.
Because display/keypad unit DPK displays the IP telephone address stored in IP-telephone-address storage unit 232, a call can be placed to that IP telephone address simply by clicking on it.
(e) System providing an advertisement-embedding service
Figure 36 shows the configuration of a digital voice communication system that provides an advertisement-information embedding service. A server (gateway) is provided, and by embedding advertisement information into the speech-coded data, this server delivers advertisement information directly to the end users who are communicating with each other. In Figure 36, parts identical to those of Figure 28 carry the same reference characters. The differences from Figure 28 are that (1) image-data creating units 102, 202 and embedding units 103, 203 are removed from terminals 100, 200; (2) advertisement-information reproduction units 142, 242 are provided in place of image output units 106, 206; (3) a display/keypad unit DPK is provided; and (4) a server (gateway) 400 that relays the speech data between the terminals is provided in public network 300.
In server 400, bit-stream decomposition/generation unit 401 cuts transport packets out of the bit stream input from sending-side terminal 100, determines the sender and addressee from the IP header of each packet, determines the media category and coding scheme from the RTP header, and judges from this information whether the advertisement insertion condition is satisfied; if it is, the unit inputs the voice code data of the transport packet to embedding unit 402. Embedding unit 402 judges whether embedding is possible according to the same embedding decision criteria as in the embodiment of Figure 3 or Figure 8; if it is, the unit embeds the advertisement information, separately supplied by an advertiser and held in memory 403, into the voice code data and inputs the result to bit-stream decomposition/generation unit 401. Bit-stream decomposition/generation unit 401 uses this voice code data to generate a transport packet and sends it to receiving-side terminal B200.
Transmission processing unit 204 of receiving-side terminal B200 receives the voice code data from public network 300 and inputs it to extraction unit 205. Extraction unit 205 extracts the advertisement information according to the same embedding decision criteria as in the embodiment of Figure 14 or Figure 18, inputs it to advertisement-information reproduction unit 242, and inputs the voice code data to speech decoding unit 207. Advertisement-information reproduction unit 242 reproduces the input advertisement information and displays it on display/keypad unit DPK, while speech decoding unit 207 reproduces the speech and outputs it through loudspeaker SP.
Figure 37 shows an example of the structure of an IP packet in an Internet telephony service. The header consists of an IP header, a UDP (User Datagram Protocol) header, and an RTP (Real-time Transport Protocol) header. The IP header contains destination and source addresses (not shown), and the media category and CODEC type are determined by the payload type PT of the RTP header. Therefore, by referring to the header of a transport packet, bit-stream decomposition/generation unit 401 can identify the sender, the recipient, the media category, and the coding scheme.
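For illustration, a gateway in the role of unit 401 could read these fields as follows. The offsets follow the standard IPv4/UDP/RTP layouts, and payload type 18 is the static RTP assignment for G.729 (RFC 3551); this is a sketch, not the patent's implementation:

```python
G729_PT = 18  # static RTP payload type for G.729 (RFC 3551)

def parse_voip_packet(pkt: bytes):
    """Return (src_ip, dst_ip, payload_type, voice_payload) from an IPv4/UDP/RTP packet."""
    ihl = (pkt[0] & 0x0F) * 4                 # IPv4 header length in bytes
    if pkt[9] != 17:                          # IP protocol field: 17 = UDP
        return None
    src = ".".join(map(str, pkt[12:16]))      # sender address
    dst = ".".join(map(str, pkt[16:20]))      # addressee address
    rtp = pkt[ihl + 8:]                       # skip the 8-byte UDP header
    pt = rtp[1] & 0x7F                        # RTP payload type PT
    return src, dst, pt, rtp[12:]             # payload after the 12-byte RTP header

# The server considers advertisement insertion only for G.729 speech:
# info = parse_voip_packet(pkt)
# if info and info[2] == G729_PT: ...
```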
Figure 38 shows the insertion processing flow in the advertisement-information service.
On bit-stream input, server 400 analyzes the header of the transport packet and the encoded speech data (step 3001). That is, it cuts a transport packet out of the bit stream (step 3001a), extracts the source and destination addresses from the IP header (step 3001b), and checks whether the sender and recipient have signed an advertisement delivery contract (step 3001c); if they have, it identifies the media category and CODEC type by referring to the RTP header (step 3001d). If, for example, the media category is speech and the CODEC type is G.729 (step 3001e), it then judges whether embedding is possible according to the same embedding decision criteria as in the embodiment of Figure 3 or Figure 8 (step 3001f), and sets "embeddable" or "not embeddable" according to the result (steps 3001g, 3001h). If no advertisement delivery contract has been signed at step 3001c, if the media category is not speech at step 3001e, or if the CODEC type is not G.729, "not embeddable" is set (step 3001h).
Then, if embedding is possible (step 3002), server 400 embeds the advertisement data supplied by the advertiser (information provider) into the voice code (step 3003); if embedding is not possible, it sends the voice code data to the receiving-side terminal without embedding the advertisement data (step 3004). Thereafter, the above operations are repeated until transmission is complete (step 3005).
Figure 39 shows the reception processing flow of the receiving-side terminal in the advertisement-information embedding service. When voice code data is received (step 3101), the information in the encoded speech frame is analyzed (step 3102), and whether advertisement information is embedded is checked based on the analysis result (step 3103). If nothing is embedded, the voice code data is decoded and the reproduced speech is output through the loudspeaker (step 3104). If advertisement information is embedded, the advertisement information is extracted in parallel with the speech playback of step 3104 (step 3105) and displayed on display/keypad unit DPK (step 3106). Thereafter, the above operations are repeated until playback is complete (step 3107).
Although the embodiment has been described for the case of embedding advertisement information, the embedded information is not limited to advertisements; arbitrary information can be embedded. Also, by inserting an advertisement together with an IP telephone address, the system can be configured so that clicking places a call to the party at that IP telephone address, from whom detailed advertisement information or other details can be obtained.
As described above, with the digital voice communication system of Figure 36, by providing a server apparatus that relays speech data, arbitrary information such as advertisements can be delivered through this server to the end users who are communicating speech data with each other.
(f) Information storage system
Figure 40 shows the configuration of an information storage system operating in cooperation with a digital voice communication system, in the state where a terminal A100 and a center 500 are connected over public network 300. Center 500 is a facility such as a corporate call center that receives complaints, inquiries, and repair requests from users and responds to them. In terminal A100, speech encoding unit 101 encodes the speech input from microphone MIC and sends it to network 300 through transmission processing unit 104, while speech decoding unit 107 decodes the voice code data input from network 300 through transmission processing unit 104 and outputs the reproduced speech from loudspeaker SP. The voice communication terminal side B of center 500 has the same structure as terminal A: speech encoding unit 501 encodes the speech input from microphone MIC and sends it to network 300 through the transmission processing unit, while speech decoding unit 507 decodes the voice code data input from network 300 through transmission processing unit 504 and outputs the reproduced speech from loudspeaker SP. With this structure, when a call arrives from terminal A (the user), the operator answers the user.
On the digital speech storage side of center 500, additional-data embedding unit 510 embeds additional data into the speech-coded data sent from terminal A and stores it in speech-data storage unit 520. Additional-data extraction unit 530 extracts the embedded information from speech-coded data read from speech-data storage unit 520 in a predetermined manner and displays it on the display unit of operation unit 540, while inputting the speech-coded data to speech decoding unit 550. Speech decoding unit 550 decodes the input voice code data, which is output through loudspeaker 560.
In additional-data embedding unit 510, additional-data generating unit 511 encodes, as additional data, the sender's name, the recipient's name, the reception time, and the call category (complaint, inquiry, repair request, etc.) input from operation unit 540, and inputs them to embedding unit 512. Embedding unit 512 judges, according to the same embedding decision criteria as in the embodiment of Figure 3 or Figure 8, whether the additional information can be embedded in the voice code data sent from terminal A100 through transmission processing unit 504. If it can, the unit embeds the code input from additional-data generating unit 511 into the voice code data and stores it in speech-data storage unit 520 as a voice file.
Extraction unit 531 in additional-data extraction unit 530 judges, according to the same embedding decision criteria as in the embodiment of Figure 14 or Figure 18, whether information is embedded in the voice code data. If it is, the unit extracts the embedded code, inputs it to additional-data utilization unit 532, and inputs the voice code data to speech decoding unit 550. Additional-data utilization unit 532 decodes the extracted code and displays the sender's name, the recipient's name, the reception time, the call category, and so on, on the display unit of operation unit 540. Speech decoding unit 550 reproduces the speech, which is output through the loudspeaker.
Moreover, when voice code data is read from speech-data storage unit 520, the embedded information can be used to retrieve and output the desired voice code data. That is, a search key, for example a sender's name, is entered through operation unit 540 to request output of the voice files in which that sender's name is embedded. Extraction unit 531 then retrieves the voice files in which the specified sender's name is embedded, outputs the embedded information, inputs the voice code data to speech decoding unit 550, and the decoded speech is output through the loudspeaker.
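Such retrieval amounts to scanning the stored files and matching the decoded embedded fields. A sketch, with the record layout and the `extract_metadata` helper assumed rather than taken from the patent:

```python
def find_voice_files(storage, extract_metadata, sender_name):
    """Scan stored voice files; return those whose embedded sender matches the key.

    storage          -- iterable of (file_id, voice_code) pairs, as kept by unit 520
    extract_metadata -- decodes the embedded code into a dict, e.g.
                        {"sender": ..., "recipient": ..., "time": ..., "category": ...}
    """
    hits = []
    for file_id, voice_code in storage:
        meta = extract_metadata(voice_code)   # extraction unit 531 + decoding (unit 532)
        if meta and meta.get("sender") == sender_name:
            hits.append((file_id, meta))
    return hits
```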
With the embodiment of Figure 40 described above, the sender, recipient, reception time, call category, and the like are embedded in the voice code data and stored in the storage device, so the stored voice code data can be read and reproduced as needed while the embedded information is simultaneously extracted and displayed. Moreover, documentation of speech data using the embedded data becomes possible, and by using the embedded information as a search key, a desired voice file can be quickly retrieved, reproduced, and output.
According to the present invention described above, data can be embedded in a voice code even though the encoder side and the decoder side hold no shared key, and the decoder can correctly extract the embedded data.
Furthermore, according to the present invention, even when data is embedded in a voice code, there is no degradation in speech quality, and the listener of the reproduced speech is unaware that data has been embedded.
Furthermore, according to the present invention, merely predefining the initial value of the threshold at the transmitting and receiving sides makes the embedding and extraction of data possible.
Furthermore, according to the present invention, if control codes are defined within the embedding data, the threshold and other parameters can be changed by means of these control codes, and the transmission of additional information over other paths and the amount of embedded data transferred can be adjusted.
Furthermore, according to the present invention, since the gain value determines whether to embed a data-only sequence or a data/control-code sequence in a form in which the categories of data and control codes can be identified, no data-category information need be included when only a data sequence is embedded, so the transmission capacity can be improved.
Furthermore, according to the present invention, arbitrary data can be embedded without changing the coding format. That is, an ID or information of other media can be embedded in the voice information and transmitted or stored without the user being aware of it and without impairing the compatibility required in communication and storage applications. Moreover, since the present invention specifies the control method through parameters common to CELP, it is not limited to a specific scheme and is applicable to a wide range of schemes; for example, it can be applied to G.729 for VoIP, to AMR for mobile communications, and so on.
Furthermore, with the data-embedded voice communication system of the present invention, if an arbitrary code is embedded in a specific portion of the compressed speech data at the transmitting terminal or along the path, and the embedded code is extracted from that specific portion at the receiving terminal or along the path by analyzing the communicated speech data, speech and additional information can be transmitted simultaneously over an ordinary voice transport protocol. Because the additional information is embedded inside the speech data, it is not superimposed acoustically, so there is no audible interference or abnormal sound. Moreover, by choosing image information (images of the current location, map images, etc.) or personal information (a portrait photograph, a voiceprint, a fingerprint, etc.) as the additional information, multimedia communication becomes possible; and by choosing the terminal's serial number or a voiceprint as the additional information, it becomes possible to improve the capability of authenticating whether the sender is a legitimate user and to raise the security of the speech data.
Furthermore, according to the present invention, by providing a server apparatus that relays speech data, arbitrary information such as advertisements can be delivered to the end users who are communicating speech data with each other.
Furthermore, according to the present invention, by embedding the sender, the recipient, the reception time, the call category, and the like in received speech data and storing it in a storage device, documentation of speech data becomes possible and the data can easily be used thereafter.
Since many apparently widely different embodiments of the present invention can be made without departing from its spirit and scope, it is to be understood that the invention is not limited to the specific embodiments thereof except as defined in the appended claims.

Claims (28)

1. A data embedding method for embedding arbitrary data in a voice code obtained by encoding speech with a predetermined voice coding scheme, characterized by:
using a first element code from among the element codes constituting the voice code, and a threshold value, to decide whether a data embedding condition is satisfied; and
if the condition is satisfied, embedding the data in the voice code by replacing a second element code with the arbitrary data.
2. A data extraction method for extracting data embedded in a voice code encoded with a predetermined voice coding scheme, characterized by:
using a first element code from among the element codes constituting said voice code, and a threshold value, to decide whether a data embedding condition is satisfied; and
if the condition is satisfied, judging that arbitrary data has been embedded in the second-element-code portion of the voice code and extracting this embedded data.
3. A data embedding/extraction method in a system in which a speech encoding apparatus encodes speech with a predetermined voice coding scheme and embeds arbitrary data in the resulting voice code, and a speech reproducing apparatus extracts the embedded data from the voice code while reproducing the speech from this voice code, characterized by:
defining in advance the first element code and threshold value used to decide whether data is embedded, and the second element code in which data is embedded based on this decision;
at data embedding, using the first element code and the threshold value to decide whether the data embedding condition is satisfied and, if it is, embedding the data in the voice code by replacing the second element code with the arbitrary data; and
at data extraction, using the first element code and the threshold value to decide whether the data embedding condition is satisfied and, if it is, judging that arbitrary data has been embedded in the second-element-code portion of the voice code and extracting this embedded data.
4. The data embedding or data extraction method according to any one of claims 1 to 3, characterized in that:
the dequantized value of the first element code is compared with the threshold value, and whether the data embedding condition is satisfied is decided based on the comparison result.
5. The data embedding or data extraction method according to any one of claims 1 to 3, characterized in that:
said first element code is a noise codebook gain code, and the second element code is the noise code serving as the index information of the noise codebook; and
when the dequantized value of the noise codebook gain code is smaller than said threshold value, it is judged that said data embedding condition is satisfied, and the data is embedded in the voice code by replacing said noise code with the arbitrary data, or it is judged that arbitrary data has been embedded in said noise-code portion and this embedded data is extracted.
6. The data embedding or data extraction method according to any one of claims 1 to 3, characterized in that:
said first element code is a pitch gain code, and the second element code is the pitch-lag code serving as the index information of the adaptive codebook; and
when the dequantized value of the pitch gain code is smaller than said threshold value, it is judged that said data embedding condition is satisfied, and the data is embedded in the voice code by replacing said pitch-lag code with the arbitrary data, or it is judged that arbitrary data has been embedded in said pitch-lag-code portion and this embedded data is extracted.
7. The data embedding or data extraction method according to any one of claims 1 to 3, characterized in that:
a part of said embedded data is data-category identification information, and the category of the embedded data is determined by this data-category identification information.
8. The data embedding or data extraction method according to any one of claims 1 to 3, characterized in that:
a plurality of said threshold values are set, and based on the first element code or the dequantized value of the first element code, it is distinguished whether the embedded data is entirely a data sequence, or a data/control-code sequence in a form in which the categories of data and control codes can be identified.
9. A data embedding apparatus for embedding arbitrary data in a voice code obtained by encoding speech with a predetermined voice coding scheme, characterized by comprising:
an embedding decision unit that uses a first element code from among the element codes constituting said voice code, and a threshold value, to decide whether a data embedding condition is satisfied; and
a data embedding unit that, if the data embedding condition is satisfied, embeds the data in the voice code by replacing a second element code with the arbitrary data.
10. A data extraction apparatus for extracting data embedded in a voice code encoded with a predetermined voice coding scheme, characterized by comprising:
a demultiplexing unit that separates the element codes constituting said voice code;
an embedding decision unit that uses a first element code from among said element codes, and a threshold value, to decide whether a data embedding condition is satisfied; and
an embedded-data extraction unit that, if the data embedding condition is satisfied, judges that arbitrary data has been embedded in the second-element-code portion of the voice code and extracts this embedded data.
11. A speech encoding/decoding system that encodes speech with a predetermined voice coding scheme and embeds arbitrary data in the resulting voice code, while extracting the embedded data from this voice code and reproducing the speech from it, characterized by comprising:
a speech encoding apparatus that embeds arbitrary data in the voice code obtained by encoding speech with the predetermined voice coding scheme; and
a speech decoding apparatus that performs decoding processing on the voice code encoded with the predetermined voice coding scheme to reproduce the speech, while extracting the data embedded in this voice code;
said speech encoding apparatus comprising:
an encoder that encodes speech with the predetermined voice coding scheme;
an embedding decision unit that uses a first element code from among the element codes constituting said voice code, and a threshold value, to decide whether a data embedding condition is satisfied; and
a data embedding unit that, if the data embedding condition is satisfied, embeds the data in the voice code by replacing a second element code with the arbitrary data;
said speech decoding apparatus comprising:
a demultiplexing unit that separates the voice code into element codes;
an embedding decision unit that uses the first element code from among the element codes constituting the received voice code, and the threshold value, to decide whether the data embedding condition is satisfied;
an embedded-data extraction unit that, if the data embedding condition is satisfied, judges that arbitrary data has been embedded in the second-element-code portion of the voice code and extracts this embedded data; and
a decoder that decodes the received voice code and reproduces the speech;
said first element code and threshold value used to decide whether data is embedded, and said second element code in which data is embedded based on this decision, being defined in advance in the speech encoding apparatus and the speech decoding apparatus.
12. The data embedding apparatus or data extraction apparatus according to claim 9 or 10, characterized in that said embedding decision unit comprises:
a dequantization unit that dequantizes said first element code;
a comparison unit that compares the dequantized value obtained by the dequantization with said threshold value; and
a decision unit that decides, based on the comparison result, whether the data embedding condition is satisfied.
13. The data embedding apparatus or data extraction apparatus according to claim 12, characterized in that:
said first element code is a noise codebook gain code, and the second element code is the noise code serving as the index information of the noise codebook; and
said embedding decision unit judges that said data embedding condition is satisfied when the dequantized value of the noise codebook gain code is smaller than said threshold value.
14. The data embedding apparatus or data extraction apparatus according to claim 12, characterized in that:
said first element code is a pitch gain code, and the second element code is the pitch-lag code serving as the index information of the adaptive codebook; and
said embedding decision unit judges that said data embedding condition is satisfied when the dequantized value of the pitch gain code is smaller than said threshold value.
15. The data embedding apparatus according to claim 9, characterized by:
comprising an embedding-data generating unit that generates embedding data a part of which is category information identifying the category of the data.
16. The data embedding apparatus according to claim 9, characterized in that:
said data embedding unit decides, based on said first element code or the value corresponding to the first element code, whether to embed a data/control-code sequence in a form in which the categories of data and control codes can be identified, or to embed a data-only sequence.
17. A digital voice communication system that encodes speech with a predetermined voice coding scheme and transmits it, characterized by comprising:
a device that analyzes the speech data obtained by encoding the input speech;
a device that, according to the analysis result, embeds an arbitrary code in a specific portion of part of the speech data; and
a device that transmits the data with the embedded code as speech data;
whereby an ordinary voice call and additional information are transmitted simultaneously.
18. A digital voice communication system that receives speech data encoded and transmitted with a predetermined voice coding scheme, characterized by comprising:
a device that analyzes the received speech data; and
a device that, according to the analysis result, extracts a code from a specific portion of part of the speech data;
whereby an ordinary voice call and additional information are received simultaneously.
19. A digital voice communication system that encodes speech with a predetermined voice coding scheme and transmits and receives it, characterized in that:
a terminal apparatus comprises a transmitting unit and a receiving unit;
said transmitting unit comprising:
a device that analyzes the data obtained by encoding the input speech;
a device that, according to the analysis result, embeds an arbitrary code in a specific portion of part of the speech data; and
a device that transmits the data with the embedded code as speech data;
said receiving unit comprising:
a device that analyzes the received speech data; and
a device that, according to the analysis result, extracts the code from the specific portion of part of the speech data;
whereby an ordinary voice call and additional information are transmitted bidirectionally and simultaneously between terminal apparatuses over a network.
20. The digital voice communication system according to claim 19, characterized in that:
said transmitting unit comprises a device that generates the code for said embedding using an image or personal information held in the user terminal;
said receiving unit comprises a device that extracts and outputs said embedded code; and
multimedia transmission can be performed in the form of a voice call.
21. The digital voice communication system according to claim 19, characterized in that:
said transmitting unit comprises a device that uses a code unique to the sending user's terminal, or a code unique to the user, as the code for said embedding; and
said receiving unit comprises a device that extracts the embedded code and identifies its content.
22. The digital voice communication system according to claim 19, characterized in that:
said transmitting unit comprises a device that uses key information as the code for said embedding; and
said receiving unit comprises a device that extracts this key information and a device that, using the extracted key information, makes decoding of the speech data possible only for a specific user.
23. The digital voice communication system as claimed in claim 19, characterized in that:
the above-mentioned transmitting unit comprises a device for using IP phone address information as the code to be embedded;
the above-mentioned receiving unit comprises a device for extracting this IP phone address information, and a device for using this IP phone address information to place a call to the information sender by clicking.
24. A digital voice communication system which encodes voice with a predetermined voice coding scheme and transmits/receives it, characterized by comprising:
terminal devices, and a server device which is connected to a network and relays the speech data between the terminal devices;
each terminal device comprises a voice encoding device for encoding input voice, a device for transmitting the encoded voice data, a device for analyzing received speech data, and a device for extracting a code from a specific part of a portion of the speech data according to the result of this analysis;
the above-mentioned server device comprises a device for receiving the data exchanged between terminal devices and judging whether the data is speech data, and a device for analyzing the data if it is speech data, embedding an arbitrary code into a specific part of a portion thereof according to the result of this analysis, and transmitting it;
whereby a terminal device receiving data via the server device extracts and outputs the code embedded by this server device.
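A sketch of such a server's relay loop, reusing the assumed frame layout and threshold from the earlier transmit sketch; the packet structure and the idea that the server's payload is a bit string are likewise illustrative assumptions:

```python
TH, FCB_BITS = 10, 13   # same assumed condition as the terminal sketches

def relay(packet, server_bits):
    """Forward non-speech packets untouched; embed a code into speech ones."""
    if packet.get("type") != "speech":
        return packet
    pos = 0
    for frame in packet["frames"]:
        if frame["gain"] < TH and pos < len(server_bits):
            chunk = server_bits[pos:pos + FCB_BITS].ljust(FCB_BITS, "0")
            frame["fcb"] = int(chunk, 2)
            pos += FCB_BITS
    return packet   # the receiving terminal extracts the server's code

pkt = {"type": "speech", "frames": [{"gain": 4, "fcb": 0}]}
relay(pkt, "1" * 13)   # frame now carries the server's 13 payload bits
```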
25. A digital speech storage system which encodes voice with a predetermined voice coding scheme and stores it, characterized by comprising:
a device for analyzing the speech data obtained by encoding input voice,
a device for embedding an arbitrary code into a specific part of a portion of the speech data according to the result of this analysis, and
a device for storing the speech data with the embedded data as speech data;
whereby additional information is stored simultaneously with ordinary digital speech storage.
26. A digital speech storage system which encodes voice with a predetermined voice coding scheme and stores it, characterized by comprising:
a device for embedding an arbitrary code into a part of the encoded voice data and storing it;
a device for analyzing this voice data when decoding the stored speech data; and
a device for extracting the above-mentioned embedded code from a specific part of the stored data according to the result of this analysis.
27. A digital speech storage system which encodes voice with a predetermined voice coding scheme and stores it, characterized by comprising:
a device for analyzing the speech data obtained by encoding input voice;
a device for embedding an arbitrary code into a specific part of a portion of the speech data according to the result of this analysis;
a device for storing the speech data with the embedded data as speech data;
a device for analyzing this speech data when decoding the stored speech data; and
a device for extracting the above-mentioned embedded code from a specific part of this speech data according to the result of this analysis.
28. The digital speech storage system as claimed in claim 27, characterized in that:
the above-mentioned embedded code is speaker-specific information or storage-time information;
and the system comprises a device for retrieving the decompressed voice data using this information.
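As an illustration of this retrieval, a sketch that filters stored recordings by the speaker ID or storage-time stamp carried in their embedded code; the record layout, with the embedded code already parsed into a metadata dict, is an assumption of this example:

```python
def find_recordings(store, speaker=None, stored_after=None):
    """Return stored voice records whose embedded metadata matches the query."""
    hits = []
    for rec in store:
        meta = rec["embedded"]   # hypothetical: extraction already parsed this
        if speaker is not None and meta.get("speaker") != speaker:
            continue
        if stored_after is not None and meta.get("time", 0) < stored_after:
            continue
        hits.append(rec)
    return hits

store = [{"voice": b"<coded frames>",
          "embedded": {"speaker": "A", "time": 1043884800}}]
print(len(find_recordings(store, speaker="A")))   # 1
```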
CNB031023223A 2002-02-04 2003-01-30 Method of, apparatus and system for performing data insertion/extraction for phonetic code Expired - Fee Related CN100514394C (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
JP2002026958 2002-02-04
JP026958/2002 2002-02-04
JP2003015538A JP4330346B2 (en) 2002-02-04 2003-01-24 Data embedding / extraction method and apparatus and system for speech code
JP015538/2003 2003-01-24

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN2008101340452A Division CN101320564B (en) 2002-02-04 2003-01-30 Digital voice communication system

Publications (2)

Publication Number Publication Date
CN1437169A true CN1437169A (en) 2003-08-20
CN100514394C CN100514394C (en) 2009-07-15

Family ID: 26625679

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB031023223A Expired - Fee Related CN100514394C (en) 2002-02-04 2003-01-30 Method of, apparatus and system for performing data insertion/extraction for phonetic code

Country Status (4)

Country Link
EP (2) EP1693832B1 (en)
JP (1) JP4330346B2 (en)
CN (1) CN100514394C (en)
DE (2) DE60330413D1 (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004069963A (en) * 2002-08-06 2004-03-04 Fujitsu Ltd Voice code converting device and voice encoding device
EP1618557B1 (en) 2003-05-01 2007-07-25 Nokia Corporation Method and device for gain quantization in variable bit rate wideband speech coding
KR100565682B1 (en) 2004-07-12 2006-03-29 엘지전자 주식회사 An apparatus for a digital data transmission in state of using a mobile telecommunication device and the method thereof
JP4780375B2 (en) * 2005-05-19 2011-09-28 大日本印刷株式会社 Device for embedding control code in acoustic signal, and control system for time-series driving device using acoustic signal
US8054969B2 (en) * 2007-02-15 2011-11-08 Avaya Inc. Transmission of a digital message interspersed throughout a compressed information signal
US8055903B2 (en) * 2007-02-15 2011-11-08 Avaya Inc. Signal watermarking in the presence of encryption
EP2133871A1 (en) * 2007-03-20 2009-12-16 Fujitsu Limited Data embedding device, data extracting device, and audio communication system
PT2210366T (en) * 2007-10-26 2019-05-28 Chouraqui Jean Methods and systems for transferring multimedia content using an existing digital sound transfer protocol
JP5697395B2 (en) * 2010-10-05 2015-04-08 ヤマハ株式会社 Singing voice evaluation apparatus and program
US8880404B2 (en) * 2011-02-07 2014-11-04 Qualcomm Incorporated Devices for adaptively encoding and decoding a watermarked signal

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FI103700B1 (en) * 1994-09-20 1999-08-13 Nokia Mobile Phones Ltd Simultaneous transmission of voice and data in a mobile communication system
TW312770B (en) * 1996-10-15 1997-08-11 Japen Ibm Kk The hiding and taking out method of data
US6363339B1 (en) * 1997-10-10 2002-03-26 Nortel Networks Limited Dynamic vocoder selection for storing and forwarding voice signals
JP3022462B2 (en) * 1998-01-13 2000-03-21 興和株式会社 Vibration wave encoding method and decoding method
JP3321767B2 (en) * 1998-04-08 2002-09-09 株式会社エム研 Apparatus and method for embedding watermark information in audio data, apparatus and method for detecting watermark information from audio data, and recording medium therefor
ID25532A (en) * 1998-10-29 2000-10-12 Koninkline Philips Electronics ADDITIONAL DATA PLANTING IN THE INFORMATION SIGNAL
AU6533799A (en) * 1999-01-11 2000-07-13 Lucent Technologies Inc. Method for transmitting data in wireless speech channels
EP1264437A2 (en) * 2000-03-06 2002-12-11 Thomas W. Meyer Data embedding in digital telephone signals

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1977311B (en) * 2004-06-25 2011-07-13 松下电器产业株式会社 Audio encoding device, audio decoding device, and method thereof
CN101160620B (en) * 2005-07-11 2011-07-20 株式会社Ntt都科摩 Data embedding device, data embedding method, data extraction device, and data extraction method
CN103299364A (en) * 2011-02-07 2013-09-11 高通股份有限公司 Devices for encoding and decoding a watermarked signal
CN103299364B (en) * 2011-02-07 2015-05-27 高通股份有限公司 Devices for encoding and decoding a watermarked signal
US9767823B2 (en) 2011-02-07 2017-09-19 Qualcomm Incorporated Devices for encoding and detecting a watermarked signal
US9767822B2 (en) 2011-02-07 2017-09-19 Qualcomm Incorporated Devices for encoding and decoding a watermarked signal
CN110970038A (en) * 2019-11-27 2020-04-07 云知声智能科技股份有限公司 Voice decoding method and device

Also Published As

Publication number Publication date
DE60330716D1 (en) 2010-02-04
EP1333424B1 (en) 2009-12-09
EP1693832A2 (en) 2006-08-23
EP1333424A2 (en) 2003-08-06
EP1333424A3 (en) 2005-07-13
EP1693832B1 (en) 2009-12-23
DE60330413D1 (en) 2010-01-21
CN100514394C (en) 2009-07-15
JP2003295879A (en) 2003-10-15
JP4330346B2 (en) 2009-09-16
EP1693832A3 (en) 2007-06-20

Similar Documents

Publication Publication Date Title
CN1209744C (en) Coding device and decoding device
CN1096148C (en) Signal encoding method and apparatus
CN1957399A (en) Sound/audio decoding device and sound/audio decoding method
CN1172292C (en) Method and device for adaptive bandwidth pitch search in coding wideband signals
CN1252681C (en) Gains quantization for a clep speech coder
CN1437169A (en) Method of, apparatus and system for performing data insertion/extraction for phonetic code
CN1126265C (en) Scalable stereo audio encoding/decoding method and apparatus
CN1115781C (en) Coding apparatus
US7310596B2 (en) Method and system for embedding and extracting data from encoded voice code
CN1185620C (en) Sound synthetizer and method, telephone device and program service medium
CN101048649A (en) Scalable decoding apparatus and scalable encoding apparatus
CN1977311A (en) Audio encoding device, audio decoding device, and method thereof
CN1922660A (en) Communication device, signal encoding/decoding method
CN101036183A (en) Stereo compatible multi-channel audio coding
CN1248195C (en) Voice coding converting method and device
CN1156872A (en) Speech encoding method and apparatus
CN1910657A (en) Audio signal encoding method, audio signal decoding method, transmitter, receiver, and wireless microphone system
CN1689069A (en) Sound encoding apparatus and sound encoding method
CN1783727A (en) Encoding method for compression encoding of multichannel digital audio signal
CN1291375C (en) Acoustic signal encoding method and apparatus, acoustic signal decoding method and apparatus, and recording medium
CN1139912C (en) CELP voice encoder
CN1144178C (en) Audio signal encoder, audio signal decoder, and method for encoding and decoding audio signal
CN1697472A (en) Method and device for switching speeches
CN101320564B (en) Digital voice communication system
CN1455918A (en) Data processing apparatus

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20090715

Termination date: 20190130