CN1901442A

CN1901442A - Camouflage communication method based on voice identification

Info

Publication number: CN1901442A
Application number: CNA2006100855931A
Authority: CN
Inventors: 邓宗元; 杨震
Original assignee: Nanjing Post and Telecommunication University
Current assignee: Nanjing Post and Telecommunication University
Priority date: 2006-06-26
Filing date: 2006-06-26
Publication date: 2007-01-24
Anticipated expiration: 2026-06-26
Also published as: CN100550723C

Abstract

This invention relates to a camouflage communication method based on speech recognition, for use in military secure communication and fax machine systems. The method comprises: generating a secret information code stream based on speech recognition; serial-to-parallel conversion and generation of the ciphertext to be embedded based on a random key; detecting the formants of voiced frames of the plaintext speech and embedding the ciphertext at random; and extracting the secret information. It proceeds in two stages: 1. watermark generation and embedding; 2. watermark extraction.

Description

Camouflage communication method based on speech recognition

Technical field

The present invention relates to a real-time secure communication scheme for military secure communication and fax machine systems, and belongs to the fields of information hiding and speech processing based on speech recognition.

Background technology

Information hiding, a branch of information security, has developed rapidly over the past ten years. The technique embeds additional information in digital media without unduly affecting the carrier signal, in order to provide functions such as copyright protection and covert communication. Research on information hiding focuses on two areas. The first is digital audio watermarking with transparency, security, and robustness; its results can be used for copyright protection, data annotation, tamper prevention, and automatic monitoring of wireless transmissions. The second is the use of information camouflage to realize a new mode of secure speech communication. This involves research on novel core algorithms for speech camouflage, speech compression algorithms with extremely low bit rates, and techniques such as identity code authentication. The goal is to realize both non-real-time and real-time camouflaged speech communication. The results can be used in systems such as military secure speech communication.

With the development of computers, conventional encryption algorithms face great challenges, and the study of new digital secure speech communication has become a challenging problem. At present there is relatively little research on real-time information hiding and speech camouflage, mainly because real-time hiding must carry far more information than watermarking. Most existing real-time information hiding schemes are limited by the trade-off between the amount of hidden information on the one hand and transparency and robustness on the other. One is the least significant bit (LSB) replacement method, which can embed more data and is simple and easy to implement, but is not robust to interference on the communication line. Another common method hides information in the transform domain by modifying the DCT (discrete cosine transform) coefficients of the plaintext carrier. This scheme is more robust, but it embeds less data than the LSB scheme. Therefore, only by adopting a speech compression algorithm with an extremely low bit rate, reducing the ciphertext bit rate as far as possible, does it become possible to find a more robust hiding scheme that achieves real-time, reliable communication of secret information.

Summary of the invention

Technical problem: the objective of the invention is to propose a camouflage communication method based on speech recognition that can be used in secure communication. The method achieves the required performance indicators of information hiding, namely transparency, robustness, real-time operation, and self-recovery at the receiving end. It has good practical value and provides a new approach for the research and design of secure communication.

Technical scheme: the camouflage communication method based on speech recognition of the present invention comprises four main parts: generation of the secret information code stream based on speech recognition, random key generation, embedding of the ciphertext at the formants of voiced frames, and extraction of the secret information. The working steps of the whole method are as follows:

1.) Watermark generation and embedding:

A. The user of the system speaks to the system the phrase command that is to be transmitted secretly.

B. The system first performs speech recognition based on DTW (dynamic time warping) to convert the command into the corresponding text, corrects any errors found, and represents the text as a binary code stream.

C. According to the formula:

S^{'} = s^{'} (i), 0 \leq i < \frac{M}{2}, s^{'} (i) \in \{0,1,2,3\}

the binary stream is converted to quaternary by serial-to-parallel conversion, where s′(i) is a quaternary code stream value obtained by combining the binary code stream values in pairs, S′ is the resulting quaternary code stream, and M is the number of bits needed to encode the command in binary. Then, according to the formula:

S = (s^{'} (k) + K_{1}) \mod 4, s^{'} (k) \in \{0,1,2,3\}, 0 \leq k < \frac{M}{2}

the stream is encrypted to obtain the ciphertext S, where K₁ is a predetermined random encryption seed, which is even, and s′(k) is an unencrypted quaternary code stream value.

D. The plaintext speech is divided into frames, and each frame is classified as unvoiced or voiced according to its energy in order to find the voiced frames.

E. A DFT (discrete Fourier transform) is applied to each voiced frame, and the first or second formant is selected according to the control factor K₂.

F. The watermark generated in step C is embedded according to the embedding formula, in which C_k is the coefficient obtained from the DFT (discrete Fourier transform) of a voiced frame of the original speech, C′_k is the coefficient after the ciphertext has been embedded, and β is the embedding depth, that is, the scale factor by which the coefficient amplitude changes before and after embedding, determined by experiment. The n in 4n + s′(k) takes the smallest value for which the inequality on the right-hand side of the formula just holds, s′(k) is the encrypted quaternary ciphertext value, and the bracket operator in the formula denotes rounding to an integer.

G. An inverse DFT (discrete Fourier transform) is applied to the watermarked plaintext to obtain the mixed speech, which is then transmitted.

2.) Watermark extraction:

H. The received mixed speech is first divided into frames in the same way.

I. Each frame is likewise classified as unvoiced or voiced according to its energy in order to find the voiced frames.

J. A DFT (discrete Fourier transform) is applied to each voiced frame, and the first or second formant is selected according to the same control factor K₂.

K. According to the formula:

s^{''} (i) = (round [\frac{C_{k}^{'}}{β}]) \mod 4, (0 \leq i, k < M / 2)

the secret information is extracted, where C′_k is the coefficient after the ciphertext has been embedded, β is the embedding depth, and M is the number of bits of the command in binary coding, so that in quaternary representation the coded command has M/2 symbols. s″(i) is an extracted encrypted quaternary ciphertext value, and round[ ] denotes rounding to the nearest integer.

L. According to the formula:

S^{''} = (s^{''} (i) + K_{1}) \mod 4, 0 \leq i < \frac{M}{2}, s^{''} (i) \in \{0,1,2,3\}

the extracted stream is decrypted, where S″ is the extracted quaternary ciphertext code stream after decryption and K₁ is the same predetermined random encryption seed as at the sender, which is even. The other variables have the same meanings as above.

M. According to the formula:

s (i) = S^{''}, 0 \leq i < M, s (i) \in \{0,1\}

the quaternary stream is converted back to binary, giving the true ciphertext binary values s(i), which correspond one-to-one with the text. The other variables have the same meanings as above.

N. The text is displayed on the screen.

Generation of the secret information code stream based on speech recognition: the system is trained in advance on a relatively large amount of data from the speaker. The person conveying the secret information speaks the command into the terminal's microphone in a quiet environment. After recognition by the DTW (dynamic time warping) speech recognition system, any minor errors are corrected from the keyboard, and the secret speech information S is then encoded into the secret code stream.

The random key generation part produces the key that scrambles the speech recognition result and finally generates the ciphertext to be hidden. The recognition result sequence S first undergoes quaternary serial-to-parallel conversion to give a new ciphertext code stream S′. A random key K₁ is then generated and used to encrypt and scramble S′, producing the ciphertext to be hidden, where K₁ is even.

In the stage that embeds the ciphertext at the formants of voiced frames, the plaintext speech is first divided into frames, and each frame is classified as unvoiced or voiced. The voiced frames, which account for about 70% of the frames, are searched for their first and second formants. According to the masking effect of the human ear, the coefficient at the frequency point corresponding to the first or second formant of the voiced frame is adaptively selected and modified under the control of a second random key K₂: if K₂ = 0, the frequency of the first formant is selected; otherwise the frequency of the second formant is selected for embedding. According to the 3 dB bandwidth theory, the DFT (discrete Fourier transform) coefficient nearest the first or second formant position is located and modified to hide the ciphertext, that is, the ciphertext S′ to be hidden is embedded in the coefficient C_k. After the replacement, an IDFT (inverse discrete Fourier transform) is applied to the plaintext speech that now hides the ciphertext, giving the mixed speech, which is transmitted over a PSTN (public switched telephone network) channel.

In secret information extraction, the received mixed speech is first divided into frames of the same length as at the embedding end, and a voiced/unvoiced decision is made. An N-point DFT (discrete Fourier transform) is applied to each voiced frame, and the frequency point corresponding to the first or second formant of each voiced frame is found by the same search method as in embedding. The coefficient found is processed to extract the secret information, which is then decrypted with the scrambling key. Finally, parallel-to-serial conversion recovers the code stream originally sent, and the secret information transmitted by the sender is shown on the screen of the receiving terminal.

Beneficial effect:

1. Speech recognition is introduced into the field of information hiding as a compression scheme with an extremely low bit rate. It greatly compresses the bit rate of the secret speech information and creates the precondition for a real-time information hiding scheme with transparency, robustness, and high security.

2. Existing information hiding techniques concentrate mainly on digital watermarking, whereas this scheme realizes high-capacity real-time information hiding, which has high practical value in fields such as military secure communication.

3. Existing audio information hiding schemes based on the DFT (discrete Fourier transform) mostly embed at a fixed mid-frequency position, so security is poor. This scheme instead uses adaptive frequency selection together with encryption before embedding, a two-level key that guarantees the security of the secret information. It also makes full use of the characteristics of the human auditory system (HAS) and uses multi-level modulation to increase the capacity of the hidden information.

Description of drawings

Fig. 1 is the system block diagram of the present invention,

Fig. 2 shows the time-domain waveforms of the original speech (top) and the speech carrying the ciphertext (bottom),

Fig. 3 shows the spectrograms of the original speech (top) and the speech carrying the ciphertext (bottom),

Fig. 4 shows the performance of the system after low-pass filtering,

Fig. 5 shows the performance of the system under white Gaussian noise interference.

Embodiment:

For a system carrying out secure communication, delivering the secret information to its destination accurately is vital, while the form in which the information is transmitted is secondary. This scheme processes the secret speech by speech recognition, greatly reducing its bit rate, and provides an embedding scheme with the highest possible transparency and robustness, thereby realizing real-time secure communication. From an information-theoretic point of view, speech recognition can compress the bit rate because speech contains not only semantic information but also speaker characteristics such as tone, intonation, and emotion. In military secure communication these speaker characteristics are "redundant" relative to the semantic command, so converting the secret speech into command text by speech recognition and then encoding and transmitting that text greatly reduces the bit rate of the secret information. By calculation, after ciphertext compression with this scheme the ciphertext bit rate can be kept within 100 bit/s, a rate that conventional speech compression coding schemes cannot currently reach. Given the present level of speech recognition technology, it is reasonable to assume that in a military secure communication system the commands transmitted come from a limited vocabulary, in which case speech recognition can reach very high accuracy.

To guarantee the transparency and robustness of the hidden information after embedding, the information hiding scheme is implemented in the frequency domain through adaptive embedding-point selection and multi-level symbol modulation. Transform-domain embedding usually places the hidden information at a fixed mid-frequency position in the DFT (discrete Fourier transform) domain; because the position is fixed, security is poor. The main differences between this scheme and traditional frequency-domain information hiding schemes are: (1) The embedding position is not fixed; the embedding point is selected by an adaptive search, which is equivalent to a key. (2) For each selected embedding point, modifying one frequency coefficient can convey multiple symbol states (for example, four), which realizes multi-level modulation and increases the bit rate of the embedded hidden information. (3) By exploiting the perceptual characteristics of the human ear, the speech after embedding, which is transmitted over a PSTN (public switched telephone network) channel, is highly robust to the various interferences the channel may introduce (companding, low-pass filtering, white noise, and so on).

The semantics (text) obtained by speech recognition are encoded to give the secret information code stream S, which is hidden in the public speech V and transmitted over a PSTN (public switched telephone network) channel. To satisfy the transparency requirement and spread the secret information as widely as possible, the plaintext V is divided into frames, a voiced/unvoiced decision is made, and the voiced frames of the plaintext are selected for embedding the ciphertext. According to the auditory masking effect of the human ear, the greater the spectral energy at a frequency point, the stronger its masking effect, so more noise can be introduced there without being noticed by the ear. In a voiced frame, the spectral energy at the first and second formants is a local maximum. The coefficient at the frequency point corresponding to the first or second formant of a voiced frame of the public speech (controlled by key K₂) is therefore selected and modified to embed the secret information. Because modifying the coefficients at these frequencies can introduce a larger distortion without being noticed by the ear, one modified coefficient is made to convey 2^N states (for example, N = 2) in order to raise the bit rate of the embedded ciphertext, rather than only two states per coefficient as in traditional schemes. This makes full use of the masking effect and improves the embedding efficiency of the secret information while preserving transparency. The receiving end makes the voiced/unvoiced decision by the same strategy, searches for the formants, extracts the modified coefficients, decides the hidden information, and finally decodes it into semantics shown on the terminal screen. The overall system framework is shown in Fig. 1.

A. Generation of the secret information code stream

The present invention uses speech recognition as a compression algorithm with an extremely low bit rate to improve the camouflage efficiency of the speech camouflage system. Because military secure communication is a small-vocabulary speech recognition application, DTW (dynamic time warping) is used for small-vocabulary speech recognition. DTW is a relatively mature speech recognition scheme and has a high recognition success rate in small-vocabulary systems such as military secure communication. The system designed in the present invention is trained in advance on a relatively large amount of data from the speaker. The person conveying the secret information speaks the command into the terminal's microphone in a quiet environment (such as a secure bunker). After recognition by the DTW speech recognition system, any minor errors are corrected from the keyboard, and the secret speech information S is then encoded into the secret code stream and embedded in the plaintext V. It can be assumed that the speaker issues the secret command in the form: a certain unit + preposition (for example, "to") + place name (for example, Nanjing) + military action to take (for example, "transfer"). Combined with the semantic pauses, DTW then achieves a very high recognition rate and has definite practical value.
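The template matching behind DTW recognition can be sketched as follows. This is a minimal illustration only: the function names `dtw_distance` and `recognize`, the use of scalar features, and the absolute-difference local cost are assumptions made here, not details given in the patent. A practical recognizer would compare vectors of spectral features per frame.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local distance
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def recognize(utterance, templates):
    """Return the label of the stored template closest to the utterance."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))
```

With a small command vocabulary, `templates` would hold one or more trained sequences per command word, and the label returned for each spoken word is what gets encoded into the secret code stream.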

Tests show that the speech recognition scheme adopted here still achieves a very high recognition rate in the presence of a certain amount of noise. In addition, the system designed here communicates over wired PSTN (public switched telephone network) channels, so it can withstand strong electronic jamming in a wartime environment.

B. Secret information embedding process

(1) The secret speech collected by the system in real time is passed through speech recognition and encoded into an M-bit ciphertext code stream S:

S = s (i), 0 \leq i < M, s (i) \in \{0,1\} \quad (1)

where s(i) is a binary code stream value.

(2) Determine the number of bits conveyed by each modified coefficient. The system can apply multi-level modulation to the ciphertext; coefficient modification is described here using quaternary modulation as the example.

Serial-to-parallel conversion of S gives a new ciphertext code stream S′:

S^{'} = s^{'} (i), 0 \leq i < \frac{M}{2}, s^{'} (i) \in \{0,1,2,3\} \quad (2)

where s′(i) is the value obtained by converting the binary code stream of formula (1) into a quaternary code stream, S′ is the unencrypted quaternary ciphertext code stream, and M is the number of bits of the coded command in binary representation after speech recognition.
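Formulas (1) and (2) can be illustrated with a short sketch. The patent does not state how the recognized text is mapped to bits or which bit of a pair is the more significant, so the UTF-8 encoding and the most-significant-bit-first pairing below are assumptions, and the function names are hypothetical.

```python
def text_to_bits(text):
    """Encode text as a binary stream s(i); UTF-8 is an assumed encoding."""
    bits = []
    for byte in text.encode("utf-8"):
        bits.extend((byte >> shift) & 1 for shift in range(7, -1, -1))
    return bits


def bits_to_quaternary(bits):
    """Combine bits in pairs into quaternary symbols s'(i), as in formula (2)."""
    if len(bits) % 2 != 0:
        raise ValueError("M must be even so that the bits pair up")
    # Assumption: the first bit of each pair is the more significant one.
    return [2 * bits[i] + bits[i + 1] for i in range(0, len(bits), 2)]
```

An M-bit stream therefore becomes M/2 symbols, each of which will later ride on one modified DFT coefficient.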

A random key K₁ (K₁ even) is generated to encrypt and scramble S′. The specific algorithm is:

S^{'} = (s^{'} (i) + K_{1}) \mod 4, 0 \leq i < \frac{M}{2}, s^{'} (i) \in \{0,1,2,3\} \quad (3)

where S′ on the left-hand side is the encrypted quaternary ciphertext code stream and the other variables have the same meanings as above.

In this way, even if the algorithm is public, an eavesdropper can at most obtain the encrypted code stream and cannot recover the actual information.
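The scrambling of formula (3) is a one-line operation. The sketch below uses a hypothetical function name `scramble`; it also shows why an even K₁ lets the receiver decrypt by adding the same key again, since 2K₁ is then a multiple of 4.

```python
def scramble(symbols, k1):
    """Add the even key K1 to every quaternary symbol modulo 4 (formula (3))."""
    if k1 % 2 != 0:
        raise ValueError("K1 must be even")
    return [(s + k1) % 4 for s in symbols]


plain = [2, 1, 3, 0]
cipher = scramble(plain, 6)
# Applying the same operation twice adds 2*K1, which is 0 modulo 4.
recovered = scramble(cipher, 6)
```

Here `recovered` equals `plain`, so the same routine can serve both ends of the link.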

(3) The public plaintext speech (sampled at 8 kHz) is divided into frames and a voiced/unvoiced decision is made to find the voiced frames that meet the requirements (V_k, with frame length L) for information hiding. An N-point DFT (discrete Fourier transform) is applied to each selected frame V_k, giving

F = DFT (V_{k}) = \{f_{k} (i), 0 \leq i < N\} \quad (4)

where f_k(i) is the i-th DFT coefficient of the k-th frame used for information hiding and F is the transform result. If the number of DFT points N exceeds L (the number of samples in the voiced frame), the frame is zero-padded at the end before the DFT.
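Step (3) can be sketched with the standard library alone. The function names, the non-overlapping framing, and the mean-square energy threshold are assumptions made for illustration; the patent only says that the decision is based on energy. The direct DFT below is written for clarity rather than speed.

```python
import cmath
import math


def split_frames(samples, frame_len):
    """Cut the signal into consecutive non-overlapping frames of length L."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def is_voiced(frame, energy_threshold):
    """Energy-based voiced decision; the threshold value is an assumption."""
    energy = sum(x * x for x in frame) / len(frame)
    return energy > energy_threshold


def dft(frame, n_points):
    """N-point DFT of a frame, zero-padded at the end when N > L."""
    if n_points < len(frame):
        raise ValueError("N must be at least the frame length L")
    padded = list(frame) + [0.0] * (n_points - len(frame))
    return [sum(padded[t] * cmath.exp(-2j * math.pi * k * t / n_points)
                for t in range(n_points))
            for k in range(n_points)]
```

For an 8 kHz signal with L = 160 and N = 256, each DFT bin spans 31.25 Hz, so a 500 Hz component peaks at bin 16.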

(4) Determine the embedding position in each frame and modify the coefficient to hide the information. Choosing a suitable frequency for embedding is very important. According to the masking effect of the human ear, the coefficient at the frequency point corresponding to the first or second formant of the voiced frame (controlled by K₂) is adaptively selected and modified. If K₂ = 0, the frequency of the first formant is selected; otherwise the frequency of the second formant is selected for embedding. According to the 3 dB bandwidth theory, the DFT (discrete Fourier transform) coefficient nearest the first or second formant position is located and modified to hide the ciphertext. The encrypted ciphertext S′ to be hidden is embedded in the plaintext transform coefficient C_k. So that the receiving end can detect the secret information blindly, the embedding method quantizes the coefficient with embedding depth β; the operator in the quantization formula denotes rounding up, and the embedding depth β is determined by experiment.

In the embedding formula, C_k is the coefficient obtained from the DFT (discrete Fourier transform) of a voiced frame of the original speech, C′_k is the coefficient after the ciphertext has been embedded, and β is the embedding depth, that is, the scale factor by which the coefficient amplitude changes before and after embedding, determined by experiment. The n in 4n + s′(k) takes the smallest value for which the inequality on the right-hand side of the formula just holds, s′(k) is the encrypted quaternary ciphertext value, and the bracket operator denotes rounding to an integer. The other variables have the same meanings as above.
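One embedding rule consistent with the description of 4n + s′(k) and with the extraction formula round[C′_k/β] mod 4 is sketched below. It is an assumption, not the patent's exact rule: here n is chosen so that β(4n + s′(k)) is the level nearest the original magnitude, whereas the patent selects n through an inequality. The function names are hypothetical, and applying the change to the coefficient magnitude while keeping its phase is also an assumption.

```python
def embed_symbol(magnitude, symbol, beta):
    """Move a magnitude to a nearby level beta * (4n + symbol), n >= 0."""
    n = max(0, round((magnitude / beta - symbol) / 4))
    return beta * (4 * n + symbol)


def embed_in_coefficient(coeff, symbol, beta):
    """Rescale a complex DFT coefficient to the embedded magnitude, keeping phase.

    For a real signal the mirrored bin N - k would need the conjugate value
    so that the inverse DFT stays real; that step is omitted here.
    """
    mag = abs(coeff)
    new_mag = embed_symbol(mag, symbol, beta)
    if mag == 0:
        return complex(new_mag, 0.0)
    return coeff * (new_mag / mag)
```

For example, with β = 5 a magnitude of 83 carrying symbol 2 moves to 90, because 90/5 = 18 and 18 mod 4 = 2. A larger β tolerates more channel noise but changes the coefficient more, which is the trade-off the patent settles by experiment.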

Because the first and second formant frequencies of voiced speech are concentrated in the low and middle frequencies (200-1000 Hz), embedding information in this range avoids the loss that high-frequency components suffer during filtering or quantization. Because the positions of the first and second formants differ from frame to frame, the choice of hiding frequency is adaptive, which is equivalent to adding another level of key and further strengthens the security of the secret information. In addition, because the spectral components at the first and second formants of voiced speech are much larger than those at other frequencies, multi-level modulation can be realized and robustness guaranteed while transparency is maintained. Provided a suitable embedding depth β is chosen by experiment, the effect of embedding on transparency can be controlled.
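The adaptive choice of the embedding bin can be sketched as a peak search in the 200-1000 Hz band. This is a simplified stand-in: treating the two strongest local spectral peaks in the band as the first and second formants is an assumption, as are the function names and default parameters. Practical formant estimation is usually more elaborate.

```python
import math


def local_peaks(magnitudes, lo_bin, hi_bin):
    """Indices in [lo_bin, hi_bin] where the magnitude spectrum has a local peak."""
    first = max(lo_bin, 1)
    last = min(hi_bin, len(magnitudes) - 2)
    return [k for k in range(first, last + 1)
            if magnitudes[k] > magnitudes[k - 1] and magnitudes[k] >= magnitudes[k + 1]]


def select_embedding_bin(magnitudes, k2, sample_rate=8000, lo_hz=200, hi_hz=1000):
    """Pick the bin of the first (K2 == 0) or second (otherwise) formant peak.

    `magnitudes` holds |f_k(i)| for all N bins of the frame's DFT.
    """
    n = len(magnitudes)
    lo_bin = math.ceil(lo_hz * n / sample_rate)
    hi_bin = math.floor(hi_hz * n / sample_rate)
    peaks = local_peaks(magnitudes, lo_bin, hi_bin)
    # Assumption: the two strongest peaks, in frequency order, stand in for F1 and F2.
    strongest = sorted(sorted(peaks, key=lambda k: magnitudes[k], reverse=True)[:2])
    wanted = 0 if k2 == 0 else 1
    if len(strongest) <= wanted:
        raise ValueError("frame has too few spectral peaks in the search band")
    return strongest[wanted]
```

Because the receiver runs the same search on the marked spectrum, both ends agree on the bin without transmitting its index, provided embedding does not displace the local maximum.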

(5) The coefficient C′_k carrying the ciphertext replaces the original plaintext speech coefficient C_k, and an IDFT (inverse discrete Fourier transform) is applied to the modified transform result F to obtain the mixed speech V′, which is transmitted over a PSTN (public switched telephone network) channel.

C. Secret information extraction process

This scheme allows blind detection of the ciphertext at the receiving end. According to Fig. 1, the received mixed speech V″ is first divided into frames of the same length as at the embedding end, a voiced/unvoiced decision is made, and an N-point DFT (discrete Fourier transform) is applied to each voiced frame to obtain the corresponding F′. Because the frequency-domain embedding point is not fixed, the frequency point corresponding to the first or second formant of each voiced frame must also be found, using the same search method as in embedding. It can be shown that the local maximum magnitude of the DFT coefficients of the mixed speech V′ after embedding remains at the same frequency as in V, that is,

\underset{i}{\arg \max} (| f_{k} (i) |) = \underset{i}{\arg \max} (| f_{k}^{'} (i) |), i \in 3\,\mathrm{dB\ width}

so the search method used in embedding applies equally in detection. Once C′_k has been found, it is processed as follows to extract the secret information:

S^{''} = s^{''} (i) = (round [\frac{C_{k}^{'}}{β}]) \mod 4, (0 \leq i, k < M / 2) \quad (6)

where round[ ] denotes rounding to the nearest integer, C′_k is the discrete Fourier transform coefficient of the received speech carrying the embedded ciphertext, s″(i) is an extracted encrypted quaternary ciphertext value, S″ is the extracted encrypted quaternary ciphertext code stream, and M is the number of bits of the coded command in binary representation after speech recognition.
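Formula (6) translates directly into code. In the sketch below the input is assumed to be the magnitude of the selected coefficient in each voiced frame, and `extract_symbols` is a hypothetical name. The example values illustrate that a disturbance smaller than β/2 leaves the recovered symbol unchanged.

```python
def extract_symbols(marked_magnitudes, beta):
    """Recover quaternary symbols as round(C'_k / beta) mod 4 (formula (6))."""
    return [round(c / beta) % 4 for c in marked_magnitudes]


# 90.0 encodes symbol 2 at beta = 5; 91.9 and 88.1 are noisy versions of it.
symbols = extract_symbols([90.0, 91.9, 88.1, 35.0], 5.0)
```

Here `symbols` is `[2, 2, 2, 3]`: the two perturbed values still round to 18, and 35/5 = 7 gives 7 mod 4 = 3.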

S″ is decrypted with the key, giving:

S^{''} = (s^{''} (i) + K_{1}) \mod 4, 0 \leq i < \frac{M}{2}, s^{''} (i) \in \{0,1,2,3\} \quad (7)

where K₁ is the same encryption seed as at the sender, which is even, and S″ on the left-hand side is the extracted quaternary ciphertext code stream after decryption. Because K₁ is even, 2K₁ is a multiple of 4, so adding K₁ a second time restores the original symbols. Parallel-to-serial conversion of S″ gives

S＝s(i)，0≤i＜M，s(i)∈{0，1} (8)

S  (i) is the binary system ciphertext code stream numerical value that extracts, and is corresponding one by one with literal, and S  is the binary system ciphertext code stream that extracts, and can directly translate or be shown as literal, and M is the bit number of the coded command of binary representation after the speech recognition
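The parallel-to-serial conversion of formula (8) and the final mapping back to text can be sketched as follows. The most-significant-bit-first split of each symbol and the UTF-8 decoding are assumptions that mirror one possible choice at the sender; the patent does not fix either, and the function names are hypothetical.

```python
def quaternary_to_bits(symbols):
    """Split each quaternary symbol into two bits, more significant bit first."""
    bits = []
    for s in symbols:
        bits.extend([(s >> 1) & 1, s & 1])
    return bits


def bits_to_text(bits):
    """Group bits into bytes and decode them; UTF-8 is an assumed encoding."""
    data = bytes(int("".join(str(b) for b in bits[i:i + 8]), 2)
                 for i in range(0, len(bits), 8))
    return data.decode("utf-8")
```

For instance, the decrypted symbols `[1, 0, 0, 1]` expand to the bits `01000001`, which decode to the letter "A".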

Finally, the stream is decoded and the secret information transmitted by the sender is shown on the screen of the receiving terminal.

Claims

1. A camouflage communication method based on speech recognition, characterized in that the method comprises four main parts: generation of a secret information code stream based on speech recognition; serial-to-parallel conversion and generation of the ciphertext to be embedded based on a random key; detection of the formants of voiced frames of the plaintext and random embedding of the ciphertext; and extraction of the secret information. The working steps of the whole method are as follows:

1.) Watermark generation and embedding:

A. The user of the system speaks to the system the phrase command that is to be transmitted secretly,

C. According to the formula:

S^{'} = s^{'} (i), 0 \leq i < \frac{M}{2}, s^{'} (i) \in \{0,1,2,3\}

the binary stream is converted to quaternary by serial-to-parallel conversion, where s′(i) is the value obtained after converting the binary code stream into a quaternary code stream, S′ is the unencrypted quaternary ciphertext code stream, and M is the number of bits of the coded command in binary representation after speech recognition;

and according to the formula:

S^{'} = (s^{'} (i) + K_{1}) \mod 4, 0 \leq i < \frac{M}{2}, s^{'} (i) \in \{0,1,2,3\}

the stream is encrypted to obtain the ciphertext, where S′ on the left-hand side is the encrypted quaternary ciphertext code stream, K₁ is the key and is even, and the other variables have the same meanings as above,

E. A DFT (discrete Fourier transform) is applied to each voiced frame, and the first or second formant is selected according to the control factor K₂,

F. The watermark generated in step C is embedded according to the embedding formula, in which C_k is the coefficient obtained from the DFT (discrete Fourier transform) of a voiced frame of the original speech, C′_k is the coefficient after the ciphertext has been embedded, and β is the embedding depth, that is, the scale factor by which the coefficient amplitude changes before and after embedding, determined by experiment; the n in 4n + s′(k) takes the smallest value for which the inequality on the right-hand side of the formula just holds, s′(k) is the encrypted quaternary ciphertext value, and the bracket operator denotes rounding to an integer;

the other variables have the same meanings as above;

G. An inverse DFT is applied to the watermarked plaintext to obtain the mixed speech, which is then transmitted;

2.) Watermark extraction:

J. A DFT is applied to each voiced frame, and the first or second formant is selected according to the same control factor,

K. According to the formula:

S^{''} = s^{''} (i) = (round [\frac{C_{k}^{'}}{β}]) \mod 4, (0 \leq i, k < M / 2)

the secret information is extracted, where round[ ] denotes rounding to the nearest integer, C′_k is the discrete Fourier transform coefficient of the received speech carrying the embedded ciphertext, s″(i) is an extracted encrypted quaternary ciphertext value, S″ is the extracted encrypted quaternary ciphertext code stream, and M is the number of bits of the coded command in binary representation after speech recognition;

L. According to the formula:

S^{''} = (s^{''} (i) + K_{1}) \mod 4, 0 \leq i < \frac{M}{2}, s^{''} (i) \in \{0,1,2,3\}

the extracted stream is decrypted, where K₁ is the same encryption seed as at the sender, which is even, S″ on the left-hand side is the extracted quaternary ciphertext code stream after decryption, and the other variables have the same meanings as above.

M. According to the formula:

S = s (i), 0 \leq i < M, s (i) \in \{0,1\}

the quaternary stream is converted back to binary to obtain the true ciphertext text, where s(i) is an extracted binary ciphertext value, corresponding one-to-one with the text, S is the extracted binary ciphertext code stream, which can be translated directly or displayed as text, and M is the number of bits of the coded command in binary representation after speech recognition,

N. The text is displayed on the screen.

2. The camouflage communication method based on speech recognition according to claim 1, characterized in that, for generation of the secret information code stream based on speech recognition, the system is trained in advance on a relatively large amount of data from the speaker; the person conveying the secret information speaks the command into the terminal's microphone in a quiet environment; after recognition by the DTW (dynamic time warping) speech recognition system, any minor errors are corrected from the keyboard; and the secret speech information S is then encoded into the secret code stream.

3. The camouflage communication method based on speech recognition according to claim 1, characterized in that the part performing serial-to-parallel conversion and generating the ciphertext to be embedded based on a random key produces the key that scrambles the speech recognition result and finally generates the ciphertext to be hidden; the recognition result sequence S first undergoes quaternary serial-to-parallel conversion to give a new ciphertext code stream S′, and a random key K₁ is then generated and used to encrypt and scramble S′, producing the ciphertext to be hidden, where K₁ is even.

4. The camouflage communication method based on speech recognition according to claim 1, characterized in that, in the stage of detecting the formants of voiced frames of the plaintext and randomly embedding the ciphertext, the plaintext speech is first divided into frames and each frame is classified as unvoiced or voiced; the voiced frames, which account for about 70% of the frames, are searched for their first and second formants; according to the masking effect of the human ear, the coefficient at the frequency point corresponding to the first or second formant of the voiced frame is adaptively selected and modified under the control of a second random key K₂: if K₂ = 0, the frequency of the first formant is selected, otherwise the frequency of the second formant is selected for embedding; according to the 3 dB bandwidth theory, the DFT coefficient nearest the first or second formant position is located and modified to hide the ciphertext, that is, S′ is embedded in C_k; after the replacement, an IDFT (inverse discrete Fourier transform) is applied to the plaintext speech that now hides the ciphertext, giving the mixed speech, which is transmitted over a PSTN (public switched telephone network) channel.

5. The camouflage communication method based on speech recognition according to claim 1, characterized in that, in secret information extraction, the received mixed speech is first divided into frames of the same length as at the embedding end and a voiced/unvoiced decision is made; an N-point DFT is applied to each voiced frame, and the frequency point corresponding to the first or second formant of each voiced frame is found by the same search method as in embedding; the coefficient found is processed to extract the secret information, which is then decrypted with the scrambling key; finally, parallel-to-serial conversion recovers the code stream originally sent, and the secret information transmitted by the sender is shown on the screen of the receiving terminal.