CN1964244B

CN1964244B - Method for transmitting and receiving digital signals using a vocoder

Info

Publication number: CN1964244B
Application number: CN2005101177279A
Authority: CN
Inventors: 吴倩; 林伯瀚; 林�源; 范莉
Original assignee: XIAMEN ZHISHENG TECHNOLOGY Co Ltd
Current assignee: Beijing Hezhong Sizhuang Space-time Material Union Technology Co., Ltd.
Priority date: 2005-11-08
Filing date: 2005-11-08
Publication date: 2010-04-07
Anticipated expiration: 2025-11-08
Also published as: CN1964244A

Abstract

The disclosed method transmits and receives digital signals through a vocoder. At the transmitting end, the target digital signal is converted by parameter mapping into the key speech-characteristic parameters of a speech synthesis model, and a speech signal is synthesized from them. The synthesized signal is sent through a GSM or CDMA vocoder. At the receiving end, speech analysis extracts the key speech-characteristic parameters, from which the original digital signal is recovered. The invention reduces transmission delay and ensures the quality of interactive services.

Description

Method for transmitting and receiving digital signals using a vocoder

Technical field

The present invention relates to the field of communication technology, and in particular to a method for transmitting and receiving arbitrary digital signals over a voice channel using a vocoder.

Background technology

In modern telecommunication networks, the human voice is transmitted after digital encoding. Many different coding techniques coexist in these networks because of two factors: the bandwidth constraints of the transmission channel and the quality requirements of voice communication.

In the fixed public telephone network, speech is usually waveform-coded with pulse code modulation (PCM) or adaptive differential pulse code modulation (ADPCM). It is transmitted at 64 kbps (PCM) or 32 kbps (ADPCM). Waveform coding, however, cannot reach higher compression ratios, for example compressing speech to 16 kbps.

In mobile telephone networks, channel bandwidth is limited, so speech is coded with a vocoder. A vocoder exploits a parametric model of the human vocal tract and the mechanism of speech production. This lets it compress speech below 16 kbps while preserving acceptable voice quality. Examples:
- In the full-rate mode of the GSM network, speech is coded with the RPE-LTP vocoder and transmitted at 13 kbps.
- The GSM Enhanced Full Rate vocoder and the EVRC vocoder used in CDMA networks are both based on ACELP techniques. They compress speech to 8-13 kbps with almost no loss of quality.
- The CELP vocoder used by the U.S. Department of Defense (DoD) compresses speech to 4.8 kbps while still ensuring good quality.

Vocoder technology relies heavily on the properties of the speech source. This lets it compress speech at high ratios, but its operating principle also makes it unable to compress non-speech signals.

Modems that carry arbitrary digital signals over a voice channel are widely used on fixed telephone networks that use waveform coding (PCM or ADPCM). They represent a changing digital bit stream by varying (modulating) properties of a sinusoidal carrier, such as its frequency, amplitude and phase. Data modems on today's public switched telephone network (POTS) reach 56 kbps.

The signals these modems generate, however, no longer have the characteristics of human speech. Waveform features such as amplitude, frequency and phase are not preserved through vocoder encoding and decoding. Such digital signals therefore cannot be carried over the voice channels of vocoder-based mobile networks such as GSM and CDMA.

Mobile networks such as GSM and CDMA do provide data channels (CSD/HSCSD, GPRS/EDGE, UMTS, etc.), but these do not fully solve the problem of transmitting digital signals, for two reasons:
- The high transmission delay of data channels (0.5 s to 2 s) and their jitter cannot meet the quality-of-service requirements of interactive real-time signals.
- Operators offer data-channel services far less widely than voice services, and in different ways. Using data channels across operators, across networks or across national borders therefore raises many interconnection difficulties.

Summary of the invention

The object of the present invention is to provide a method for transmitting and receiving digital signals using a vocoder.

To achieve this object, the present invention adopts the following technical solution: a method for transmitting and receiving digital signals using a vocoder, characterized in that:
- At the transmitting end, the source digital signal to be transmitted is converted by parameter mapping into the key speech-characteristic parameters of a speech synthesis model, and a speech signal is generated from them by speech synthesis.
- The synthesized speech signal is sent through a GSM or CDMA vocoder.
- At the receiving end, the key speech-characteristic parameters are extracted by speech analysis and converted back into the original digital signal.

Specifically, the above method includes the following steps:
(1) The source digital signal to be transmitted is divided into frames, and each frame is used to synthesize a short-time speech signal. Each frame is further subdivided into at least three subframes of unequal length.
(2) The subframes are mapped, respectively, to a line spectral frequency (LSP) coefficient index, a generalized excitation vector parameter index and a generalized excitation gain index.
(3) The index values generated in step (2) are looked up in their respective tables: the LSP coefficient parameter table, the generalized excitation vector parameter table and the generalized excitation gain parameter table. This yields the LSP coefficient parameter, the generalized excitation vector parameter and the generalized excitation gain parameter.
(4) A speech signal is synthesized from the parameters generated in step (3) according to the CELP vocoder principle.
(5) The synthesized speech signal is sent through a CDMA or GSM vocoder.
(6) After the receiving end receives the synthesized speech signal, it performs speech analysis on it. This extracts the LSP coefficient parameter, the generalized excitation vector parameter and the generalized excitation gain parameter.
(7) The parameters obtained in step (6) are looked up in reverse in their corresponding parameter tables (LSP coefficient, excitation vector and excitation gain). This regenerates the LSP coefficient index, the generalized excitation vector index and the generalized excitation gain index.
(8) The index values generated in step (7) are converted back into their respective subframes. The subframes are then reassembled into a frame, restoring the original digital signal.

In step (1), the digital bit stream is divided into frames, and each frame of the bit stream produces a short-time speech signal of 10-30 ms.

In step (4), synthesis proceeds in three stages:
- An excitation signal generator first combines the generalized excitation vector parameter and the generalized excitation gain parameter into an excitation signal.
- The LSP coefficient parameter is inverse vector-quantized to produce linear prediction coefficients.
- These coefficients and the excitation signal are fed together into a linear prediction speech synthesis filter, which synthesizes the final speech signal.
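Step (1) and step (8) are inverse bit operations on each frame. The sketch below makes three assumptions:
- the frame is handled as an unsigned integer;
- its most significant bit comes first;
- the subframe widths are hypothetical, since the text only requires at least three subframes of unequal length.

```python
def split_frame(frame: int, widths: list[int]) -> list[int]:
    """Step (1): cut an N-bit frame (MSB first) into subframes of the given widths."""
    n = sum(widths)
    if not 0 <= frame < (1 << n):
        raise ValueError(f"frame does not fit in {n} bits")
    subframes = []
    shift = n
    for width in widths:
        shift -= width
        subframes.append((frame >> shift) & ((1 << width) - 1))
    return subframes


def join_frame(subframes: list[int], widths: list[int]) -> int:
    """Step (8): concatenate recovered subframe values back into one frame."""
    if len(subframes) != len(widths):
        raise ValueError("one width per subframe is required")
    frame = 0
    for value, width in zip(subframes, widths):
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        frame = (frame << width) | value
    return frame


# Hypothetical three-subframe layout (LSP, excitation vector, excitation gain).
WIDTHS = [5, 7, 4]
frame = 0b10110_0101101_1001
print(split_frame(frame, WIDTHS))   # [22, 45, 9]
assert join_frame(split_frame(frame, WIDTHS), WIDTHS) == frame
```

Each subframe value is then used as an index into its parameter table.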

In a more specific form, the above method includes the following steps:
(1) The source digital signal to be transmitted is divided into frames, and each frame is used to synthesize a short-time speech signal. A frame of N bits is further subdivided into four subframes of unequal length: code streams of X bits, Y bits, Z bits and G bits.
(2) Each code stream is mapped to an index value:
- the X-bit stream to an LSP coefficient parameter index value;
- the Y-bit stream to a pitch parameter index value;
- the Z-bit stream to an excitation vector parameter index value;
- the G-bit stream to an excitation gain parameter index value.
(3) Each index value generated in step (2) is looked up in its table: the LSP coefficient parameter table, the pitch parameter table, the excitation vector parameter table or the excitation gain parameter table. This yields the actual vector parameters: the LSP coefficient parameter, the pitch parameter, the excitation vector parameter and the excitation gain parameter.
(4) A speech signal is synthesized from the parameters generated in step (3) according to the CELP vocoder principle.
(5) The synthesized speech signal is sent through a CDMA or GSM vocoder.
(6) After the receiving end receives the synthesized speech signal, it performs speech analysis on it. This extracts the LSP coefficient parameter, the pitch parameter, the excitation vector parameter and the excitation gain parameter.
(7) Each parameter extracted in step (6) is looked up in reverse in its corresponding table (LSP coefficient, pitch, excitation vector or excitation gain). This regenerates the LSP coefficient parameter index, the pitch parameter index, the excitation vector parameter index and the excitation gain parameter index.
(8) The index values generated in step (7) are converted back into their respective subframes. The subframes are then reassembled into a frame, restoring the original digital signal.

In step (1), the digital bit stream is divided into frames, and each frame of the bit stream produces a short-time speech signal of 10-30 ms.

In step (4), synthesis proceeds as follows:
- The quantized LSP coefficient vector corresponding to the X-bit stream is converted into linear prediction coefficients by the inverse of split vector quantization. These coefficients are used in the linear prediction speech synthesis filter.
- The pitch parameter vector corresponding to the Y-bit stream is processed by pitch synthesis to generate a pitch excitation signal.
- The excitation vector parameter corresponding to the Z-bit stream and the excitation gain parameter corresponding to the G-bit stream are fed into the excitation signal synthesis module, which generates an excitation signal.
- This excitation signal and the pitch excitation signal drive the linear prediction speech synthesis filter, which models the vocal tract, to produce an artificial synthesized speech signal.
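Steps (3) and (7) are a forward table lookup and its reverse. The text does not say how the reverse lookup is performed. The sketch below assumes a nearest-neighbour search under Euclidean distance. A hypothetical toy table stands in for the real LSP, pitch, excitation vector and excitation gain tables.

```python
import math

# Hypothetical toy table: four 2-D parameter vectors chosen far apart, in the
# spirit of advantage 4 (real tables hold LSP, pitch, excitation or gain vectors).
TABLE = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]


def lookup(table, index):
    """Step (3): the subframe value selects a stored parameter vector."""
    return table[index]


def reverse_lookup(table, measured):
    """Step (7): return the index of the entry nearest to the parameter vector
    measured by speech analysis (Euclidean distance, first entry wins ties)."""
    return min(range(len(table)), key=lambda i: math.dist(table[i], measured))


# A parameter distorted by the vocoder still maps back to the same index
# as long as the distortion is smaller than half the spacing between entries.
assert reverse_lookup(TABLE, (0.9, 0.1)) == 1
assert all(reverse_lookup(TABLE, lookup(TABLE, i)) == i for i in range(len(TABLE)))
```

The half-spacing condition is why pre-selecting widely separated table entries helps lower the bit error rate.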

Owing to the above design, the present invention has the following advantages:

1. The proposed method transmits an arbitrary digital signal at a constant bit rate over an analog or digital voice channel. It does so transparently, with high quality, and independently of the communication network's switching and transmission equipment. Its transmission delay and jitter are far lower than those of data-channel transmission, which ensures the quality of service of interactive real-time information exchange.

2. The invention only needs the operator's voice service. Interconnection is therefore assured and the range of application is greatly widened: users can transmit an arbitrary digital signal at a constant bit rate, with guaranteed quality of service, anywhere in the world where voice service is available.

3. The invention can be applied to mobile wireless terminals (GSM and CDMA handsets, satellite phones, etc.), fixed-line telephones and computer equipment. It enables a range of special and value-added services:
(1) It improves the voice transmission quality of the Push-to-Talk (PTT) wireless group-call value-added service. It also frees that service from dependence on wireless data channels, so that PTT can operate independently.
(2) It provides key technical support for secure voice and data communication over mobile-network voice channels. After strong digital encryption, a speech signal becomes highly random and loses all speech characteristics. With this technique, users equipped with the device can carry out secure voice and data communication anywhere in the world covered by the fixed telephone network (POTS) or a GSM/CDMA mobile network, independently of existing network switching and transmission equipment.

4. In steps (2) and (3) of the invention, the bit stream of each subframe is mapped to a parameter index value rather than to the parameter itself. This gives flexibility in choosing, in advance, the key parameters used for speech synthesis. From the full value space of each parameter, a subset of values is placed in the parameter code table; these values differ greatly from one another and are easy to extract. Each value corresponds to an index obtained from a subframe's bit stream. At the cost of a lower transmission rate, this ensures that similar input digital signals produce sufficiently different analog continuous-wave speech signals. This helps the receiving end's speech analysis return correct results and effectively reduces the bit error rate.

Description of drawings

Fig. 1 is a structural block diagram of the present invention.

Fig. 2 is a structural block diagram of one embodiment of the present invention.

Embodiment

A vocoder is a high-compression speech coding technique based on a parametric model of the human vocal tract and the mechanism of speech production. It is widely used in wireless mobile communication (GSM and CDMA), satellite communication and other network systems. There it codes, transmits and receives speech at a low bit rate while preserving acceptable voice quality. Its operating principle, however, means that a vocoder cannot efficiently code or transmit signals that lack speech characteristics.

The present invention proposes a technique for transmitting and receiving digital signals through a speech vocoder. Without using a data channel, it achieves low-delay, low-jitter, high-quality transmission of arbitrary digital signals. The technique can be applied in wireless mobile and fixed communication terminal equipment. It transmits arbitrary digital signals over analog or digital voice channels, independently of network switching and transmission equipment.

Fig. 1 shows the method provided by the present invention for transmitting and receiving digital signals using a vocoder. It follows the CELP vocoder principle:
- At the transmitting end, the source digital signal to be transmitted is converted by parameter mapping into the key speech-characteristic parameters of a speech synthesis model. A speech signal is then generated from these parameters by speech synthesis.
- The synthesized speech signal can be carried over GSM, CDMA or other voice channels.
- At the receiving end, the key speech-characteristic parameters are extracted by speech analysis and the original digital signal is recovered.

Together these steps complete the transmission and reception of an arbitrary digital signal.

Specifically, the method includes the following steps:
(1) The source digital signal to be transmitted is divided into frames, each used to generate a short-time speech signal 10-30 ms long. Each frame is further subdivided into subframes of unequal length according to the parameters and synthesis mechanism of the speech synthesis model. Each subframe is mapped to a key parameter value of the model. The number and length (in bits) of the subframes therefore depend on two things: the kinds of model parameters used to synthesize the speech signal, and the number of entries in each parameter table. For example, CELP-based speech synthesis models commonly use three classes of parameters: the line spectral frequency (LSP) coefficients, the generalized excitation parameter and the generalized excitation gain. There are therefore generally at least three subframes, one for each of these key parameters.
(2) The subframes are mapped to parameters by table lookup. A number of key parameter values are stored in advance in parameter tables, and each subframe becomes an index value into its table: an LSP parameter table index, a generalized excitation vector table index and a generalized excitation gain table index.
(3) The index values generated in step (2) are looked up in their respective tables: the LSP parameter table, the generalized excitation vector table and the generalized excitation gain table. This yields the LSP parameter, the generalized excitation vector parameter and the generalized excitation gain parameter.
(4) A speech signal is synthesized from the parameters generated in step (3) according to the CELP mechanism.
(5) The synthesized speech signal is sent through a vocoder (such as a GSM or CDMA speech vocoder) or another voice channel.
(6) After the receiving end receives the synthesized speech signal, it performs speech analysis on it. This extracts the LSP parameter, the generalized excitation vector parameter and the generalized excitation gain parameter.
(7) The parameters obtained in step (6) are looked up in reverse in their corresponding tables (LSP, generalized excitation vector and generalized excitation gain). This regenerates the LSP index, the generalized excitation vector index and the generalized excitation gain index.
(8) The index values generated in step (7) are converted back into their respective subframes. The subframes are then reassembled into a frame, restoring the original digital signal.
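Step (1) ties each subframe's length to the size of its parameter table. The sketch below assumes power-of-two table sizes, so that every index value addresses an entry. Under that assumption, each subframe's width is log2 of its table size, and the frame length N is the sum of the widths. The table sizes in the usage lines are hypothetical.

```python
def subframe_widths(table_sizes: dict[str, int]) -> dict[str, int]:
    """Bits per subframe so that every index value addresses exactly one entry.

    Assumes power-of-two table sizes; other sizes would leave index values unused."""
    widths = {}
    for name, size in table_sizes.items():
        if size <= 0 or size & (size - 1):
            raise ValueError(f"{name}: table size {size} is not a power of two")
        widths[name] = size.bit_length() - 1
    return widths


# Hypothetical three-parameter model (table sizes are illustrative only).
widths = subframe_widths({"lsp": 4096, "excitation_vector": 1024, "excitation_gain": 64})
print(widths)                # {'lsp': 12, 'excitation_vector': 10, 'excitation_gain': 6}
print(sum(widths.values()))  # frame length N = 28 bits
```

Larger tables carry more bits per frame, but they pack entries more closely in parameter space. This is the trade-off described in advantage 4.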

In step (4) above, synthesis proceeds in three stages:
- The excitation signal generator first combines the generalized excitation vector parameter and the generalized excitation gain parameter into an excitation signal.
- The LSP parameter is inverse vector-quantized to generate linear prediction coefficients.
- These coefficients and the excitation signal are fed together into the linear prediction (LPC) speech synthesis filter, which synthesizes the final speech signal.

This differs from ordinary speech synthesis. The synthesis described here is concerned only with clearly expressing the characteristic parameters the signal carries; the signal itself need not have any linguistic meaning.
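The synthesis in step (4) can be sketched as an excitation generator followed by an all-pole filter 1/A(z). This is a minimal illustration, assuming the convention A(z) = 1 + a1 z^-1 + ... + ap z^-p and a single subframe. It omits the coefficient interpolation and postfiltering that a real CELP decoder would apply.

```python
def excitation(vector: list[float], gain: float) -> list[float]:
    """Excitation signal generator: scale the excitation vector by its gain."""
    return [gain * x for x in vector]


def lpc_synthesis(a, exc, memory=None):
    """All-pole LPC synthesis filter 1/A(z):
        s[n] = exc[n] - (a[1]*s[n-1] + ... + a[p]*s[n-p])
    `a` is [1, a1, ..., ap]; `memory` holds exactly p previous output samples
    (oldest first) so consecutive subframes join smoothly."""
    p = len(a) - 1
    history = list(memory) if memory is not None else [0.0] * p
    out = []
    for e in exc:
        s = e - sum(a[k] * history[-k] for k in range(1, p + 1))
        out.append(s)
        history.append(s)
    return out


# Single-pole example: A(z) = 1 - 0.5 z^-1, excitation = unit pulse with gain 2.
speech = lpc_synthesis([1.0, -0.5], excitation([1.0, 0.0, 0.0, 0.0], 2.0))
print(speech)   # [2.0, 1.0, 0.5, 0.25]
```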

In step (1) above, each frame of the bit stream produces a short-time speech signal of 10-30 ms, mainly for two reasons:
- The frame must be longer than 10 ms to contain the complete fundamental-frequency (pitch) information of speech. This lets the pitch analysis filter at the receiving end extract the pitch parameters correctly.
- The frame must be shorter than 30 ms to preserve the statistical stationarity of the speech signal. This lets the linear prediction filter at the receiving end effectively describe the short-time autocorrelation of the signal, that is, the vocal-tract model of speech production.
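The 10 ms and 30 ms bounds can be translated into sample counts and pitch periods. This arithmetic sketch makes two assumptions:
- speech is sampled at 8 kHz, the rate of narrowband telephone speech, which the text does not state;
- "containing the fundamental-frequency information" means fitting at least one whole pitch period in the frame.

```python
SAMPLE_RATE_HZ = 8000  # assumption: narrowband telephone speech


def frame_samples(frame_ms: float) -> int:
    """Number of speech samples in a frame of the given duration."""
    return round(frame_ms * SAMPLE_RATE_HZ / 1000)


def lowest_f0_hz(frame_ms: float) -> float:
    """Lowest fundamental frequency whose full pitch period fits in the frame."""
    return 1000.0 / frame_ms


for ms in (10, 20, 30):
    print(ms, "ms:", frame_samples(ms), "samples, F0 >=", round(lowest_f0_hz(ms), 1), "Hz")
# 10 ms: 80 samples, F0 >= 100.0 Hz
# 20 ms: 160 samples, F0 >= 50.0 Hz
# 30 ms: 240 samples, F0 >= 33.3 Hz
```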

The generalized excitation parameter usually takes one of two forms:
- a pulse-train signal with pitch-period characteristics, used to synthesize voiced speech;
- a random signal (such as a Gaussian random signal), used to synthesize unvoiced speech.

The generalized excitation gain correspondingly comprises gain parameters for scaling the pulse-train excitation and the random-signal excitation.

To raise the transmission rate, the number of subframes can be increased so as to increase the frame length (in bits). For example, the pitch characteristic parameters can be mapped as independent speech-characteristic parameters. These are the pitch delay and pitch gain parameters that express fundamental-frequency information. More subframes are then used for synthesis, and the excitation parameter need only comprise the random-signal (e.g. Gaussian) excitation.

In a concrete implementation, the present invention may therefore introduce the pitch parameters (delay and gain) as an independent excitation signal for speech synthesis. One frame is then subdivided into four subframes of unequal length, as shown in Fig. 2. At the transmitting end, an N-bit frame of the digital bit stream is divided into code streams of X, Y, Z and G bits, forming four subframes. Each stream is mapped to index values:
- The X-bit stream is mapped through the LSP coefficient parameter mapping to an LSP index value.
- The Y-bit stream is mapped through the pitch parameter (pitch delay and pitch gain) mapping to pitch parameter index values (a pitch delay index and a pitch gain index).
- The Z-bit stream is mapped through the excitation vector parameter mapping to an excitation vector index value.
- The G-bit stream is mapped through the excitation gain parameter mapping to an excitation gain index value.

Each index value is then looked up in its table: the LSP coefficient parameter table, the pitch parameter table, the excitation vector parameter table or the excitation gain parameter table. This yields the actual vector parameters: the LSP coefficient parameter, the pitch parameters (pitch delay and pitch gain), the excitation vector parameter and the excitation gain parameter.
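The two forms of generalized excitation can be sketched as a pulse train and a Gaussian noise vector, each with its own gain. This is an illustration under two assumptions:
- the pulse train stands in for the pitch excitation;
- a seeded pseudo-random generator (a hypothetical substitute for a stored excitation codebook) lets both ends regenerate the same random vector.

```python
import random


def pulse_train(length: int, pitch_lag: int, gain: float) -> list[float]:
    """Voiced excitation: one pulse every `pitch_lag` samples, scaled by `gain`."""
    return [gain if n % pitch_lag == 0 else 0.0 for n in range(length)]


def gaussian_excitation(length: int, gain: float, seed: int) -> list[float]:
    """Unvoiced excitation: zero-mean, unit-variance Gaussian noise scaled by `gain`.
    Hypothetical: the seed plays the role of an excitation-codebook index, so the
    same seed always regenerates the same vector."""
    rng = random.Random(seed)
    return [gain * rng.gauss(0.0, 1.0) for _ in range(length)]


def mixed_excitation(length, pitch_lag, pitch_gain, noise_gain, seed):
    """Pitch excitation plus random excitation, sample by sample."""
    voiced = pulse_train(length, pitch_lag, pitch_gain)
    unvoiced = gaussian_excitation(length, noise_gain, seed)
    return [v + u for v, u in zip(voiced, unvoiced)]


print(pulse_train(10, 4, 0.5))  # [0.5, 0.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.5, 0.0]
assert gaussian_excitation(5, 1.0, seed=7) == gaussian_excitation(5, 1.0, seed=7)
```

In the four-subframe variant, `pitch_lag` and `pitch_gain` come from the Y-bit subframe, while the random part is scaled by the gain from the G-bit subframe.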

Next, each parameter is processed as follows:
- The quantized LSP vector corresponding to the X-bit stream is converted into linear prediction (LPC) coefficients by the inverse of split vector quantization (Split VQ). These coefficients are used in the LPC speech synthesis filter.
- The pitch parameter vector (pitch delay/gain) corresponding to the Y-bit stream is processed by pitch synthesis to generate a pitch excitation signal.
- The excitation vector parameter corresponding to the Z-bit stream and the excitation gain parameter corresponding to the G-bit stream are fed into the excitation signal generator, which generates an excitation signal.
- This excitation signal and the pitch excitation signal drive the LPC speech synthesis filter, which models the vocal tract. The filter produces an artificial synthesized speech signal for transmission.
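The inverse split VQ and the LSP-to-LPC conversion can be sketched as follows. The sketch assumes:
- the split into sub-codebooks;
- an even prediction order;
- the convention A(z) = 1 + a1 z^-1 + ... + ap z^-p.

The conversion rebuilds the symmetric polynomial P(z) and the antisymmetric polynomial Q(z) from the alternating line spectral frequencies (in radians). It then averages them to recover A(z).

```python
import math


def split_vq_decode(indices, codebooks):
    """Inverse split VQ: each index picks a sub-vector from its own sub-codebook,
    and the sub-vectors are concatenated into the full LSF vector."""
    if len(indices) != len(codebooks):
        raise ValueError("one index per sub-codebook is required")
    lsf = []
    for index, book in zip(indices, codebooks):
        lsf.extend(book[index])
    return lsf


def _poly_mul(p, q):
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return out


def lsf_to_lpc(lsf):
    """Ascending LSFs w1..wp (radians, p even) -> [1, a1, ..., ap] of A(z)."""
    p = len(lsf)
    if p % 2:
        raise ValueError("this sketch assumes an even prediction order")
    f1, f2 = [1.0], [1.0]
    for i, w in enumerate(lsf):
        quad = [1.0, -2.0 * math.cos(w), 1.0]
        if i % 2 == 0:   # w1, w3, ...: roots of the symmetric polynomial P(z)
            f1 = _poly_mul(f1, quad)
        else:            # w2, w4, ...: roots of the antisymmetric polynomial Q(z)
            f2 = _poly_mul(f2, quad)
    p_z = _poly_mul(f1, [1.0, 1.0])    # P(z) has a fixed root at z = -1
    q_z = _poly_mul(f2, [1.0, -1.0])   # Q(z) has a fixed root at z = +1
    a = [(x + y) / 2.0 for x, y in zip(p_z, q_z)]   # A(z) = (P(z) + Q(z)) / 2
    return a[: p + 1]                  # the z^-(p+1) terms cancel


# Uniformly spaced LSFs k*pi/(p+1) correspond to the flat filter A(z) = 1.
uniform = [k * math.pi / 11 for k in range(1, 11)]
books = [[[0.1] * 5, uniform[:5]], [[0.2] * 5, uniform[5:]]]   # hypothetical 2-entry sub-codebooks
a = lsf_to_lpc(split_vq_decode([1, 1], books))
assert abs(a[0] - 1.0) < 1e-9 and all(abs(x) < 1e-9 for x in a[1:])
```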

The duration of this speech signal is generally chosen between 10 ms and 30 ms. Below 10 ms, the fundamental-frequency information cannot be completely recovered. Above 30 ms, the speech signal is no longer statistically stationary, so the linear prediction model is no longer valid.

Typically, each frame of the digital signal is used to synthesize either 20 ms of speech (as in ACELP, QCELP, etc.) or 30 ms (as in FS1016 DoD CELP). Let T denote the length of the synthesized speech signal in milliseconds and N the frame length in bits. The theoretically achievable digital signal rate is then R = (N / T) × 1000 bps.
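The rate formula can be checked directly. The 66-bit, 30 ms case is the one worked out in the specific embodiment below; the 20 ms case is only an illustration with the same frame size.

```python
def bit_rate_bps(frame_bits: int, frame_ms: float) -> float:
    """R = (N / T) * 1000: N bits carried by each T-millisecond speech frame."""
    return frame_bits * 1000.0 / frame_ms


print(bit_rate_bps(66, 30))  # 2200.0
print(bit_rate_bps(66, 20))  # 3300.0
```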

Speech analysis at the receiving end is the inverse of the speech synthesis above. The received signal is analyzed in the minimum-mean-square-error sense to extract the linear prediction filter coefficients, the excitation vector parameter, the excitation gain parameter and the pitch parameters.

LPC analysis. The input speech signal is first fed into the linear prediction (LPC) analysis module. This module computes the autocorrelation over a 20 ms or 30 ms analysis window (matching the transmitter's setting) and obtains the LPC filter coefficients with the Levinson-Durbin algorithm. The LPC coefficients are then converted to frequency-domain LSP coefficients by Chebyshev polynomial computation. The quantized LSP parameters are obtained with the split vector quantization (Split VQ) algorithm.

Pitch analysis. The pitch analysis module analyzes the input speech signal, using either the more computationally expensive closed-loop search or a simplified open-loop search. With the open-loop search, the input speech is first passed through the LPC inverse filter. The resulting residual signal is fed into the pitch prediction filter of the pitch analysis module, producing a pitch residual signal. The optimal values of the pitch prediction filter's two key parameters, the pitch delay and the pitch gain, are those that minimize the mean square error of this pitch residual.

Excitation search. The excitation signal is determined by searching the excitation parameter table (codebook). Each candidate excitation is built from an excitation vector and an excitation gain. Speech is synthesized from it through the linear prediction filter and the pitch synthesis filter, and this speech is compared with the input speech to form a residual signal. The candidate that minimizes the mean square error of this residual is taken as the optimal excitation, represented by its excitation vector and excitation gain parameters. The index values of the excitation vector and excitation gain in their respective parameter tables form part of the source digital signal.

Index recovery. Likewise, the parameters obtained by linear prediction analysis and pitch analysis are looked up in their respective parameter code tables to obtain the LSP parameter index value and the pitch parameter index value. These index values, treated as subframes, are concatenated in a fixed order to give the N-bit output bit stream of each frame.
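The LPC and open-loop pitch stages of this analysis can be sketched in plain Python. The sketch assumes:
- no analysis window or lag window;
- a zero initial filter state;
- a pitch gain chosen to minimize the squared pitch residual.

The Chebyshev LSP conversion, Split VQ and the closed-loop codebook search are omitted.

```python
def autocorrelation(x, max_lag):
    """r[k] = sum_n x[n] * x[n+k] over the analysis window, for k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations for A(z) = 1 + a1 z^-1 + ... + ap z^-p.
    Returns ([1, a1, ..., ap], final prediction-error energy)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                     # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def lpc_residual(x, a):
    """LPC inverse filter A(z): e[n] = x[n] + sum_k a[k] * x[n-k] (zero initial state)."""
    p = len(a) - 1
    return [x[n] + sum(a[k] * x[n - k] for k in range(1, p + 1) if n - k >= 0)
            for n in range(len(x))]


def open_loop_pitch(res, min_lag, max_lag):
    """Choose lag T and gain g minimising sum_n (res[n] - g * res[n-T])^2.
    For a fixed T the optimal g is num/den and the error drops by num^2/den."""
    best_lag, best_gain, best_score = min_lag, 0.0, -1.0
    for lag in range(min_lag, max_lag + 1):
        num = sum(res[n] * res[n - lag] for n in range(lag, len(res)))
        den = sum(res[n - lag] ** 2 for n in range(lag, len(res)))
        if den <= 0.0:
            continue
        score = num * num / den
        if score > best_score:
            best_lag, best_gain, best_score = lag, num / den, score
    return best_lag, best_gain


# AR(1)-like autocorrelation: the optimal 2nd-order predictor is a1 = -0.5, a2 = 0.
a, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
assert abs(a[1] + 0.5) < 1e-12 and abs(a[2]) < 1e-12 and abs(err - 0.75) < 1e-12

# A pulse train with a 40-sample period as a toy LPC residual.
res = [1.0 if n % 40 == 0 else 0.0 for n in range(240)]
print(open_loop_pitch(res, 20, 60))   # (40, 1.0)
```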

Specific embodiment:

(1) The source digital signal to be transmitted is divided into frames of 66 bits, each used to synthesize a 30 ms speech signal. Each frame is further subdivided into four subframes: subframe 1 is 16 bits, subframe 2 is 24 bits, subframe 3 is 16 bits and subframe 4 is 10 bits. Each subframe is mapped to a key parameter value of the speech synthesis model.
(2) Each subframe is used as an index into one or more parameter tables:
- Subframe 1 (16 bits) indexes a line spectral frequency (LSP) parameter table of 65536 entries, each a 34-bit quantized LSP vector.
- Subframe 2 (24 bits) is split in two. Its high 14 bits index a pitch delay parameter table of 16384 entries, each a 28-bit pitch delay parameter. Its low 10 bits index a pitch gain parameter table of 1024 entries, each a 20-bit pitch gain parameter.
- Subframe 3 (16 bits) indexes an excitation vector parameter table of 65536 entries, each a 36-bit excitation vector parameter.
- Subframe 4 (10 bits) indexes an excitation gain parameter table of 1024 entries, each a 20-bit excitation gain parameter.
(3) The index values generated in step (2) are looked up in the LSP parameter table, the pitch parameter tables, the excitation vector table and the excitation gain table. This yields the LSP parameter, the pitch parameters (delay and gain), the excitation vector parameter and the excitation gain parameter.
(4) A speech signal is synthesized from the parameters generated in step (3) according to the CELP mechanism. Two signals are fed into the linear prediction (LPC) speech synthesis filter unit:
- the excitation signal, formed by scaling the excitation vector with the excitation gain parameter;
- the pitch excitation signal, generated from the pitch parameter vector (pitch delay/gain) by pitch synthesis.
The filter's coefficients are obtained from the quantized LSP vector by inverse vector quantization and conversion.
(5) The synthesized speech signal is sent through a vocoder (such as a GSM or CDMA speech vocoder).
(6) After the receiving end receives the synthesized speech signal, it performs speech analysis to extract the LSP parameter, the pitch parameters, the excitation vector parameter and the excitation gain parameter:
- LSP parameter. The input speech signal is fed into the LPC analysis module, which computes the autocorrelation over a 30 ms analysis window (matching the transmitter's setting). The Levinson-Durbin algorithm then gives the LPC filter coefficients. These coefficients are converted to frequency-domain LSP coefficients by Chebyshev polynomial computation, and the split vector quantization (Split VQ) algorithm yields the 34-bit quantized LSP parameter.
- Pitch parameters. The pitch analysis module uses the open-loop search. The input speech is passed through the LPC inverse filter, and the residual is fed into the pitch prediction filter, producing a pitch residual signal. The module computes the values of the filter's two key parameters that minimize the mean square error of this pitch residual: a 28-bit pitch delay and a 20-bit pitch gain.
- Excitation parameters. The excitation signal is determined by searching the excitation parameter table (codebook). Each candidate excitation is built from an excitation vector and an excitation gain. Speech is synthesized from it through the linear prediction filter and the pitch synthesis filter and compared with the input speech to form a residual signal. The candidate that minimizes the mean square error of this residual is taken as the optimal excitation, represented by a 36-bit excitation vector and a 20-bit excitation gain parameter.
(7) The parameters extracted in step (6) are looked up in reverse in the corresponding parameter tables:
- the LSP parameter table (65536 quantized 34-bit LSP entries);
- the pitch parameter tables (16384 28-bit pitch delay entries and 1024 20-bit pitch gain entries);
- the excitation vector table (65536 36-bit excitation vector entries);
- the excitation gain table (1024 20-bit excitation gain entries).
This regenerates the LSP parameter index, the pitch parameter indices, the excitation vector parameter index and the excitation gain parameter index.
(8) The index values generated in step (7) are converted back into their respective subframes. These are reassembled into one 66-bit frame of the source digital signal.

The transmission rate achievable in this embodiment is therefore R = (66 / 30) × 1000 = 2200 bps.
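The bit allocation of this embodiment can be written down and checked. The sketch assumes MSB-first packing. It treats subframe 2 as its high 14 bits (pitch delay index) followed by its low 10 bits (pitch gain index). The field names are hypothetical.

```python
# Specific embodiment: one 66-bit frame -> 30 ms of synthetic speech (MSB first).
LAYOUT = [
    ("lsp_index", 16),                # subframe 1
    ("pitch_delay_index", 14),        # subframe 2, high 14 bits
    ("pitch_gain_index", 10),         # subframe 2, low 10 bits
    ("excitation_vector_index", 16),  # subframe 3
    ("excitation_gain_index", 10),    # subframe 4
]
TABLE_ENTRIES = {
    "lsp_index": 65536, "pitch_delay_index": 16384, "pitch_gain_index": 1024,
    "excitation_vector_index": 65536, "excitation_gain_index": 1024,
}
FRAME_BITS = sum(width for _, width in LAYOUT)
FRAME_MS = 30


def unpack(frame: int) -> dict[str, int]:
    """Steps (1)-(2): split a 66-bit frame into the five table indices."""
    fields, shift = {}, FRAME_BITS
    for name, width in LAYOUT:
        shift -= width
        fields[name] = (frame >> shift) & ((1 << width) - 1)
    return fields


def pack(fields: dict[str, int]) -> int:
    """Step (8): reassemble the recovered indices into the 66-bit frame."""
    frame = 0
    for name, width in LAYOUT:
        frame = (frame << width) | fields[name]
    return frame


# Every index value addresses exactly one table entry, and the widths add up.
assert FRAME_BITS == 66
assert all(1 << width == TABLE_ENTRIES[name] for name, width in LAYOUT)
frame = 0x2DEADBEEFCAFEF00D   # an arbitrary 66-bit test frame
assert pack(unpack(frame)) == frame
print(FRAME_BITS * 1000 / FRAME_MS, "bps")   # 2200.0 bps
```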

Owing to the above design, the present invention has the following features:

1. The proposed method transmits an arbitrary digital signal at a given bit rate transparently and with high quality over an analog or digital voice channel. It works independently of the switching and transmission equipment of the communication network. Its transmission delay and jitter are far lower than those of transmission over a data channel, which guarantees the service quality of interactive real-time information exchange.

2. Because the present invention needs only the operator's voice service, interconnection is assured and the range of application is greatly widened. A user can transmit an arbitrary digital signal at a given bit rate, with guaranteed quality of service, anywhere in the world where voice service is available.

3. The present invention can be applied in mobile wireless terminals (GSM and CDMA mobile phones, satellite phones, etc.), fixed-line telephones and computer equipment. It can realize a variety of special and value-added service functions:

(1) It improves the voice transmission quality of the Push-to-Talk (PTT) wireless group-call value-added service. It also removes that service's dependence on wireless data channels, so that PTT can operate on its own.

(2) It provides key technical support for secure voice and data communication over the voice channels of mobile networks. After strong encryption, a voice signal becomes highly random and no longer has any speech characteristics, so a speech vocoder cannot carry it directly. This technology enables users to carry out secure voice and data communication anywhere in the world covered by the fixed telephone network (POTS) or a GSM/CDMA mobile network, independently of existing network switching and transmission equipment.

4. In steps (2) and (3) of the present invention, the data bit stream of each subframe is mapped to the index value of the corresponding parameter rather than to the parameter itself. This gives flexibility in pre-selecting the key parameters used for speech synthesis. From the full value space of each parameter, a subset of values that differ greatly from one another and are easy to extract is selected and placed in the corresponding parameter codebook. Each codebook entry corresponds to an index value mapped from a subframe's bit stream. At the cost of a lower transmission rate, even similar input digital signals then produce sufficiently distinct analog continuous-waveform speech signals. This helps the receiving end's speech analysis obtain correct results and effectively reduces the bit error rate.
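Feature 4 says that well-separated codebook entries let the receiver recover the right index even when the voice channel distorts a parameter. The small sketch below illustrates this. The patent states the goal but not a selection algorithm, so both the greedy farthest-point selection and the nearest-neighbour reverse lookup are assumptions chosen for illustration. The helper names are hypothetical.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length parameter vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))

def select_spread_entries(candidates, size):
    """Greedy farthest-point selection of `size` mutually distant table entries.

    Hypothetical helper: one possible way to choose parameter values that
    differ greatly from one another.
    """
    chosen = [candidates[0]]
    while len(chosen) < size:
        # pick the candidate whose nearest already-chosen entry is farthest away
        best = max(candidates, key=lambda c: min(sq_dist(c, s) for s in chosen))
        chosen.append(best)
    return chosen

def reverse_lookup(table, measured):
    """Receiver side: index of the table entry nearest to the measured parameter."""
    return min(range(len(table)), key=lambda i: sq_dist(table[i], measured))

# A one-dimensional "gain" table of three entries picked from 0.0, 0.1, ..., 1.0.
candidates = [(i / 10,) for i in range(11)]
table = select_spread_entries(candidates, 3)   # [(0.0,), (1.0,), (0.5,)]
```

Suppose a gain of 0.5 is sent as index 2 and the vocoder distorts it to 0.58. `reverse_lookup(table, (0.58,))` still returns 2. The larger the minimum spacing between entries, the more distortion the receiver can tolerate, but fewer entries fit in the value space. This is the trade of rate for robustness described above.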

Claims

1. A method of transmitting and receiving a digital signal using a vocoder, characterized in that: the source digital signal to be transmitted is converted by parameter mapping into key speech-feature parameters of a speech synthesis model, and a speech signal is generated by speech synthesis at the transmitting end; the synthesized speech signal is sent through a GSM or CDMA vocoder; at the receiving end, the key speech-feature parameters are extracted by speech analysis and the original digital signal is recovered.

2. The method according to claim 1, characterized in that the method of transmitting and receiving a digital signal using a vocoder specifically comprises the following steps: (1) dividing the source digital signal to be sent into frames, each frame of the digital signal being used to synthesize a short-time speech signal, and further subdividing each frame into subframes of unequal length, the number of subframes being at least three; (2) generating from the subframes, respectively, a line spectral frequency (LSP) coefficient index, a generalized excitation vector parameter index and a generalized excitation gain index; (3) looking up the index values generated in step (2) in the LSP coefficient parameter table, the generalized excitation vector parameter table and the generalized excitation gain parameter table, respectively, to generate in turn the LSP coefficient parameters, the generalized excitation vector parameters and the generalized excitation gain parameters; (4) synthesizing a speech signal from the parameters generated in step (3) according to the principle of a CELP vocoder; (5) sending the synthesized speech signal through a CDMA or GSM vocoder; (6) after the receiving end receives the synthesized speech signal, performing speech analysis on it to extract the LSP coefficient parameters, generalized excitation vector parameters and generalized excitation gain parameters; (7) looking up the parameters obtained in step (6) in their corresponding tables, namely the LSP coefficient parameter table, the excitation vector parameter table and the excitation gain parameter table, to generate in reverse the LSP coefficient index, the generalized excitation vector parameter index and the generalized excitation gain index; (8) converting each index value generated in step (7) back into a subframe and reassembling the subframes into a frame, thereby recovering the original digital signal.

3. The method according to claim 2, characterized in that in step (1), the digital signal bit stream is divided into frames, and each frame of the bit stream is used to produce a short-time speech signal of 10 to 30 milliseconds.
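The frame duration in claim 3 fixes the data rate. The worked arithmetic below is an illustration, not part of the claim. With N bits carried per frame and a frame duration of T_f, the rate is R = N / T_f. The 30 ms case uses the embodiment's 66-bit frame. The 10 ms case is hypothetical: it keeps the same 66-bit frame at the shortest claimed duration, which gives the receiver's analysis fewer samples per frame.

```latex
R = \frac{N}{T_f}, \qquad
R\big|_{N=66,\;T_f=30\,\mathrm{ms}} = \frac{66\ \text{bit}}{0.030\ \text{s}} = 2200\ \text{bit/s}, \qquad
R\big|_{N=66,\;T_f=10\,\mathrm{ms}} = \frac{66\ \text{bit}}{0.010\ \text{s}} = 6600\ \text{bit/s}.
```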

4. The method according to claim 2 or 3, characterized in that in step (4), the generalized excitation vector parameters and generalized excitation gain parameters are first synthesized into an excitation signal by an excitation signal generator, and the LSP coefficient parameters are inverse vector quantized to produce linear prediction coefficients; these linear prediction coefficients, together with the excitation signal synthesized by the excitation signal generator, are input to the linear prediction speech synthesis filter, which finally synthesizes the speech signal.
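The inverse quantization in claim 4 ends with converting line spectral frequencies back into linear prediction coefficients. Below is a minimal sketch of that conversion under stated assumptions. It takes an even order p and LSFs given in radians in ascending order. It uses the common construction A(z) = [P(z) + Q(z)] / 2 used in standard CELP codecs such as ITU-T G.729. In that construction, odd-numbered LSFs are roots of the symmetric polynomial P(z) and even-numbered LSFs are roots of the antisymmetric polynomial Q(z). `split_vq_decode` is a hypothetical helper that simply concatenates subvectors looked up in per-split tables.

```python
import math

def poly_mul(p, q):
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return out

def split_vq_decode(indices, codebooks):
    """Hypothetical split-VQ decoder: concatenate one subvector per codebook."""
    lsf = []
    for idx, book in zip(indices, codebooks):
        lsf.extend(book[idx])
    return lsf

def lsf_to_lpc(lsf):
    """Convert ascending LSFs (radians, even count p) to [1, a1, ..., ap] of A(z)."""
    p = len(lsf)
    if p % 2:
        raise ValueError("this sketch assumes an even prediction order")
    f1, f2 = [1.0], [1.0]
    for i, w in enumerate(lsf):
        section = [1.0, -2.0 * math.cos(w), 1.0]   # roots at exp(+-jw)
        if i % 2 == 0:     # omega_1, omega_3, ... belong to P(z)
            f1 = poly_mul(f1, section)
        else:              # omega_2, omega_4, ... belong to Q(z)
            f2 = poly_mul(f2, section)
    p_poly = poly_mul(f1, [1.0, 1.0])    # P(z) also has a root at z = -1
    q_poly = poly_mul(f2, [1.0, -1.0])   # Q(z) also has a root at z = +1
    a = [(x + y) / 2.0 for x, y in zip(p_poly, q_poly)]
    return a[: p + 1]                    # the z^-(p+1) terms cancel
```

As a sanity check, equally spaced LSFs ω_k = kπ/(p+1) correspond to a flat filter A(z) = 1. The output is in the same coefficient form a synthesis filter 1/A(z) expects.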

5. The method according to claim 1, characterized in that the method of transmitting and receiving a digital signal using a vocoder specifically comprises the following steps: (1) dividing the source digital signal to be sent into frames, each frame being used to synthesize a short-time speech signal, and further subdividing each N-bit frame into four subframes of unequal length, namely bit streams of X bits, Y bits, Z bits and G bits; (2) mapping the X-bit stream to an LSP coefficient parameter index value, the Y-bit stream to a pitch parameter index value, the Z-bit stream to an excitation vector parameter index value, and the G-bit stream to an excitation gain parameter index value; (3) looking up the index values generated in step (2) in the LSP coefficient parameter table, pitch parameter table, excitation vector parameter table and excitation gain parameter table, respectively, to obtain the actual vector parameters: the LSP coefficient parameters, pitch parameters, excitation vector parameters and excitation gain parameters; (4) synthesizing a speech signal from the parameters generated in step (3) according to the principle of a CELP vocoder; (5) sending the synthesized speech signal through a CDMA or GSM vocoder; (6) after the receiving end receives the synthesized speech signal, performing speech analysis on it to extract the LSP coefficient parameters, pitch parameters, excitation vector parameters and excitation gain parameters; (7) looking up the parameters extracted in step (6) in the corresponding LSP coefficient parameter table, pitch parameter table, excitation vector parameter table and excitation gain parameter table, respectively, to generate in reverse the LSP coefficient parameter index, pitch parameter index, excitation vector parameter index and excitation gain parameter index; (8) converting each index value generated in step (7) back into a subframe and reassembling the subframes into a frame, thereby recovering the original digital signal.
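Steps (1), (2) and (8) of claim 5 amount to bit-field packing. The sketch below takes its field widths from the embodiment's table sizes by taking base-2 logarithms: 65536 LSP entries give 16 bits, 16384 pitch delays 14, 1024 pitch gains 10, and 65536 excitation vectors 16. Assuming a 1024-entry excitation gain table (a 10-bit index), the widths sum to the embodiment's 66-bit frame. The Y field here carries both pitch delay and pitch gain. The field names and the most-significant-bit-first order are illustrative assumptions.

```python
# Field widths (bits) for one frame, derived from the embodiment's table sizes.
FIELDS = [
    ("lsp", 16),          # X subframe
    ("pitch_delay", 14),  # Y subframe, first part
    ("pitch_gain", 10),   # Y subframe, second part
    ("exc_vector", 16),   # Z subframe
    ("exc_gain", 10),     # G subframe (assumes a 1024-entry gain table)
]
FRAME_BITS = sum(width for _, width in FIELDS)   # N = 66
FRAME_MS = 30                                    # one frame -> 30 ms of speech
BIT_RATE = FRAME_BITS * 1000 // FRAME_MS         # 2200 bit/s

def split_frame(frame):
    """Transmit side, step (2): cut an N-bit frame (MSB first) into index values."""
    if not 0 <= frame < (1 << FRAME_BITS):
        raise ValueError("frame does not fit in FRAME_BITS bits")
    indices, shift = {}, FRAME_BITS
    for name, width in FIELDS:
        shift -= width
        indices[name] = (frame >> shift) & ((1 << width) - 1)
    return indices

def join_frame(indices):
    """Receive side, step (8): reassemble recovered index values into the frame."""
    frame = 0
    for name, width in FIELDS:
        value = indices[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} index out of range")
        frame = (frame << width) | value
    return frame
```

Each value returned by `split_frame` would address one parameter table in step (3). After the reverse lookup of step (7), `join_frame` restores the original 66 bits.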

6. The method according to claim 5, characterized in that in step (1), the digital signal bit stream is divided into frames, and each frame of the bit stream is used to produce a short-time speech signal of 10 to 30 milliseconds.

7. The method according to claim 5 or 6, characterized in that in step (4): the quantized LSP coefficient parameter vector corresponding to the X-bit stream is converted, by the inverse of split vector quantization, into linear prediction coefficient parameters for the linear prediction speech synthesis filter; the pitch parameter vector corresponding to the Y-bit stream is processed by pitch synthesis to generate a pitch excitation signal; the excitation vector parameters corresponding to the Z-bit stream and the excitation gain parameters corresponding to the G-bit stream are input to an excitation signal synthesis module to generate an excitation signal; and this excitation signal and the pitch excitation signal act on the linear prediction speech synthesis filter, which models the vocal tract characteristics, to produce an artificially synthesized speech signal.
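The synthesis chain of claim 7 can be sketched as below: a pitch excitation plus a gain-scaled excitation vector, both driving an all-pole LPC filter 1/A(z). The sketch assumes the usual CELP adaptive-codebook form of the pitch excitation, e[n] = g_p e[n - T] + g_c c[n]. In that form, the pitch excitation repeats the total excitation from one pitch period T earlier. The function and argument names are hypothetical, and the coefficient convention is A(z) = 1 + a1 z^-1 + ... + ap z^-p.

```python
def celp_synthesize(a, code_vec, code_gain, pitch_lag, pitch_gain, exc_hist, syn_hist):
    """Synthesize one subframe (hypothetical helper).

    a          -- [1, a1, ..., ap] of A(z), p >= 1
    code_vec   -- excitation vector (Z subframe, after table lookup)
    code_gain  -- excitation gain (G subframe)
    pitch_lag, pitch_gain -- pitch delay T (>= 1) and pitch gain (Y subframe)
    exc_hist   -- past total excitation, at least pitch_lag samples long
    syn_hist   -- last p output samples, oldest first
    Returns (speech, new_exc_hist, new_syn_hist).
    """
    p = len(a) - 1
    if p < 1 or len(syn_hist) != p or not 1 <= pitch_lag <= len(exc_hist):
        raise ValueError("inconsistent filter order, pitch lag or history length")
    exc, out = list(exc_hist), list(syn_hist)
    speech = []
    for c in code_vec:
        # pitch excitation (one pitch period back) + scaled codebook excitation
        e = pitch_gain * exc[-pitch_lag] + code_gain * c
        exc.append(e)
        # all-pole synthesis 1/A(z): s[n] = e[n] - sum_k a_k s[n-k]
        s = e - sum(a[k] * out[-k] for k in range(1, p + 1))
        out.append(s)
        speech.append(s)
    return speech, exc[-len(exc_hist):], out[-p:]
```

Returning the updated histories lets consecutive subframes be chained, so the filter memory and the pitch memory carry across subframe boundaries.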