CN1275223C - A low bit-rate speech coder - Google Patents

Info

Publication number
CN1275223C
Authority
CN
China
Prior art keywords
dimensional vector
speech
bits
voice
pattern
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB2004101032203A
Other languages
Chinese (zh)
Other versions
CN1632862A (en)
Inventor
董恩清
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou University
Original Assignee
Suzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou University filed Critical Suzhou University
Priority to CNB2004101032203A priority Critical patent/CN1275223C/en
Publication of CN1632862A publication Critical patent/CN1632862A/en
Application granted granted Critical
Publication of CN1275223C publication Critical patent/CN1275223C/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The present invention discloses a speech coder for communication systems that require low-bit-rate, variable-rate speech coding. A support vector machine (SVM) method is applied to voice activity detection (VAD), which raises the coder's rate of correct speech detection. Using the GSM speech-mode classification method, the original four speech modes are merged into three, so that only two bits are needed to represent the mode of every frame. The invention also makes full use of the high coding gain of the local cosine transform (LCT). Combining LCT with SVM-VAD yields a practical, well-performing low-bit-rate variable-rate speech coder.

Description

A low-bit-rate variable-rate speech coder
Technical field
The present invention relates to a speech coder, and in particular to a speech coder for communication systems that require low-bit-rate, variable-rate speech coding.
Background art
Variable bit rate (VBR) coding is a technique developed in recent years. Its core idea is to encode the transitional, stationary and silent segments of speech at different rates, so that the average rate of a VBR coder is much lower than that of a fixed bit rate (FBR) coder of equal speech quality.
VBR coding is best suited to applications that place no strict limit on the speech coding rate and require rate "elasticity", such as CDMA, VoIP and ATM. Wireless communication systems and IP technology are developing rapidly and will soon occupy an increasingly important position in global communication systems. For this reason, ITU-T SG16 of the International Telecommunication Union is formulating a new variable-rate coding standard for future packetized voice networks (such as VoIP), IMT-2000 speech coding and high-quality low-bit-rate speech compression. In these applications the user can trade off speech quality against coding rate (channel capacity), which gives the system a "soft" control capability.
A well-known example of a variable-rate coder is QCELP, the variable-rate speech coder of the IS-95 standard formulated by CTIA. To date, most research on variable-rate speech coding has been based on CELP.
Well-known VAD methods include those used in the QCELP speech coder of the IS-95 standard, in EVRC of the IS-127 standard, in the DTX mode of the GSM standard, and in G.729 Annex B (G.729B) proposed by ITU-T.
In the past few years there has been keen interest in support vector machines (SVM). Experience shows that SVMs generally perform well in applications such as handwriting recognition, face recognition and text classification. However, their application to voice activity detection has seldom been reported.
Low-bit-rate speech coding has been a major research topic over the past 20 years, and many speech coding algorithms with bit rates from 16 kb/s down to 2.4 kb/s have been standardized as a result. Current research focuses on high-quality speech coding at 4 kb/s and below, and recent studies show that coding speech in the frequency domain has the potential to give better quality than existing CELP-based coders. A spectral coder attempts to reconstruct the magnitude spectrum of the speech rather than to recover the waveform exactly. Although CELP-based and parametric coders are widely used for low-bit-rate speech coding, most of them are limited by the accuracy of the model they assume and depend heavily on correct parameter estimation, which is often hard to guarantee. These methods are therefore not robust in adverse environments, which limits the quality of the coded speech.
Coifman and Meyer (1991) and Auscher et al. (1992) successively constructed local cosine bases, in which each basis function is the product of a smooth, compactly supported bell function and a cosine function. These localized cosine functions remain orthogonal and have a small Heisenberg product. The theory of the local cosine transform (LCT) has been studied extensively in recent years, but relatively little of this work concerns speech signal processing, and even less concerns speech coding. Nevertheless, an article published by H. S. Malvar in 1990 showed that the coding gain of the LCT method in speech coding is higher than that of DCT coding and very close to that of KL transform coding. Compared with DCT coding in particular, it clearly reduces the "click" noise between frames.
There is at present a strong practical demand for low-bit-rate variable-rate speech coding, while earlier model-based coding methods are limited by the accuracy of the assumed model and of the estimated parameters, which often restricts their coding quality and their range of application.
Summary of the invention
The object of the invention is to exploit the high coding gain of the local cosine transform to provide a practical, well-performing low-bit-rate variable-rate speech coder.
To achieve this object, the invention adopts the following technical solution: a low-bit-rate variable-rate speech coder based on the local cosine transform, in which the input speech signal is pre-processed by a high-pass filter and then passed to a voice activity detector that separates active speech frames from inactive frames; each kind of frame is then processed by an LCT transformer to complete the speech coding, wherein:
The voice activity detector is an SVM-VAD module whose workflow is as follows:
1. Parameter extraction: four classification parameters are extracted from the input speech data for the current frame, namely the line spectral frequencies (LSF), the full-band energy, the low-band energy and the zero-crossing rate;
2. Initialization: the values of these four parameters for the background-noise-only condition are computed and updated continuously as the background noise changes;
3. Difference processing: the four background-noise-only parameters obtained in the initialization are subtracted from the corresponding four parameters of the current frame, giving the four difference parameters needed for VAD classification;
4. Voice activity detection is performed with the SVM algorithm, the support vector machine being trained with the sequential minimal optimization (SMO) method; each frame is finally classified as one of two types, active or inactive;
5. A four-step smoothing and correction algorithm is applied to smooth the VAD decisions;
6. After VAD processing of each frame, an inactive-frame or active-speech-frame signal is output; if the estimated background noise energy of the frame is greater than the background noise energy threshold, the average background noise parameters are updated again;
The LCT transformer processes the frames as follows:
1. A frame that the SVM-VAD module detects as inactive is split according to the split-vector dimensions of the silence/background-noise mode, and each split vector is vector-quantized with the codebook of the corresponding split vector of that mode, giving two split-vector quantization results of 7 bits each. The gain of the frame is scalar-quantized at the same time. The 2 bits representing the speech mode, the 8 bits representing the gain, and the 7 bits each for the first and second split vectors are output in that order as 3 bytes, which completes the coding of the frame;
2. A frame that the SVM-VAD module detects as active speech is classified into one of three speech modes: unvoiced (mode 0), lightly voiced (mode 1) or medium-strong voiced (mode 2). The frame is split according to the split-vector dimensions of its mode, and each of the four split vectors is vector-quantized with the codebook of the corresponding split vector of that mode, giving four quantization results whose bit lengths depend on the mode. The gain of the frame is scalar-quantized at the same time. The 2 bits representing the speech mode, the 8 bits representing the gain, and the bits of the first through fourth split vectors are output in that order as a whole number of bytes, which completes the coding of the frame.
For the silence/background-noise mode, the first and second split vectors each have dimension 40. For the unvoiced, lightly voiced and medium-strong voiced modes, the first, second and third split vectors each have dimension 40 and the fourth has dimension 20.
In the silence/background-noise mode, the first and second split vectors are allocated 7 bits each, the third and fourth 0 bits, the gain 8 bits and the mode 2 bits. In the unvoiced mode, the first and second split vectors are allocated 7 bits each, the third and fourth 8 bits each, the gain 8 bits and the mode 2 bits. In the lightly voiced mode, the first and second split vectors are allocated 11 bits each, the third and fourth 8 bits each, the gain 8 bits and the mode 2 bits. In the medium-strong voiced mode, the first and second split vectors are allocated 8 bits each, the third and fourth 8 and 6 bits respectively, the gain 8 bits and the mode 2 bits.
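The allocation just listed can be checked with a little arithmetic. The following is a minimal sketch, assuming the 20 ms frame length used in the embodiment; the dictionary keys and function names are illustrative and do not come from the patent.

```python
# Bits per field for each frame type, in the order given in the text:
# (1st, 2nd, 3rd, 4th split vector, gain, mode).
BIT_ALLOCATION = {
    "silence/background noise": (7, 7, 0, 0, 8, 2),
    "mode 0 (unvoiced)": (7, 7, 8, 8, 8, 2),
    "mode 1 (lightly voiced)": (11, 11, 8, 8, 8, 2),
    "mode 2 (medium-strong voiced)": (8, 8, 8, 6, 8, 2),
}

FRAME_MS = 20  # frame length of the embodiment (160 samples at 8 kHz)


def frame_bits(frame_type):
    """Total number of bits used to code one frame of the given type."""
    return sum(BIT_ALLOCATION[frame_type])


def rate_kbps(frame_type, frame_ms=FRAME_MS):
    """Instantaneous bit rate; bits per millisecond equals kb/s."""
    return frame_bits(frame_type) / frame_ms


for name in BIT_ALLOCATION:
    bits = frame_bits(name)
    # Every frame type should occupy a whole number of bytes.
    assert bits % 8 == 0
    print(f"{name}: {bits} bits = {bits // 8} bytes, {rate_kbps(name):.1f} kb/s")
```

Under these assumptions the instantaneous rate ranges from 1.2 kb/s for silence frames to 2.4 kb/s for lightly voiced frames; the long-term average depends on how often each frame type occurs.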
Because the invention applies the SVM method to VAD, it improves the coder's rate of correct speech detection. It adopts the GSM speech-mode classification method and merges the original four speech modes into three, so that two bits suffice to represent the mode of every frame.
Description of drawings
Fig. 1 is a flow chart of the operation of the SVM-VAD module provided by an embodiment of the invention.
Fig. 2 is a block diagram of the VBR-LCT speech coder provided by an embodiment of the invention.
Embodiment
Below in conjunction with drawings and Examples the present invention is further described:
Embodiment:
1. Mode classification of active speech
The criteria for speech mode selection in the GSM system are as follows:
Mode = 0: P_v < 1.7 (unvoiced).
Mode = 1: P_v ≥ 1.7 and P_m < 3.5 for all m (lightly voiced).
Mode = 2: 3.5 ≤ P_m < 7.0 for all m (medium voiced).
Mode = 3: P_m > 7.0 for all m (strongly voiced).
Here m = 1, 2, 3, 4 indexes the subframes of a frame, P_m is the open-loop LTP prediction gain (dB) of the m-th subframe, and P_v is the open-loop LTP prediction gain (dB) of the whole frame.
Strongly voiced and medium voiced speech both have strong periodicity and high energy. According to the speech production model, the formants of these two modes are very pronounced, which helps produce clear voiced sounds. For frequency-domain coding, the spectral content of strongly voiced and medium voiced speech differs little, so this embodiment merges the two into a single mode called the medium-strong voiced mode. A further reason for the merge is that the silence frame type detected by the VAD plus the three remaining speech modes gives four frame types, which can be signalled with just 2 bits. This embodiment therefore has only three modes for active speech: mode 0, mode 1 and mode 2, representing the unvoiced, lightly voiced and medium-strong voiced modes respectively.
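The merged mode decision can be sketched as follows. This is only an illustration under stated assumptions: the text does not say what happens when the subframe gains fall into different ranges, nor which 2-bit code is assigned to silence frames, so the fallback to mode 1 and the value of `SILENCE_CODE` are hypothetical choices made here.

```python
SILENCE_CODE = 3  # ASSUMPTION: 2-bit code for VAD-inactive frames (not given in the text)


def coder_mode(p_v, p_sub):
    """Return the merged speech mode (0, 1 or 2) of an active frame.

    p_v   -- open-loop LTP prediction gain of the whole frame, in dB
    p_sub -- open-loop LTP prediction gains of the four subframes, in dB
    """
    if p_v < 1.7:
        return 0  # unvoiced
    if all(p >= 3.5 for p in p_sub):
        return 2  # medium-strong voiced (GSM modes 2 and 3 merged)
    # Covers "P_m < 3.5 for all m" from the criteria above.
    # ASSUMPTION: frames with mixed subframe gains are also treated as lightly voiced.
    return 1


def frame_type_code(is_active, p_v=0.0, p_sub=(0.0, 0.0, 0.0, 0.0)):
    """Two-bit frame type: one of three speech modes, or the silence code."""
    return coder_mode(p_v, p_sub) if is_active else SILENCE_CODE


print(frame_type_code(True, 5.0, (4.0, 5.0, 6.0, 8.0)))
print(frame_type_code(False))
```

Four frame types fit exactly into two bits, which is the point the paragraph above makes about merging the two strongest voiced modes.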
2. Split vector quantization
Roughly speaking, the first four formants of adult speech lie near 500 Hz, 1500 Hz, 2500 Hz and 3500 Hz. This in effect divides the speech spectrum into four important regions, whose spectra should be treated differently during coding. The coder of this embodiment therefore quantizes the local cosine transform coefficients as split vectors. The codebook for each split vector is trained separately with the vector quantization algorithm proposed by Linde, Buzo and Gray in 1980 (the LBG algorithm). Once the codebooks have been generated with the LBG algorithm, a tree-structured codebook search is used to speed up the codebook search during encoding and decoding.
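Codebook training of the kind described above can be sketched in a few lines. The following is a minimal, unoptimized illustration of the Linde-Buzo-Gray splitting procedure in pure Python, not the patent's implementation: it uses squared Euclidean distortion, an additive split perturbation `eps`, and a full codebook search rather than the tree-structured search mentioned in the text. All names and default values are assumptions.

```python
def dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook, vec):
    """Index of the code vector closest to vec (full search)."""
    return min(range(len(codebook)), key=lambda i: dist2(codebook[i], vec))


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def lbg(training, size, eps=0.01, tol=1e-6, max_iter=100):
    """Train a codebook with `size` code vectors (size must be a power of two)."""
    codebook = [centroid(training)]
    while len(codebook) < size:
        # Split every code vector into two slightly perturbed copies.
        codebook = [[c + s for c in cv] for cv in codebook for s in (eps, -eps)]
        prev = float("inf")
        for _ in range(max_iter):
            # Assign each training vector to its nearest code vector.
            cells = [[] for _ in codebook]
            total = 0.0
            for v in training:
                i = nearest(codebook, v)
                cells[i].append(v)
                total += dist2(codebook[i], v)
            # Move each code vector to the centroid of its cell (keep it if the cell is empty).
            codebook = [centroid(cell) if cell else cv
                        for cell, cv in zip(cells, codebook)]
            if prev - total <= tol * total:
                break  # distortion has stopped decreasing
            prev = total
    return codebook


training = [[0.0, 0.0], [0.2, 0.0], [10.0, 10.0], [10.2, 10.0]]
codebook = lbg(training, 2)
print(sorted(codebook))
```

In the coder, one such codebook would be trained per split vector and per mode, with 2^b code vectors for a split vector that is allocated b bits.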
In the split quantization used in this embodiment, the local cosine transform coefficients of every active speech mode are divided, from low to high frequency, into groups of 40, 40, 40 and 20 coefficients. For silence or background noise frames only the two low-frequency groups of 40 coefficients each are used. The four vectors are called the first, second, third and fourth split vectors. For speech sampled at 8 kHz, keeping only the spectral content below 3500 Hz is enough to recover speech of satisfactory quality. To reduce computational complexity, the fourth split vector of an active speech frame therefore uses only 20 coefficients, and silence or background noise frames do not use the coefficients of the upper half band. Table 1 lists the split-vector dimensions for each frame type. When the decoder synthesizes the speech signal by the inverse transform, the remaining 20 high-frequency coefficients of an active speech frame and the 80 upper-half-band coefficients of a silence (background noise) frame are set to 0.
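The splitting and zero-filling described above can be illustrated with a short sketch. It assumes that each 20 ms frame yields 160 transform coefficients ordered from low to high frequency, which is consistent with the counts in the text (140 used plus 20 discarded, or 80 used plus 80 discarded); the function and key names are hypothetical.

```python
FRAME_COEFFS = 160  # coefficients per frame, low to high frequency (assumed ordering)

SPLIT_DIMS = {
    "silence": (40, 40),         # silence/background noise frames
    "active": (40, 40, 40, 20),  # modes 0, 1 and 2
}


def split_coefficients(coeffs, frame_type):
    """Cut the low-frequency part of a frame into split vectors; the rest is dropped."""
    if len(coeffs) != FRAME_COEFFS:
        raise ValueError("expected %d coefficients" % FRAME_COEFFS)
    vectors, start = [], 0
    for dim in SPLIT_DIMS[frame_type]:
        vectors.append(list(coeffs[start:start + dim]))
        start += dim
    return vectors


def merge_and_zero_fill(vectors):
    """Decoder side: concatenate the split vectors and pad the high band with zeros."""
    flat = [c for v in vectors for c in v]
    return flat + [0.0] * (FRAME_COEFFS - len(flat))


coeffs = [float(i) for i in range(FRAME_COEFFS)]
active = split_coefficients(coeffs, "active")
print([len(v) for v in active])                 # dimensions of the four split vectors
print(merge_and_zero_fill(active)[138:142])     # last kept coefficients, then zeros
```

With 160 coefficients spanning 0 to 4000 Hz, the 140 coefficients kept for active frames cover the band below 3500 Hz, which matches the figure given in the paragraph above.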
3. Bit allocation
Different bit allocation strategies are used according to the characteristics of each type of active speech frame and of silence (background noise) frames. Table 2 is the bit allocation table of the VBR-LCT coder provided by this embodiment.
Medium-strong voiced speech has strong periodicity and its energy is concentrated in the low and middle frequency bands, so relatively more bits should go to those bands. A moderate total number of bits is enough to represent this mode well.
Lightly voiced speech is to some extent a mixture of voiced and unvoiced sound in some proportion. Its periodicity is weaker than that of medium-strong voiced speech, and it contains the transitional parts of speech. Although such transition frames make up only a small share of the speech, they carry a large amount of information, so whether they are represented effectively directly affects speech quality. This embodiment therefore allocates the largest number of bits to frames of this mode.
Unvoiced speech can be regarded as consisting entirely of unvoiced sound, so its local cosine transform spectrum can be taken to be flat. Essentially the same number of bits is allocated to each band, except that the two bands in the upper half of the spectrum each receive one extra bit to enhance the unvoiced character of the high-frequency part.
To obtain natural-sounding speech, this embodiment does not simply fill silence or background noise frames with zeros. Such processing would cause an abrupt change in energy between speech frames and silence frames, which is unpleasant to listen to. A certain number of bits is therefore also allocated to represent silence or noise frames. Under strong background noise or in special environments, if speech is misjudged as silence, these limited bits can still convey some of the speech information; this is a distinctive advantage of the coding method based on the local cosine transform.
The gain of a frame in each mode is computed as the ratio of the spectral energy of the input signal to the spectral energy of the code vectors found by the codebook search during encoding. The gain is quantized with an 8-bit scalar quantizer. The total number of bits allocated to a frame is a whole number of bytes in every mode, so a bit error inside one frame during transmission does not cause the subsequent frames to be decoded incorrectly, which gives the coder a degree of resistance to bit errors and of error-correction capability.
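The byte-aligned frame layout can be sketched as a simple bit-packing routine. The field order (mode, gain, then the split-vector indices from first to last) follows the text; packing most-significant bit first, the function names, and the use of code 3 for silence frames in the example are assumptions made for illustration.

```python
# Index widths in bits of the split vectors for each frame type (from the bit allocation).
INDEX_BITS = {
    "silence": (7, 7),
    0: (7, 7, 8, 8),    # unvoiced
    1: (11, 11, 8, 8),  # lightly voiced
    2: (8, 8, 8, 6),    # medium-strong voiced
}


def pack_frame(mode_code, gain_index, vq_indices, widths):
    """Pack one frame: 2-bit mode, 8-bit gain, then the VQ indices, MSB first."""
    fields = [(mode_code, 2), (gain_index, 8)] + list(zip(vq_indices, widths))
    acc, nbits = 0, 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError("value %d does not fit in %d bits" % (value, width))
        acc = (acc << width) | value
        nbits += width
    if nbits % 8:
        raise ValueError("frame is not a whole number of bytes")
    return acc.to_bytes(nbits // 8, "big")


def unpack_frame(data, widths):
    """Inverse of pack_frame for a frame whose index widths are known."""
    acc = int.from_bytes(data, "big")
    shift = len(data) * 8
    values = []
    for width in [2, 8] + list(widths):
        shift -= width
        values.append((acc >> shift) & ((1 << width) - 1))
    return values[0], values[1], values[2:]


# A silence frame (hypothetical mode code 3) occupies 3 bytes.
frame = pack_frame(3, 200, (5, 100), INDEX_BITS["silence"])
print(len(frame), unpack_frame(frame, INDEX_BITS["silence"]))
```

Because the mode occupies the first two bits, a decoder can read it first and from it determine how many bytes belong to the frame and how wide each index is.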
4. SVM-VAD method
The task of VAD is to distinguish speech from silence, which is a well-known classification problem. Any classification problem requires choosing the parameters used for classification and designing a discriminant function. We choose a set of parameters describing signal energy and spectral content that is commonly used in VAD applications. The choice of parameters is governed by each parameter's contribution to the classification result, its robustness and its computational complexity. The parameters chosen here are four difference measures obtained by subtracting the running-average background noise parameters from the current-frame parameters: the spectral distortion, the full-band energy difference, the low-band energy difference and the zero-crossing rate difference.
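The four difference measures can be sketched as follows. The text does not give formulas, so this sketch makes explicit assumptions: spectral distortion is taken as the sum of squared differences between the line spectral frequencies of the current frame and of the noise average (the form used in G.729 Annex B), energies are in dB, and the other three measures are plain differences. The class and function names are hypothetical, and the extraction of LSFs and of the low-band energy is not shown.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class FrameParams:
    lsf: Sequence[float]       # line spectral frequencies
    full_band_energy: float    # dB
    low_band_energy: float     # dB
    zero_crossing_rate: float  # crossings per sample interval


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def energy_db(frame):
    """Mean power of a frame in dB (a small floor avoids log of zero)."""
    power = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(power + 1e-12)


def difference_features(current, noise):
    """Four VAD features: current-frame parameters relative to the noise average."""
    spectral_distortion = sum((a - b) ** 2 for a, b in zip(current.lsf, noise.lsf))
    return (
        spectral_distortion,
        current.full_band_energy - noise.full_band_energy,
        current.low_band_energy - noise.low_band_energy,
        current.zero_crossing_rate - noise.zero_crossing_rate,
    )


current = FrameParams([0.3, 0.5], -20.0, -25.0, 0.4)
noise = FrameParams([0.1, 0.5], -50.0, -52.0, 0.1)
print(difference_features(current, noise))
```

The resulting 4-element feature vector is what a trained SVM would classify as active or inactive; the training itself (SMO) and the smoothing stage are outside the scope of this sketch.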
The VAD algorithm and the inactive-frame coder both operate on frames of digitized speech. For compatibility, all methods use the same frame length. Fig. 1 shows the flow of the VAD algorithm for each frame. The VAD decision made by the SVM is local, that is, it does not take the short-term stationarity of speech and noise into account. The decisions of preceding adjacent frames are therefore used in a four-step smoothing and correction algorithm. In case the noise level changes suddenly, a minimum-energy estimate over a long period is used and a special reset algorithm is designed to prevent the algorithm from locking into the speech state.
Fig. 2 is a block diagram of the VBR-LCT speech coder provided by this embodiment. The pre-processing module in Fig. 2 is a high-pass filter that reduces low-frequency noise and the DC component. The input to the coder is speech sampled at 8 kHz in 16-bit PCM format. This embodiment uses speech data in WAV format, so the signal level is already normalized.
Transform analysis of a signal is usually carried out on short segments, and the length of the segment has a large effect on the analysis result. Transform coding of speech likewise involves choosing the analysis window length. Speech is in general a weakly non-stationary signal, but it can be regarded as approximately stationary over a short interval such as 20 ms. To improve the compression ratio, a window as long as possible is chosen so as to reduce the bit rate, but this also increases the codec delay. The frame length is therefore a compromise between coder delay and bit rate, chosen in the light of the characteristics of speech. The low-bit-rate variable-rate coder of this embodiment requires a frame length of no less than 20 ms; moreover, a 20 ms frame is used by most coders and corresponds to a low-to-medium delay coding strategy. Within a 20 ms frame the speech signal can be regarded as approximately stationary, which favours its orthogonal representation, so this embodiment uses a frame length of 20 ms, that is, 160 samples.
Evaluation of the coder:
1. Objective evaluation
Table 3 compares the VBR-LCT speech coder of this embodiment with the G.729B, GSM Half-Rate, FS1016 and FS1015 coding standards. The results also give an indication of how reliable objective measures are for evaluating speech coder performance. G.729B, GSM Half-Rate and FS1016 are all medium-to-low bit rate coding standards whose coded speech quality far exceeds that of FS1015 and VBR-LCT; judged by the two objective indices alone, however, the VBR-LCT method compares very favourably. Against the FS1015 coder, which has a similar bit rate, the SNR and PSNR measured on several types of speech data show that the VBR-LCT method of this embodiment exceeds the FS1015 standard by up to nearly 5 dB.
In essence, the VBR-LCT method of the invention operates in the transform domain and belongs to the category of waveform coding. Evaluating it objectively with SNR and PSNR therefore works in its favour, and these objective indices can serve as one reference for evaluating the coder.
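The text does not define how SNR and PSNR were computed. The following sketch uses the conventional sample-domain definitions, SNR = 10 log10(signal energy / error energy) and PSNR = 10 log10(peak² / mean squared error), and assumes a peak value of 1.0 because the input levels are described as normalized; both the definitions and the default peak are assumptions rather than the patent's stated procedure.

```python
import math


def _squared_error(original, decoded):
    if len(original) != len(decoded) or not original:
        raise ValueError("signals must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(original, decoded))


def snr_db(original, decoded):
    """Signal-to-noise ratio of the decoded signal, in dB."""
    noise = _squared_error(original, decoded)
    signal = sum(x * x for x in original)
    if noise == 0:
        return math.inf
    if signal == 0:
        return -math.inf
    return 10.0 * math.log10(signal / noise)


def psnr_db(original, decoded, peak=1.0):
    """Peak signal-to-noise ratio in dB; `peak` is the assumed full-scale amplitude."""
    mse = _squared_error(original, decoded) / len(original)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak * peak / mse)


original = [1.0, 1.0, 1.0, 1.0]
decoded = [0.9, 0.9, 0.9, 0.9]
print(round(snr_db(original, decoded), 3), round(psnr_db(original, decoded), 3))
```

Under this definition an SNR below 0 dB simply means that the energy of the waveform error exceeds the energy of the original signal, which is possible for coders that do not aim to reproduce the waveform exactly.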
2. Subjective evaluation
The final recipient of the speech produced by a coder is the human ear, so the quality of coded speech is judged mainly by the listener's perception. Here we use informal listening tests to evaluate speech quality.
When coding two-way conversational speech, the average bit rate of the VBR-LCT coder of this embodiment is close to 1.6 kb/s. For clean, noise-free speech, the speech reconstructed by the VBR-LCT coder is slightly blurred, so it does not sound as crisp as speech reconstructed by LPC-10e, and it is less clear than speech produced by the G.729B, GSM Half-Rate and FS1016 coding standards. Its intelligibility and naturalness are nevertheless good, and clearly better than those of the LPC-10e method at a similar bit rate. The VBR-LCT method is robust to environmental noise and its coding distortion is insensitive to changes in the signal; it remains stable even on signals for which the G.729B, GSM Half-Rate, FS1016 and LPC-10e methods fail. With background music or other non-speech audio, the VBR-LCT method is clearly better than the LPC-10e method. This is because the VBR-LCT method is a transform-domain waveform coder and so does not depend on speech feature parameters such as the pitch.
Table 1. Split-vector dimensions for each frame type
Speech mode                       First   Second   Third   Fourth
Silence/background noise          40      40       0       0
Mode 0 (unvoiced)                 40      40       40      20
Mode 1 (lightly voiced)           40      40       40      20
Mode 2 (medium-strong voiced)     40      40       40      20
Table 2. Bit allocation of the VBR-LCT coder (bits)
Speech mode                       First   Second   Third   Fourth   Gain   Mode   Bits/frame
Silence/background noise          7       7        0       0        8      2      24
Mode 0 (unvoiced)                 7       7        8       8        8      2      40
Mode 1 (lightly voiced)           11      11       8       8        8      2      48
Mode 2 (medium-strong voiced)     8       8        8       6        8      2      40
Table 3. Objective comparison of coders
Coder type            SNR (dB)   PSNR (dB)   Bit rate (kb/s)
G.729 Annex B         -0.95      15.08       8
GSM Half-Rate         1.24       14.81       5.6
FS1016                0.71       16.74       4.8
FS1015 (LPC-10e)      -3.59      12.47       2.4
VBR-LCT               -0.96      15.08       1.6

Claims (3)

1. A low-bit-rate variable-rate speech coder, in which the input speech signal is pre-processed by a high-pass filter and then passed to a voice activity detector that separates active speech frames from inactive frames, each kind of frame then being processed by a local cosine transformer to complete the speech coding, characterized in that:
The voice activity detector is a support vector machine voice activity detection module whose workflow is as follows:
1. Parameter extraction: four classification parameters are extracted from the input speech data for the current frame, namely the line spectral frequencies, the full-band energy, the low-band energy and the zero-crossing rate;
2. Initialization: the values of these four parameters for the background-noise-only condition are computed and updated continuously as the background noise changes;
3. Difference processing: the four background-noise-only parameters obtained in the initialization are subtracted from the corresponding four parameters of the current frame, giving the four difference parameters needed for voice activity classification;
4. Voice activity detection is performed with the support vector machine algorithm, the support vector machine being trained with the sequential minimal optimization method, and each frame is finally classified as one of two types, active or inactive;
5. A four-step smoothing and correction algorithm is applied to smooth the voice activity decisions;
6. After voice activity detection of each frame, an inactive-frame or active-speech-frame signal is output; if the estimated background noise energy of the frame is greater than the background noise energy threshold, the average background noise parameters are updated again;
The local cosine transformer processes the frames as follows:
1. A frame that the support vector machine voice activity detection module detects as inactive is split according to the split-vector dimensions of the silence/background-noise mode, and each split vector is vector-quantized with the codebook of the corresponding split vector of that mode, giving two split-vector quantization results of 7 bits each; the gain of the frame is scalar-quantized at the same time; the 2 bits representing the speech mode, the 8 bits representing the gain, and the 7 bits each for the first and second split vectors are output in that order as 3 bytes, which completes the coding of the frame;
2. A frame that the support vector machine voice activity detection module detects as active speech is classified into one of three speech modes, namely unvoiced, lightly voiced or medium-strong voiced; the frame is split according to the split-vector dimensions of its mode, and each of the four split vectors is vector-quantized with the codebook of the corresponding split vector of that mode, giving four quantization results whose bit lengths depend on the mode; the gain of the frame is scalar-quantized at the same time; the 2 bits representing the speech mode, the 8 bits representing the gain, and the bits of the first through fourth split vectors are output in that order as a whole number of bytes, which completes the coding of the frame.
2. low bit variable rate speech coding device according to claim 1 is characterized in that: described noiseless/first fen n dimensional vector n dimension of ground unrest speech pattern, second fen n dimensional vector n dimension be 40, the three, the 4th fens n dimensional vector n dimensions and be 0; Described voiceless sound, pure and impure sound and in first fen n dimensional vector n dimension, second fen n dimensional vector n dimension and the 3rd fen n dimensional vector n dimension of strong voiced speech pattern be 40, the four fens n dimensional vector n dimensions and be 20.
3. The low bit-rate variable-rate speech coder according to claim 1, characterized in that: in the silence/background-noise speech mode, the first and second split vectors are allocated 7 bits each, the third and fourth split vectors 0 bits, the gain 8 bits, and the mode 2 bits; in the unvoiced speech mode, the first and second split vectors are allocated 7 bits each, the third and fourth split vectors 8 bits each, the gain 8 bits, and the mode 2 bits; in the mixed voiced/unvoiced speech mode, the first and second split vectors are allocated 11 bits each, the third and fourth split vectors 8 bits each, the gain 8 bits, and the mode 2 bits; in the medium-to-strong voiced speech mode, the first and second split vectors are allocated 8 bits each, the third and fourth split vectors 8 and 6 bits respectively, the gain 8 bits, and the mode 2 bits.
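
The splitting and bit-packing described in the claims above can be illustrated with a minimal Python sketch. The split dimensions, the bit allocations, and the field order (mode, gain, then the first through fourth split vectors) are taken from the claims. Everything else is an assumption made for illustration: the mode names, the 2-bit mode code values, the most-significant-bit-first packing, and the helper names `split_vector`, `frame_bits`, and `pack_frame`. The codebook search itself is not shown; the quantization indices are simply passed in.

```python
# Per-mode layout from claims 2 and 3: dimensions and bits of split vectors 1-4.
# Mode names and the 2-bit "code" values are hypothetical; the claims only say
# that the mode is represented with 2 bits.
MODES = {
    "silence_noise": {"code": 0, "dims": (40, 40, 0, 0), "bits": (7, 7, 0, 0)},
    "unvoiced": {"code": 1, "dims": (40, 40, 40, 20), "bits": (7, 7, 8, 8)},
    "mixed": {"code": 2, "dims": (40, 40, 40, 20), "bits": (11, 11, 8, 8)},
    "voiced": {"code": 3, "dims": (40, 40, 40, 20), "bits": (8, 8, 8, 6)},
}
MODE_BITS = 2  # bits for the speech mode (from the claims)
GAIN_BITS = 8  # bits for the scalar-quantized gain (from the claims)


def split_vector(coeffs, mode):
    """Cut a coefficient vector into four sub-vectors of the mode's dimensions."""
    dims = MODES[mode]["dims"]
    if len(coeffs) != sum(dims):
        raise ValueError("coefficient count does not match the mode's dimensions")
    parts, start = [], 0
    for d in dims:
        parts.append(list(coeffs[start:start + d]))
        start += d
    return parts


def frame_bits(mode):
    """Total number of bits in one coded frame of the given mode."""
    return MODE_BITS + GAIN_BITS + sum(MODES[mode]["bits"])


def pack_frame(mode, gain_index, vq_indices):
    """Pack mode, gain index, and four VQ indices into bytes, MSB first (assumed)."""
    layout = MODES[mode]
    if len(vq_indices) != 4:
        raise ValueError("expected four split-vector indices")
    fields = [(layout["code"], MODE_BITS), (gain_index, GAIN_BITS)]
    fields += list(zip(vq_indices, layout["bits"]))
    value = 0
    for v, width in fields:
        # A zero-width field (unused split vector) only accepts the index 0.
        if not 0 <= v < (1 << width):
            raise ValueError("index does not fit in its bit allocation")
        value = (value << width) | v
    return value.to_bytes(frame_bits(mode) // 8, "big")
```

With the allocations in claim 3, the four modes occupy 24, 40, 48, and 40 bits per frame, that is 3, 5, 6, and 5 bytes, which is consistent with the whole-byte output required by claim 1.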
CNB2004101032203A 2004-12-31 2004-12-31 A low bit-rate speech coder Expired - Fee Related CN1275223C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2004101032203A CN1275223C (en) 2004-12-31 2004-12-31 A low bit-rate speech coder

Publications (2)

Publication Number Publication Date
CN1632862A (en) 2005-06-29
CN1275223C (en) 2006-09-13

Family

ID=34848171

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2004101032203A Expired - Fee Related CN1275223C (en) 2004-12-31 2004-12-31 A low bit-rate speech coder

Country Status (1)

Country Link
CN (1) CN1275223C (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101320563B (en) * 2007-06-05 2012-06-27 华为技术有限公司 Background noise encoding/decoding device, method and communication equipment
CN101359978B (en) * 2007-07-30 2014-01-29 向为 Method for control of rate variant multi-mode wideband encoding rate
CN102044242B (en) 2009-10-15 2012-01-25 华为技术有限公司 Method, device and electronic equipment for voice activation detection
CN108281151A (en) * 2018-01-25 2018-07-13 中国电子科技集团公司第五十八研究所 Multi-rate acoustic code device based on TMS320F28335

Also Published As

Publication number Publication date
CN1632862A (en) 2005-06-29

Similar Documents

Publication Publication Date Title
US7472059B2 (en) Method and apparatus for robust speech classification
KR100883656B1 (en) Method and apparatus for discriminating audio signal, and method and apparatus for encoding/decoding audio signal using it
Ramírez et al. Efficient voice activity detection algorithms using long-term speech information
AU763409B2 (en) Complex signal activity detection for improved speech/noise classification of an audio signal
KR100962681B1 (en) Classification of audio signals
KR100964402B1 (en) Method and Apparatus for determining encoding mode of audio signal, and method and appartus for encoding/decoding audio signal using it
CN1244907C (en) High frequency intensifier coding for bandwidth expansion speech coder and decoder
CN1168071C (en) Method and apparatus for selecting encoding rate in variable rate vocoder
CN1266674C (en) Closed-loop multimode mixed-domain linear prediction (MDLP) speech coder
CN1241169C (en) Low bit-rate coding of unvoiced segments of speech
WO2008067719A1 (en) Sound activity detecting method and sound activity detecting device
CN1335980A (en) Wide band speech synthesis by means of a mapping matrix
US20120303362A1 (en) Noise-robust speech coding mode classification
CN101061535A (en) Method and device for the artificial extension of the bandwidth of speech signals
WO2002029782A1 (en) Perceptual harmonic cepstral coefficients as the front-end for speech recognition
JP2007523388A (en) ENCODER, DEVICE WITH ENCODER, SYSTEM WITH ENCODER, METHOD FOR ENCODING AUDIO SIGNAL, MODULE, AND COMPUTER PROGRAM PRODUCT
KR20020052191A (en) Variable bit-rate celp coding of speech with phonetic classification
WO2008148321A1 (en) An encoding or decoding apparatus and method for background noise, and a communication device using the same
JP2019514065A (en) Audio encoder for encoding audio signal in consideration of detected peak spectral region in higher frequency band, method for encoding audio signal, and computer program
EP1312075A1 (en) Method for noise robust classification in speech coding
Górriz et al. An effective cluster-based model for robust speech detection and speech recognition in noisy environments
CN1275223C (en) A low bit-rate speech coder
Chen et al. Robust voice activity detection algorithm based on the perceptual wavelet packet transform
Asgari et al. Voice activity detection using entropy in spectrum domain
WO2011052221A1 (en) Encoder, decoder and methods thereof

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C19 Lapse of patent right due to non-payment of the annual fee
CF01 Termination of patent right due to non-payment of annual fee