MXPA96000061A - Method for coding a signal that contains voice - Google Patents

Method for coding a signal that contains voice

Info

Publication number
MXPA96000061A
MXPA/A/1996/000061A · MX9600061A · MXPA96000061A
Authority
MX
Mexico
Prior art keywords
frame
stationarity
mode
pitch
spectrum
Application number
MXPA/A/1996/000061A
Other languages
Spanish (es)
Other versions
MX9600061A (en)
Inventor
Kalyan Ganesan
Kumar Swaminathan
Prabhat K. Gupta
Original Assignee
Hughes Electronics
Priority claimed from US08/229,271 (US5734789A)
Application filed by Hughes Electronics
Publication of MX9600061A
Publication of MXPA96000061A


Abstract

The present invention relates to a method for processing a signal having a speech component, the signal being organized as a plurality of frames. The method comprises the following steps, performed for each frame: measuring a value of at least one speech feature of the frame, where the speech feature is selected from the group consisting of spectral stationarity, pitch stationarity, high-frequency content, and energy; comparing the measured value of the selected speech feature with at least two thresholds, including a high threshold representing a high value of the selected speech feature and a low threshold representing a low value of the selected speech feature; setting a first flag if the measured value exceeds the high threshold and a second flag if the measured value is below the low threshold; determining whether the frame lacks a substantial speech component based on the set flags; classifying the frame in a noise mode if the frame lacks a substantial speech component and, otherwise, in a speech mode; and generating an encoded frame according to a noise-mode coding scheme if the frame is classified in the noise mode and according to a speech coding scheme if the frame is classified in the speech mode.

Description

METHOD FOR CODING A SIGNAL THAT CONTAINS VOICE
The present invention relates, in general terms, to a method for encoding a signal containing speech and, in particular, to a method employing a linear predictor for encoding such a signal.
Description of the Related Art
A modern communication technique employs a Code Excited Linear Prediction (CELP) coder. The codebook is essentially a table containing excitation vectors for processing by a linear predictive filter. The technique involves dividing an input signal into multiple portions and, for each portion, searching the codebook for the vector that produces a filter output signal closest to the input signal. Typical CELP techniques may distort portions of the input signal dominated by noise, because a codebook and a linear predictive filter that are optimal for speech may be unsuitable for noise.
OBJECTS AND SUMMARY OF THE INVENTION
An object of the present invention is to provide a method for encoding a signal containing both speech and noise while avoiding some of the distortions introduced by typical CELP coding techniques. Additional objects and advantages of the invention will be set forth in the description that follows, will in part be obvious from the description, or may be learned by practicing the invention. The objects and advantages of the present invention may be realized and attained by means of the instrumentalities and combinations particularly pointed out in the appended claims. To achieve these objects, and in accordance with the purpose of the present invention as embodied and broadly described herein, a method is provided for processing a signal having a voice component, the signal being organized as a plurality of frames. The method includes the steps, executed for each frame, of determining whether the frame corresponds to a first mode, depending on whether the voice component is substantially absent from the frame; generating an encoded frame in accordance with one of a first coding scheme, when the frame corresponds to the first mode, and a second coding scheme, when the frame does not correspond to the first mode; and decoding the encoded frame in accordance with one of the first coding scheme, when the frame corresponds to the first mode, and the second coding scheme, when the frame does not correspond to the first mode.
BRIEF DESCRIPTION OF THE DRAWINGS
The above and other objects, aspects, and advantages will be better understood from the following detailed description of a preferred embodiment of the invention with reference to the drawings, in which: Figure 1 is a block diagram of a transmitter in a wireless communication system in accordance with a preferred embodiment of the present invention. Figure 2 is a block diagram of a receiver in a wireless communication system in accordance with a preferred embodiment of the present invention. Figure 3 is a block diagram of the encoder in the transmitter shown in Figure 1. Figure 4 is a block diagram of the decoder in the receiver shown in Figure 2. Figure 5A is a timing diagram showing the alignment of the linear prediction analysis windows in the encoder shown in Figure 3. Figure 5B is a timing diagram showing the alignment of the pitch analysis windows for open-loop pitch prediction in the encoder shown in Figure 3. Figures 6A and 6B are a flow diagram illustrating the 26-bit line spectral frequency vector quantization process performed by the encoder of Figure 3. Figure 7 is a flow chart illustrating the operation of a pitch tracking algorithm. Figure 8 is a block diagram showing in more detail the open-loop pitch estimation of the encoder shown in Figure 3. Figure 9 is a flow chart illustrating the operation of the modified pitch tracking algorithm implemented by the open-loop pitch estimation shown in Figure 8.
Figure 10 is a flow diagram showing the processing performed by the mode determination module shown in Figure 3. Figure 11 is a data flow diagram showing one part of the processing of the step for determining the spectral stationarity values shown in Figure 10. Figure 12 is a data flow diagram showing another part of the processing of the step for determining the spectral stationarity values. Figure 13 is a data flow diagram showing another part of the processing of the step for determining the spectral stationarity values. Figure 14 is a data flow diagram showing the processing of the step for determining the pitch stationarity values shown in Figure 10. Figure 15 is a data flow diagram showing the processing of the step for generating the zero crossing rate values shown in Figure 10. Figure 16 is a data flow diagram showing the processing of the step for determining the level gradient values shown in Figure 10. Figure 17 is a data flow diagram showing the processing of the step for determining the short-term energy values shown in Figure 10.
Figures 18A, 18B and 18C are a flow chart for determining the mode based on the values generated as shown in Figure 10. Figure 19 is a block diagram showing in more detail the implementation of the excitation modeling circuits of the encoder shown in Figure 3. Figure 20 is a diagram illustrating a processing of the encoder shown in Figure 3. Figures 21A and 21B are a diagram of the speech coder parameters for mode A. Figure 22 is a diagram of the speech coder parameters for mode B. Figure 23 is a diagram of the speech coder parameters for mode C. Figure 24 is a block diagram illustrating a processing of the speech decoder shown in Figure 4; and Figure 25 is a timing diagram showing an alternative alignment of the linear prediction analysis windows.
DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT OF THE INVENTION
Figure 1 shows the transmitter of the preferred communication system. The analog-to-digital (A/D) converter 11 samples analog voice from a telephone handset at a rate of 8 kHz, converts it to digital values, and supplies the digital values to the speech encoder 12. The channel encoder 13 further encodes the signal, as may be required in a digital cellular communication system, and supplies the resulting coded bit stream to a modulator 14. The digital-to-analog (D/A) converter 15 converts the output of the modulator 14 to phase shift keying (PSK) signals. The radio frequency (RF) up-converter 16 amplifies and frequency-multiplies the PSK signals and supplies the amplified signals to the antenna 17. An anti-aliasing low-pass filter (not shown) filters the analog voice signal input to the A/D converter 11. A second-order biquad high-pass filter (not shown) filters the digitized samples from the A/D converter 11. Its transfer function is:

H(z) = (1 - 2z^-1 + z^-2) / (1 - 1.8891z^-1 + 0.89503z^-2)

The high-pass filter attenuates direct current or hum contamination that may occur in the incoming voice signal. Figure 2 shows the receiver of the preferred communication system. The RF down-converter 22 receives a signal from the antenna 21 and heterodynes the signal to an intermediate frequency (IF). The A/D converter 23 converts the IF signal to a digital bit stream, and the demodulator 24 demodulates the resulting bit stream. At this point, the reverse of the encoding process in the transmitter takes place. The channel decoder decodes the bit stream for the speech decoder, and the D/A converter 27 synthesizes analog voice from the output of the speech decoder. Much of the processing described in this specification is performed by a general-purpose signal processor executing program instructions. Nevertheless, to facilitate the description of the preferred communication system, it is illustrated in terms of block and circuit diagrams. One skilled in the art could readily transcribe these diagrams into program instructions for a processor. Figure 3 shows the encoder 12 of Figure 1 in greater detail, including an audio preprocessor 31, a linear prediction (LP) analysis and quantization module 32, and an open-loop pitch estimation module 33. The module 34 analyzes each frame of the signal to determine whether the frame is in mode A, mode B, or mode C, as described in more detail below. The module 35 performs excitation modeling depending on the mode determined by the module 34. The processor 36 packs the compressed speech bits. Figure 4 shows the decoder of Figure 2, which includes a processor 41 for unpacking the compressed speech bits, a module 42 for reconstructing the excitation signal, a filter 43, a speech synthesis filter 44, and a global postfilter 45. Figure 5A shows the linear prediction analysis windows. The preferred communication system uses voice frames of 40 milliseconds. For each frame, module 32 performs linear prediction analysis over two 30-millisecond windows whose centers are separated by 20 milliseconds. The first linear prediction window is centered on the middle of the voice frame, and the second linear prediction window is centered on the boundary with the next frame, so that the second linear prediction window extends 15 milliseconds into the next frame.
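Returning to the high-pass filter stage described above, the following is a minimal sketch in Python of a direct-form implementation of the biquad transfer function as reconstructed here; the coefficient 1.8891 is read from the (garbled) source text, so treat the exact values as assumptions rather than normative.

```python
import numpy as np

def highpass_biquad(x):
    """Second-order biquad high-pass filter.

    Implements y(n) = x(n) - 2x(n-1) + x(n-2)
                      + 1.8891*y(n-1) - 0.89503*y(n-2),
    matching H(z) = (1 - 2z^-1 + z^-2) / (1 - 1.8891z^-1 + 0.89503z^-2).
    """
    b = [1.0, -2.0, 1.0]           # numerator: double zero at DC
    a = [1.0, -1.8891, 0.89503]    # denominator: complex pole pair near DC
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(3) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, 3) if n - k >= 0)
        y[n] = acc
    return y

# Example: remove a DC offset from a short 8 kHz test signal.
fs = 8000
t = np.arange(fs) / fs
x = 0.5 * np.sin(2 * np.pi * 300 * t) + 0.2   # 300 Hz tone plus DC offset
y = highpass_biquad(x)
print(np.mean(x), np.mean(y))   # the DC component is strongly attenuated
```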
To restate the window alignment: the module 32 analyzes a first part of the frame (linear prediction window 1) to generate a first set of filter coefficients, and analyzes a second part of the frame together with a part of the next frame (linear prediction window 2) to generate a second set of filter coefficients. Figure 5B shows the pitch analysis windows. For each frame, module 32 performs pitch analysis over two windows of 37.625 milliseconds. The first pitch analysis window is centered on the middle of the voice frame, and the second pitch analysis window is centered on the boundary with the next frame, so that the second pitch analysis window extends 18.8125 milliseconds into the next frame. In other words, module 32 analyzes a third part of the frame (pitch analysis window 1) to generate a first pitch estimate and analyzes a fourth part of the frame and a part of the next frame (pitch analysis window 2) to generate a second pitch estimate. Module 32 applies a Hamming window followed by a tenth-order autocorrelation method of linear prediction analysis. With this linear prediction analysis method, module 32 obtains optimal filter coefficients and optimal reflection coefficients. In addition, the residual energy after the analysis is readily obtained; expressed as a fraction of the speech energy in the linear prediction analysis buffer, it is denoted α₁ for the first linear prediction window and α₂ for the second linear prediction window. These by-products of the linear prediction analysis are subsequently used in the mode selection algorithm as measures of spectral stationarity, as described in more detail below. After the linear prediction analysis, module 32 bandwidth-broadens the filter coefficients for the first and second linear prediction windows by 25 Hz, converts the coefficients into ten line spectral frequencies (LSFs), and quantizes these ten line spectral frequencies with a 26-bit line spectral frequency vector quantization (VQ), as described below. Module 32 employs a 26-bit vector quantization for each set of ten line spectral frequencies. This vector quantization provides good and robust performance across a wide range of telephone handsets and speakers. Separate vector quantization codebooks are designed for "IRS-filtered" speech material and "flat unfiltered" (non-IRS-filtered) speech material. The unquantized line spectral frequency vector is quantized using both the "IRS-filtered" and the "non-IRS-filtered" vector quantization tables. The optimal classification is selected on the basis of a cepstral distortion measure. Within each classification, split vector quantization is carried out. Multiple candidates for each split vector are chosen on the basis of an energy-weighted mean squared error, and an overall optimal selection is made within each classification based on the cepstral distortion measure over all candidate combinations. After the optimal classification is chosen, the quantized line spectral frequencies are converted into filter coefficients.
More specifically, the module 32 quantizes the ten line spectral frequencies for both sets with a 26-bit multiple-codebook split vector quantizer that classifies the unquantized line spectral frequency vector as an "IRS-filtered voiced" vector, an "IRS-filtered unvoiced" vector, a "non-IRS-filtered voiced" vector, or a "non-IRS-filtered unvoiced" vector, where "intermediate reference system" (IRS) refers to the intermediate reference system filter specified in CCITT, Blue Book, Rec. P.48. Figure 6 shows a schematic of the line spectral frequency vector quantization process. Module 32 employs a split vector quantizer for each classification, including a 3-4-3 split vector quantizer for the "IRS-filtered voiced" and "non-IRS-filtered voiced" categories 51 and 53. The first three line spectral frequencies use an 8-bit codebook in function modules 55 and 57, the next four LSFs use a 10-bit codebook in function modules 59 and 61, and the last three LSFs use a 6-bit codebook in function modules 63 and 65. For categories 52 and 54, "IRS-filtered unvoiced" and "non-IRS-filtered unvoiced", a 3-3-4 split vector quantizer is used. The first three line spectral frequencies use a 7-bit codebook in function modules 56 and 58, the next three line spectral frequencies use an 8-bit codebook in function modules 60 and 62, and the last four line spectral frequencies use a 9-bit codebook in function modules 64 and 66. For each split vector codebook, the three best candidates are selected in function modules 67, 68, 69, and 70 using an energy-weighted mean squared error criterion. The energy weighting reflects the power level of the spectral envelope at each line spectral frequency. The three best candidates for each of the three split vectors result in a total of twenty-seven combinations for each category, screened so that at least one combination results in an ordered set of line spectral frequencies. Usually, this is a very mild restriction imposed on the search. The optimal one of these twenty-seven combinations is selected in function module 71 on the basis of the cepstral distortion measure. Finally, the optimal category or classification is also determined on the basis of the cepstral distortion measure. The quantized line spectral frequencies are converted into filter coefficients and then into autocorrelation lags for interpolation purposes. The resulting line spectral frequency vector quantization scheme is effective not only across speakers but also across various degrees of intermediate reference system filtering, which models the influence of the telephone handset transducer. The vector quantizer codebooks are trained from a speech database of sixty talkers using the flat frequency configuration as well as the intermediate reference system (IRS) configuration. This is designed to provide consistently good performance across many speakers and many telephone handsets.
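The split vector quantization search described above may be sketched as follows; the codebook contents, the energy weights, and the use of total weighted mean squared error in place of the final cepstral distortion measure are illustrative assumptions, not the trained tables of the patent.

```python
import numpy as np
from itertools import product

def split_vq_search(lsf, codebooks, weights, n_best=3):
    """Split VQ of a 10-dimensional LSF vector (e.g. a 3-4-3 split).

    For each split, keep the n_best candidates under a weighted MSE,
    then pick the combination with the lowest total distortion among
    the n_best**3 combinations, keeping only combinations that leave
    the concatenated LSF vector properly ordered.
    """
    splits, start = [], 0
    for cb in codebooks:
        dim = cb.shape[1]
        target, w = lsf[start:start + dim], weights[start:start + dim]
        err = np.sum(w * (cb - target) ** 2, axis=1)   # weighted MSE per entry
        best = np.argsort(err)[:n_best]
        splits.append([(i, cb[i], err[i]) for i in best])
        start += dim
    best_combo, best_dist = None, np.inf
    for combo in product(*splits):
        candidate = np.concatenate([c[1] for c in combo])
        if np.any(np.diff(candidate) <= 0):   # enforce ordered LSFs
            continue
        dist = sum(c[2] for c in combo)
        if dist < best_dist:
            best_combo, best_dist = [c[0] for c in combo], dist
    return best_combo, best_dist

# Toy example with random placeholder codebooks (8/10/6-bit sizes).
rng = np.random.default_rng(0)
lsf = np.sort(rng.uniform(0.05, 3.1, 10))
cbs = [np.sort(rng.uniform(0.05, 1.0, (256, 3)), axis=1),
       np.sort(rng.uniform(1.0, 2.2, (1024, 4)), axis=1),
       np.sort(rng.uniform(2.2, 3.1, (64, 3)), axis=1)]
print(split_vq_search(lsf, cbs, np.ones(10)))
```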
The average log spectral distortion across the entire TIA half-rate database is approximately 1.2 dB for voice data filtered by the intermediate reference system and approximately 1.3 dB for voice data not filtered by the intermediate reference system. Two pitch estimates per frame are determined, at 20 millisecond intervals. These open-loop pitch estimates are used in the mode selection and to guide the closed-loop pitch analysis if the selected mode is a predominantly voiced mode. The module 33 determines the two pitch estimates from the two pitch analysis windows described above in relation to Figure 5B, using a modified form of the pitch tracking algorithm shown in Figure 7. That pitch estimation algorithm makes an initial estimate of the pitch in function module 73 using an error function calculated for all values in the set {22.0, 22.5, ..., 114.5}, followed by pitch tracking to obtain an overall optimal pitch value. The function module 74 employs backward pitch tracking using the error functions and pitch estimates of the two previous pitch analysis windows. The function module 75 employs forward pitch tracking using the error functions of the two future pitch analysis windows. The decision module 76 compares the pitch estimates resulting from the backward and forward pitch tracking to obtain an overall optimal pitch value at the output 77. The pitch estimation algorithm shown in Figure 7 requires the error functions of two future pitch analysis windows for the forward pitch tracking and thus introduces a delay of 40 milliseconds. To avoid this penalty, the preferred communication system employs a modification of the pitch estimation algorithm of Figure 7. Figure 8 shows the open-loop pitch estimation 33 of Figure 3 in more detail. Pitch analysis windows one and two are inputs to their respective error function computations 331 and 332. The outputs of these error function computations are inputs to a refinement of past pitch estimates 333, and the refined pitch estimates are sent to both the backward pitch tracking 334 and the forward pitch tracking 335 for pitch window one. The outputs of the pitch tracking circuits are inputs to the selector 336, which selects open-loop pitch one as the first output. The selected open-loop pitch is also the input to a backward pitch tracking circuit for pitch window two, which outputs open-loop pitch two.
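Before turning to Figure 9, a simplified sketch of backward pitch tracking may help fix ideas; the error function, the candidate grid, the 10% neighborhood, and the bias weighting below are placeholders rather than values specified in the text.

```python
import numpy as np

def backward_pitch_track(err_fn, candidates, prev_pitches, bias=0.15):
    """Backward pitch tracking (a simplified sketch).

    err_fn[i] is the normalized error for candidates[i] in the current
    window; candidates close to the pitch estimates of the previous
    windows get their error reduced, favoring a smooth pitch contour.
    """
    cost = err_fn.copy()
    for p_prev in prev_pitches:
        near = np.abs(candidates - p_prev) <= 0.1 * p_prev  # within 10%
        cost[near] *= (1.0 - bias)
    return candidates[np.argmin(cost)]

# Toy usage: candidate lags 22.0, 22.5, ..., 114.5 with a random error.
cands = np.arange(22.0, 115.0, 0.5)
rng = np.random.default_rng(1)
err = rng.uniform(0.5, 1.0, cands.size)
err[np.abs(cands - 50.0) < 1.0] = 0.4       # true pitch near lag 50
print(backward_pitch_track(err, cands, prev_pitches=[49.5, 50.5]))
```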
Figure 9 shows the modified pitch tracking algorithm implemented by the pitch estimation circuits of Figure 8. The modified pitch estimation algorithm employs the same error function as the algorithm of Figure 7 in each pitch analysis window, but the tracking scheme is altered. Before the pitch tracking for the first and second pitch analysis windows, the two pitch estimates of the previous two pitch analysis windows are refined in function modules 81 and 82, respectively, with both backward pitch tracking and forward pitch tracking using the error functions of the two current pitch analysis windows. This is followed by backward pitch tracking in function module 83 for the first pitch analysis window, using the refined pitch estimates and the error functions of the two previous pitch analysis windows. The forward pitch tracking for the first pitch analysis window in function module 84 is limited to the use of the error function of the second pitch analysis window. The two estimates are compared in decision module 85 to obtain the best overall pitch estimate for the first pitch analysis window. For the second pitch analysis window, backward pitch tracking is carried out in function module 86 using the pitch estimate of the first pitch analysis window and its error function. No forward pitch tracking is used for this second pitch analysis window, with the result that the backward tracking estimate is taken as the best pitch estimate at output 87. Figure 10 shows the mode determination processing performed by the mode selector 34. Depending on the spectral stationarity, pitch stationarity, short-term energy, short-term level gradient, and zero crossing rate of each 40 millisecond frame, the mode selector 34 classifies each frame into one of three modes: voiced and stationary mode (Mode A), unvoiced or transient mode (Mode B), and background noise mode (Mode C). More specifically, the mode selector 34 generates two logical values, each indicating spectral stationarity, the similarity of the spectral content between the currently processed frame and the previous frame (Step 1010). The mode selector 34 generates two logical values indicating pitch stationarity, the similarity of the fundamental frequencies between the currently processed frame and the previous frame (Step 1020). The mode selector 34 generates two logical values indicating the zero crossing rate of the currently processed frame (Step 1030), a rate influenced by the high frequency components of the frame relative to the lower frequency components of the frame. The mode selector 34 generates two logical values indicating the level gradients within the currently processed frame (Step 1040). The mode selector 34 generates five logical values indicating the short-term energy of the currently processed frame (Step 1050). Subsequently, the mode selector 34 determines whether the frame mode is mode A, mode B, or mode C, depending on the values generated in Steps 1010-1050 (Step 1060). Figure 11 is a block diagram showing the processing of Step 1010 of Figure 10 in greater detail. The processing of Figure 11 determines a cepstral distortion in dB.
The module 1110 converts the quantized filter coefficients of window 2 of the current frame into the autocorrelation lag domain, and module 1120 converts the quantized filter coefficients of window 2 of the previous frame into the autocorrelation lag domain. The module 1130 interpolates the outputs of modules 1110 and 1120, and module 1140 converts the output of module 1130 back into filter coefficients. The module 1150 converts the output of module 1140 into the cepstral domain, and module 1160 converts the unquantized filter coefficients of window 1 of the current frame into the cepstral domain. The module 1170 generates the cepstral distortion d_c from the outputs of 1150 and 1160. Figure 12 shows the generation of the spectral stationarity flag LPCFLAG1, which is a relatively strong indicator of spectral stationarity for the frame. The mode selector 34 generates the LPCFLAG1 flag using a combination of two techniques for measuring spectral stationarity. The first technique compares the cepstral distortion d_c using the comparators 1210 and 1220. In Figure 12, the threshold input d_t1 to comparator 1210 is -8.0 and the threshold input d_t2 to comparator 1220 is -6.0. The second technique is based on the residual energy after the LPC analysis, expressed as a fraction of the speech energy input to the LPC analysis. This residual energy is a by-product of the LPC analysis, as described above. The input α₁ to comparator 1230 is the residual energy for the filter coefficients of window 1, and the input α₂ to comparator 1240 is the residual energy for the filter coefficients of window 2. The input α_t1 to comparators 1230 and 1240 is a threshold equal to 0.25. Figure 13 shows a flow diagram within the mode selector 34 for generating the spectral stationarity flag LPCFLAG2, which is a relatively weak indicator of spectral stationarity. The processing shown in Figure 13 is similar to that shown in Figure 12, except that the LPCFLAG2 flag is based on a relatively relaxed set of thresholds. The input d_t1 to comparator 1310 is -8.0, the input d_t3 to comparator 1320 is -4.0, the input d_t4 to comparator 1350 is -2.0, the input α_t1 to comparators 1330 and 1340 is a threshold of 0.25, and the input α_t2 to comparators 1360 and 1370 is 0.15. The mode selector 34 measures pitch stationarity using both open-loop pitch values of the current frame, denoted P₁ for pitch window 1 and P₂ for pitch window 2, and the open-loop pitch value of window 2 of the previous frame, denoted P₋₁. A lower range of pitch values (P_L1, P_U1) and an upper range of pitch values (P_L2, P_U2) are defined as:

P_L1 = MIN(P₁, P₂) - P_t
P_U1 = MIN(P₁, P₂) + P_t
P_L2 = MAX(P₁, P₂) - P_t
P_U2 = MAX(P₁, P₂) + P_t,

where P_t is 8.0. If the two ranges do not overlap, that is, if P_L2 > P_U1, then only a weak indication of pitch stationarity is possible, indicated by PITCHFLAG2. PITCHFLAG2 is set if P₋₁ falls within either the lower range (P_L1, P_U1) or the upper range (P_L2, P_U2). If the two ranges overlap, that is, if P_L2 <= P_U1, a strong indication of pitch stationarity is possible, indicated by PITCHFLAG1, which is set if P₋₁ falls within the range (P_L, P_U), where

P_L = (P₁ + P₂)/2 - 2·P_t
P_U = (P₁ + P₂)/2 + 2·P_t.
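These range tests translate directly into code. A minimal sketch, using P_t = 8.0 from the text:

```python
def pitch_stationarity_flags(p_prev, p1, p2, p_t=8.0):
    """Compute PITCHFLAG1/PITCHFLAG2 from the open-loop pitch values.

    p_prev is the window-2 pitch of the previous frame (P-1); p1 and p2
    are the current frame's window-1 and window-2 pitches. Follows the
    range construction described in the text.
    """
    lo, hi = min(p1, p2), max(p1, p2)
    pl1, pu1 = lo - p_t, lo + p_t      # lower range
    pl2, pu2 = hi - p_t, hi + p_t      # upper range
    flag1 = flag2 = False
    if pl2 > pu1:                      # ranges do not overlap: weak test
        flag2 = pl1 <= p_prev <= pu1 or pl2 <= p_prev <= pu2
    else:                              # ranges overlap: strong test
        mid = (p1 + p2) / 2.0
        flag1 = mid - 2 * p_t <= p_prev <= mid + 2 * p_t
    return flag1, flag2

print(pitch_stationarity_flags(52.0, 50.0, 54.0))   # -> (True, False)
```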
Figure 14 shows the data flow for generating PITCHFLAG1 and PITCHFLAG2 within the mode selector 34. The module 14005 generates an output equal to the input having the largest value, the module 14010 generates an output equal to the input having the smallest value, and the module 14020 generates an output that is the average of its two inputs. The modules 14030, 14035, 14040, 14045, 14050, and 14055 are adders. The modules 14080, 14085, and 14090 are AND gates. The module 14087 is an inverter. The modules 14065, 14070, and 14075 are each logic blocks that generate a true output when (C >= B) & (C <= A). The circuit of Figure 14 also processes the reliability values V₋₁, V₁, and V₂, each indicating whether the values P₋₁, P₁, and P₂, respectively, are reliable. Typically, these reliability values are a by-product of the pitch computation algorithm. The circuit shown in Figure 14 generates false values for PITCHFLAG1 and PITCHFLAG2 if any of the flags V₋₁, V₁, and V₂ is false. The processing of these reliability values is optional. Figure 15 shows the data flow within the mode selector 34 for generating the two logical values indicating the zero crossing rate for the frame. Each of the modules 15002, 15004, 15006, 15008, 15010, 15012, 15014, and 15016 counts the number of zero crossings in a 5 millisecond subframe of the frame currently being processed. For example, module 15006 counts the number of zero crossings of the signal occurring from 10 milliseconds after the beginning of the frame to 15 milliseconds after the beginning of the frame. The comparators 15018, 15020, 15022, 15024, 15026, 15028, 15030, and 15032, in combination with the adder 15035, generate a value indicating the number of 5 millisecond subframes having 15 or more zero crossings. Comparator 15040 sets the flag ZC_LOW when the number of such subframes is less than 2, and comparator 15037 sets the flag ZC_HIGH when the number of such subframes is greater than 5. The input ZC_t to comparators 15018-15032 is 15, the input Z_t1 to comparator 15040 is 2, and the input Z_t2 to comparator 15037 is 5. Figures 16A, 16B, and 16C show the data flow for generating the two logical values indicating the short-term level gradient. The mode selector 34 measures the short-term level gradient, an indication of transients within a frame, using a low-pass filtered version of the companded input signal amplitude. The module 16005 generates the absolute value of the input signal s(n), the module 16010 compands its input signal, and the low-pass filter 16015 generates a signal A_L(n) which, at time instant n, is given by:

A_L(n) = (63/64)·A_L(n-1) + (1/64)·C(|s(n)|),

where the companding function C(·) is the μ-law function described in CCITT G.711. The delay 16025 generates an output that is a version of its input delayed by 10 milliseconds, and the subtractor 16027 generates the difference between A_L(n) and A_L(n-80). Module 16030 generates a signal that is the absolute value of its input. Every 5 milliseconds, the mode selector 34 compares A_L(n) with its value 10 milliseconds earlier, and if the difference |A_L(n) - A_L(n-80)| exceeds a fixed relaxed threshold, a counter is incremented. (In the preceding expression, 80 corresponds to 8 samples per millisecond times 10 milliseconds.) As shown in Figure 16C, if this difference does not exceed a relatively stringent threshold (L_t2 = 32) for any subframe, the mode selector 34 sets LVLFLAG2, weakly indicating the absence of transients. As shown in Figure 16B, if this difference exceeds a more relaxed threshold (L_t1 = 10) in no more than one subframe (L_t3 = 2), the mode selector 34 sets LVLFLAG1, strongly indicating an absence of transients.
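A sketch of this level-gradient computation follows; the μ-law companding here is a simplified stand-in for the G.711 function cited in the text, and the frame layout (8 kHz sampling, 40 millisecond frames, 5 millisecond subframes) follows the figures.

```python
import numpy as np

def mu_law_compress(x, mu=255.0, x_max=32768.0):
    """Simplified mu-law companding of |x| (stand-in for G.711)."""
    x = np.abs(x) / x_max
    return x_max * np.log1p(mu * x) / np.log1p(mu)

def level_gradient_flags(s, fs=8000, lt1=10.0, lt2=32.0, lt3=2):
    """Short-term level gradient flags over one 40 ms frame.

    A_L(n) = (63/64) A_L(n-1) + (1/64) C(|s(n)|), sampled every 5 ms and
    compared against A_L eighty samples (10 ms) earlier. The thresholds
    follow the values given in the text.
    """
    c = mu_law_compress(s)
    a_l = np.zeros(len(s))
    for n in range(1, len(s)):
        a_l[n] = (63 / 64) * a_l[n - 1] + (1 / 64) * c[n]
    step = fs // 200                       # 5 ms = 40 samples at 8 kHz
    diffs = [abs(a_l[n] - a_l[n - 80]) for n in range(80, len(s), step)]
    lvlflag1 = sum(d > lt1 for d in diffs) < lt3   # few relaxed-threshold hits
    lvlflag2 = all(d <= lt2 for d in diffs)        # no stringent-threshold hits
    return lvlflag1, lvlflag2

rng = np.random.default_rng(2)
frame = 1000.0 * rng.standard_normal(320)          # one 40 ms frame at 8 kHz
print(level_gradient_flags(frame))
```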
More specifically, Figure 16B shows the delay circuits 16032-16046, each of which generates a version of its input delayed by 5 milliseconds. Each of the latches 16048-16062 holds the signal at its input. The latches 16048-16062 are clocked at a common time near the end of each 40 millisecond voice frame, so that each latch stores a portion of the frame spaced 5 milliseconds from the portion held by an adjacent latch. The comparators 16064-16078 each compare the output of a respective latch with the threshold L_t1, and the adder 16080 adds the comparator outputs and sends the sum to the comparator 16082 for comparison with the threshold L_t3. Figure 16C shows a circuit for generating the LVLFLAG2 flag. In Figure 16C, the delays 16132-16146 are similar to the delays shown in Figure 16B, and the latches 16148-16162 are similar to the latches shown in Figure 16B. The comparators 16164-16178 each compare a respective latch output with the threshold L_t2. Thus, the OR gate 16180 generates a true output if any of the latched signals originating from the module 16030 exceeds the threshold L_t2. The inverter 16182 inverts the output of the OR gate 16180. Figure 17 shows the data flow for generating the short-term energy indicator parameters. The short-term energy is measured as the mean squared energy (average energy per sample) on a frame basis as well as on a 5 millisecond basis. The short-term energy is evaluated relative to a background energy E_bn. E_bn is initialized to a constant E₀ = (100·12^(1/2))². Subsequently, whenever a frame is determined to be in mode C, E_bn is set equal to (7/8)·E_bn + (1/8)·E₀. Thus, some of the thresholds used in the circuit of Figure 17 are adaptive. In Figure 17,

E_t0 = 0.707·E_bn, E_t1 = 5, E_t2 = 2.5·E_bn, E_t3 = 1.8·E_bn, E_t4 = E_bn, E_t5 = 0.707·E_bn, and E_t6 = 16.0.

The short-term energy on a 5 millisecond basis provides an indication of the presence of voice throughout the frame by means of a single flag, EFLAG1, which is generated by testing the short-term energy on a 5 millisecond basis against a threshold, incrementing a counter whenever the threshold is exceeded, and testing the final value of the counter against a fixed threshold. Comparing the short-term energy on a frame basis against different thresholds provides indications of the absence of voice throughout the frame in the form of several flags with varying degrees of confidence. These flags are denoted EFLAG2, EFLAG3, EFLAG4, and EFLAG5. Figure 17 shows the data flow within the mode selector 34 for generating these flags. The modules 17002, 17004, 17006, 17008, 17010, 17015, 17020, and 17022 each compute the energy in a respective 5 millisecond subframe of the frame currently being processed. The comparators 17030, 17032, 17034, 17036, 17040, 17042, and 17044, in combination with the adder 17050, count the number of subframes having an energy exceeding E_t0 = 0.707·E_bn. Figures 18A, 18B, and 18C show the processing of step 1060. The mode selector 34 first classifies the frame as background noise (mode C) or voice (modes A or B). Mode C tends to be characterized by low energy, relatively high spectral stationarity between the current frame and the previous frame, a relative absence of pitch periodicity between the current frame and the previous frame, and a high zero crossing rate.
Background noise (mode C) is declared either on the basis of the strongest short-term energy flag EFLAG5 alone or by combining the weaker short-term energy flags EFLAG4, EFLAG3, and EFLAG2 with other flags indicating a high zero crossing rate, the absence of pitch, the absence of transients, and so on. More specifically, if the previous frame mode was A or if EFLAG2 is not true, the procedure proceeds to step 18045 (step 18005). Step 18005 ensures that the frame will not be mode C if the previous frame was mode A. The current frame is mode C if (LPCFLAG1 and EFLAG3) is true, or (LPCFLAG2 and EFLAG4) is true, or EFLAG5 is true (steps 18010, 18015, and 18020). The current frame is also mode C if ((not PITCHFLAG1) and LPCFLAG1 and ZC_HIGH) is true (step 18025) or if ((not PITCHFLAG1) and (not PITCHFLAG2) and LPCFLAG2 and ZC_HIGH) is true (step 18030). In this way, the processing shown in Figure 18A determines whether the frame corresponds to a first mode (Mode C), depending on whether a voice component is substantially absent from the frame. In step 18045, a rating is calculated depending on the previous frame mode. If the previous frame mode was mode A, the rating is 1 + LVLFLAG1 + EFLAG1 + ZC_LOW. If the previous mode was mode B, the rating is 0 + LVLFLAG1 + EFLAG1 + ZC_LOW. If the previous frame mode was mode C, the rating is 2 + LVLFLAG1 + EFLAG1 + ZC_LOW. If the previous frame mode was mode C or LVLFLAG2 is not set, the current frame mode is mode B (step 18050). The current frame is mode A if (LPCFLAG1 and PITCHFLAG1) is true or (LPCFLAG2 and PITCHFLAG1) is true, provided that the rating is not less than 2 (steps 18060 and 18055). The current frame is also mode A if (LPCFLAG1 and PITCHFLAG2) is true or (LPCFLAG2 and PITCHFLAG1) is true, provided that the rating is not less than 3 (steps 18070, 18075, and 18080). Otherwise, the current frame is mode B. In this way, the speech encoder 12 generates a frame encoded in accordance with one of a first coding scheme (a coding scheme for mode C), when the frame corresponds to the first mode, and an alternative coding scheme (a coding scheme for modes A or B), when the frame does not correspond to the first mode, as described in greater detail below. For mode A, only the second set of line spectral frequency vector quantization indices needs to be transmitted, because the first set can be inferred at the receiver due to the slowly varying nature of the vocal tract configuration. In addition, the first and second open-loop pitch estimates are quantized and transmitted, because they are used to guide the closed-loop pitch estimates in each subframe. The quantization of the second open-loop pitch estimate uses a non-uniform 4-bit quantizer, while the quantization of the first open-loop pitch estimate uses a 3-bit non-uniform differential quantizer. Since the line spectral frequency vector quantization indices for the first linear prediction analysis window are neither transmitted nor used in the mode selection, there is no need to calculate them in mode A. This reduces the complexity of the short-term predictor section of the encoder in this mode. This reduced complexity, as well as the lower bit rate of the short-term predictor parameters in mode A, is balanced by a faster update of all the parameters of the excitation model.
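Returning to the mode decision of Figures 18A-18C, the flag logic described above can be sketched compactly; where the source text is ambiguous (for example, the exact flag pairs in the rating-3 test), the code simply follows the wording as reconstructed here.

```python
def select_mode(f, prev_mode):
    """Mode decision sketch following Figures 18A-18C as described.

    f is a dict of the boolean flags named in the text (LPCFLAG1/2,
    PITCHFLAG1/2, ZC_HIGH/ZC_LOW, LVLFLAG1/2, EFLAG1..EFLAG5).
    Returns 'A', 'B', or 'C'.
    """
    # Background-noise (mode C) tests; skipped if the previous frame
    # was mode A or the weakest energy flag EFLAG2 is not set.
    if prev_mode != 'A' and f['EFLAG2']:
        if ((f['LPCFLAG1'] and f['EFLAG3']) or
            (f['LPCFLAG2'] and f['EFLAG4']) or
            f['EFLAG5'] or
            (not f['PITCHFLAG1'] and f['LPCFLAG1'] and f['ZC_HIGH']) or
            (not f['PITCHFLAG1'] and not f['PITCHFLAG2'] and
             f['LPCFLAG2'] and f['ZC_HIGH'])):
            return 'C'
    # Rating (step 18045): the base value depends on the previous mode.
    base = {'A': 1, 'B': 0, 'C': 2}[prev_mode]
    rating = base + f['LVLFLAG1'] + f['EFLAG1'] + f['ZC_LOW']
    if prev_mode == 'C' or not f['LVLFLAG2']:
        return 'B'
    if rating >= 2 and f['PITCHFLAG1'] and (f['LPCFLAG1'] or f['LPCFLAG2']):
        return 'A'
    if rating >= 3 and ((f['LPCFLAG1'] and f['PITCHFLAG2']) or
                        (f['LPCFLAG2'] and f['PITCHFLAG1'])):
        return 'A'
    return 'B'

flags = dict(LPCFLAG1=True, LPCFLAG2=True, PITCHFLAG1=True, PITCHFLAG2=False,
             ZC_HIGH=False, ZC_LOW=True, LVLFLAG1=True, LVLFLAG2=True,
             EFLAG1=True, EFLAG2=False, EFLAG3=False, EFLAG4=False, EFLAG5=False)
print(select_mode(flags, prev_mode='A'))   # -> 'A'
```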
For mode B, both sets of line spectral frequency vector quantization indices must be transmitted because of the potential non-stationarity. However, for the first set of line spectral frequencies, only 2 of the 4 classifications or categories need to be searched. This is because the selection of intermediate reference system versus non-intermediate reference system varies very slowly over time. If the second set of line spectral frequencies is chosen from the "IRS-filtered voiced" category, then the first set can be expected to be from either the "IRS-filtered voiced" or the "IRS-filtered unvoiced" category. If the second set of line spectral frequencies is chosen from the "IRS-filtered unvoiced" category, then again the first set can be expected to be from either the "IRS-filtered voiced" or the "IRS-filtered unvoiced" category. If the second set of line spectral frequencies is chosen from the "non-IRS-filtered voiced" category, then the first set can be expected to be from either the "non-IRS-filtered voiced" or the "non-IRS-filtered unvoiced" category. Finally, if the second set of line spectral frequencies is chosen from the "non-IRS-filtered unvoiced" category, then again the first set can be expected to be from either the "non-IRS-filtered voiced" or the "non-IRS-filtered unvoiced" category. As a result, only two categories of line spectral frequency codebooks need to be searched for the quantization of the first set of frequencies. Furthermore, only 25 bits are needed to encode these quantization indices instead of the 26 required for the second set of line spectral frequencies, since the optimal category for the first set can be encoded using only 1 bit. For mode B, neither of the two open-loop pitch estimates is transmitted, since they are not used to guide the closed-loop pitch estimates. The greater complexity involved in the coding, as well as the higher bit rate of the short-term predictor parameters in mode B, is balanced by a slower update of all parameters of the excitation model. For mode C, only the second set of line spectral frequency vector quantization indices needs to be transmitted, because the human ear is not as sensitive to rapid changes in the spectral shape for noisy inputs. Moreover, such fast spectral shape variations are typical of many kinds of background noise sources. For mode C, neither of the two open-loop pitch estimates is transmitted, since they are not used to guide the closed-loop pitch estimation. The lower complexity involved, as well as the lower bit rate of the short-term predictor parameters in mode C, is balanced by a faster update of the fixed codebook gain portion of the excitation model parameters. The gain quantization tables are designed separately for each of the modes. Also, in each mode, the closed-loop parameters are refined using a delayed decision approach. The delayed decision is used in such a way that the total encoder/decoder delay is not increased. This delayed decision approach is very effective in the transition regions.
In mode A, the quantization indices corresponding to the second set of short-term predictor coefficients, as well as the open-loop pitch estimates, are transmitted. Only these quantized parameters are used in the excitation modeling. The 40 millisecond voice frame is divided into seven subframes. The first six are 5.75 milliseconds long and the seventh is 5.5 milliseconds long. In each subframe, a set of interpolated short-term predictor coefficients is used. The interpolation is done in the autocorrelation lag domain. Using this set of interpolated coefficients, a closed-loop analysis-by-synthesis approach is used to derive the optimal pitch index, pitch gain index, fixed codebook index, and fixed codebook gain index for each subframe. The search range of the closed-loop pitch index is centered around an interpolated trajectory of the open-loop pitch estimates. The trade-off between the search range and the pitch resolution is made dynamically, depending on the proximity of the open-loop pitch estimates. The fixed codebook uses zinc pulse shapes, which are obtained using a weighted combination of a sinc pulse and a phase-shifted version of its Hilbert transform. The gain of the fixed codebook is quantized differentially. The analysis-by-synthesis technique used to derive the parameters of the excitation model uses an interpolated set of short-term predictor coefficients in each subframe. The optimal set of excitation model parameters for each subframe is determined only at the end of each 40 millisecond frame, because of the delayed decision. In deriving the excitation model parameters, all seven subframes are assumed to have a length of 5.75 milliseconds, or forty-six samples. However, for the last, or seventh, subframe, the adaptive codebook update and the local short-term predictor state update are carried out only for a subframe length of 5.5 milliseconds, or forty-four samples. The short-term predictor parameters, or linear prediction filter parameters, are interpolated from subframe to subframe. The interpolation is carried out in the autocorrelation domain. The normalized autocorrelation lags derived from the quantized filter coefficients for the second linear prediction analysis window are denoted by {ρ₋₁(i)} for the previous 40 millisecond frame and by {ρ₂(i)} for the current 40 millisecond frame, for 0 <= i <= 10, with ρ₋₁(0) = ρ₂(0) = 1.0. The interpolated lags {ρ'_m(i)} are then given by

ρ'_m(i) = ν_m·ρ₂(i) + (1 - ν_m)·ρ₋₁(i), 1 <= m <= 7, 0 <= i <= 10,

or, in vector notation,

ρ'_m = ν_m·ρ₂ + (1 - ν_m)·ρ₋₁, 1 <= m <= 7.

Here, ν_m is the interpolation weight for subframe m. The interpolated lags {ρ'_m(i)} are converted into the short-term predictor filter coefficients {a'_m(i)}. The choice of the interpolation weights affects the voice quality in this mode significantly. For this reason, they must be carefully determined. The interpolation weights ν_m have been determined for each subframe m by minimizing the mean squared error between the actual short-term power spectral envelope S_m,J(ω) and the interpolated short-term power spectral envelope S'_m,J(ω) over all frames J of a very large speech database. In other words, ν_m is determined by minimizing

E_m = Σ_J (1/2π) ∫ [S_m,J(ω) - S'_m,J(ω)]² dω.

If the actual autocorrelation coefficients for subframe m in frame J are denoted by {ρ_m,J(k)}, then by definition
S_m,J(ω) = Σ_{k=-10}^{10} ρ_m,J(k)·e^{-jωk} and S'_m,J(ω) = Σ_{k=-10}^{10} ρ'_m,J(k)·e^{-jωk}.

Substituting these into the preceding equation, it can be shown that minimizing E_m is equivalent to minimizing E'_m, where

E'_m = Σ_J Σ_k [ρ_m,J(k) - ρ'_m,J(k)]²,

or, in vector notation,

E'_m = Σ_J ||ρ_m,J - ρ'_m,J||²,

where ||·|| denotes the vector norm. Substituting the expression for ρ'_m,J into the above equation, differentiating with respect to ν_m, and setting the result to zero yields

ν_m = (Σ_J ⟨X_J, Y_m,J⟩) / (Σ_J ⟨X_J, X_J⟩),

where X_J = ρ₂,J - ρ₋₁,J and Y_m,J = ρ_m,J - ρ₋₁,J, and ⟨X_J, Y_m,J⟩ is the dot product between the vectors X_J and Y_m,J. The values of ν_m calculated by this method over a very large speech database are then fine-tuned by listening tests.
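A sketch of the lag-domain interpolation followed by conversion to filter coefficients (via the standard Levinson-Durbin recursion) is shown below; the subframe weights nu are illustrative placeholders, not the trained values described in the text.

```python
import numpy as np

def levinson_durbin(r, order=10):
    """Convert autocorrelation lags r[0..order] to LP coefficients a[0..order]."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i]
        for j in range(1, i):
            acc += a[j] * r[i - j]
        k = -acc / err                       # reflection coefficient
        new_a = a.copy()
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)                 # residual energy update
    return a, err

def interpolated_lp_coeffs(rho_prev, rho_curr, nu):
    """rho'_m = nu_m * rho_2 + (1 - nu_m) * rho_-1, then lags -> coefficients."""
    out = []
    for nu_m in nu:
        rho_m = nu_m * rho_curr + (1.0 - nu_m) * rho_prev
        a, _ = levinson_durbin(rho_m, order=len(rho_m) - 1)
        out.append(a)
    return out

# Toy usage: normalized lags from two white-noise "frames".
rng = np.random.default_rng(3)
x, y = rng.standard_normal(320), rng.standard_normal(320)
lags = lambda s: np.array([s[:320 - k] @ s[k:] for k in range(11)])
rho_curr, rho_prev = lags(x) / lags(x)[0], lags(y) / lags(y)[0]
nu = np.linspace(1 / 7, 1.0, 7)   # placeholder subframe weights
print(interpolated_lp_coeffs(rho_prev, rho_curr, nu)[0])
```

Because each interpolated lag vector is a convex combination of two valid normalized autocorrelations, the recursion remains well conditioned for every subframe.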
The target vector t_ac for the adaptive codebook search is related to the speech vector in each subframe by s = H·t_ac + z. Here, H is the lower triangular Toeplitz matrix whose first column contains the impulse response of the interpolated short-term predictor {a'_m(i)} for subframe m, and z is the vector containing its zero-input response. The target vector t_ac is most easily calculated by subtracting the zero-input response z from the speech vector s and filtering the difference by the inverse short-term predictor with zero initial states. The search of the adaptive codebooks 3506 and 3507 employs a weighted mean squared error E_i to measure the distance between a candidate vector r_i and the target vector t_ac, given by

E_i = (t_ac - μ_i·r_i)^T W (t_ac - μ_i·r_i).

Here, μ_i is the associated gain and W is the spectral weighting matrix. W is a positive definite symmetric Toeplitz matrix derived from the truncated impulse response of the short-term predictor weighted with the filter coefficients {a'_m(i)·γ^i}. The weighting factor γ is 0.8. Substituting the optimal μ_i into the above expression, the distortion term can be rewritten as

E_i = t_ac^T W t_ac - [ρ_i]²/e_i,

where ρ_i is the correlation term t_ac^T W r_i and e_i is the energy term r_i^T W r_i. Only candidates having a positive correlation are considered. The best candidate vectors are those having positive correlations and the highest values of [ρ_i]²/e_i. The candidate vectors r_i correspond to different pitch delays. These pitch delays, in samples, fall in the range [20, 146]. Fractional pitch delays are possible, but the fractional part f is restricted to be 0.00, 0.25, 0.50, or 0.75. The candidate vector corresponding to an integer delay L is simply read from the adaptive codebook, which is a collection of past excitation samples. For a mixed delay (integer plus fraction) L + f, the portion of the adaptive codebook centered around the section corresponding to the integer delay L is filtered by a polyphase filter corresponding to the fraction f. Incomplete candidate vectors corresponding to delay values smaller than one subframe length are completed in the manner suggested by J. Campbell et al., cited above. The polyphase filter coefficients are derived from a prototype low-pass filter designed to have good passband as well as good stopband characteristics. Each polyphase filter has 8 taps. The adaptive codebook search does not search all candidate vectors. For the first 3 subframes, a 5-bit search range is determined by the second quantized open-loop pitch estimate P'₋₁ of the previous 40 millisecond frame and the first quantized open-loop pitch estimate P'₁ of the current 40 millisecond frame. If the previous mode was B, then P'₋₁ is taken to be the pitch delay of the last subframe of the previous frame. For the last 4 subframes, this 5-bit search range is determined by the second quantized open-loop pitch estimate P'₂ of the current 40 millisecond frame and the first quantized open-loop pitch estimate P'₁ of the current 40 millisecond frame. For the first 3 subframes, this 5-bit search range is divided into two 4-bit ranges, centered around P'₋₁ and P'₁, respectively. If these two 4-bit ranges overlap, then a single 5-bit range centered around (P'₋₁ + P'₁)/2 is used. Similarly, for the last 4 subframes, the 5-bit search range is divided into two 4-bit ranges, with the ranges centered around P'₁ and P'₂.
If these two 4-bit ranges overlap, then a single 5-bit range centered around (P'₁ + P'₂)/2 is used. The selection of the search range also determines the fractional resolution needed for the closed-loop pitch. This desired fractional resolution is determined directly from the quantized open-loop pitch estimates P'₋₁ and P'₁ for the first 3 subframes and from P'₁ and P'₂ for the last 4 subframes. If the two open-loop pitch estimates are within 4 integer delays of each other, resulting in a single 5-bit search range, only 8 integer delays centered around the midpoint are searched, but the fractional portion f of the pitch delay can take the values 0.00, 0.25, 0.50, or 0.75 and is therefore also searched. Thus, 3 bits are used to encode the integer portion while 2 bits are used to encode the fractional portion of the closed-loop pitch. If the two open-loop pitch estimates are within 8 integer delays of each other, resulting in a single 5-bit search range, only 16 integer delays centered around the midpoint are searched, but the fractional portion f can take the values 0.0 or 0.5 and is therefore also searched. Thus, 4 bits are used to encode the integer portion while 1 bit is used to encode the fractional portion of the closed-loop pitch. If the two open-loop pitch estimates are more than 8 integer delays apart, only integer delays are searched, that is, f = 0.0 only, in either the single 5-bit search range or the two separate 4-bit search ranges. In this case, the 5 bits are used to encode the integer portion of the closed-loop pitch. The complexity of the search can be reduced in the case of fractional pitch delays by first searching for the optimal integer delay and then searching for the optimal fractional delay only in its vicinity. One of the 5-bit indices, the all-zero index, is reserved for the all-zero adaptive codebook vector. This is accommodated by cutting the search range from 32 pitch delays down to 31. As indicated above, the search is restricted to positive correlations only, and the all-zero index is chosen if no positive correlation is found. The adaptive codebook gain is determined after the search by quantizing the ratio of the optimal correlation to the optimal energy using a non-uniform 3-bit quantizer. This 3-bit quantizer has only positive gain values in it, since only positive gains are possible. As the delayed decision is employed, the adaptive codebook search results in the two best pitch delay, or lag, candidates in all subframes. In addition, for subframes two to seven, this has to be repeated for the two target vectors produced by the two best sets of excitation model parameters derived for the previous subframes in the current frame. This results in the two best lag candidates and the two associated adaptive codebook gains for subframe one, and the four best lag candidates and the associated four adaptive codebook gains for subframes two to seven, at the end of the search process.
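The adaptive codebook search criterion [ρ_i]²/e_i with the positive-correlation restriction can be sketched as follows; for simplicity, the spectral weighting matrix W defaults to the identity, only integer lags are searched, and short lags are periodically extended rather than completed by the method of Campbell et al.

```python
import numpy as np

def adaptive_codebook_search(target, excitation_history, lag_range, W=None):
    """Closed-loop adaptive codebook search sketch.

    For each integer lag L, the candidate vector r is read from the past
    excitation; the winner maximizes rho^2 / e with rho = t'W r > 0 and
    e = r'W r, per the distortion expansion in the text. Returns
    (None, 0.0), standing in for the all-zero index, if no candidate has
    positive correlation.
    """
    n = len(target)
    if W is None:
        W = np.eye(n)                      # unweighted sketch
    best_lag, best_score, best_gain = None, 0.0, 0.0
    for lag in lag_range:
        if lag >= n:
            start = len(excitation_history) - lag
            r = excitation_history[start:start + n]
        else:                              # short lags: periodic extension
            seg = excitation_history[-lag:]
            r = np.tile(seg, n // lag + 1)[:n]
        rho = target @ W @ r
        if rho <= 0.0:
            continue                       # only positive correlations
        e = r @ W @ r
        if e > 0 and rho * rho / e > best_score:
            best_lag, best_score, best_gain = lag, rho * rho / e, rho / e
    return best_lag, best_gain

# Toy usage: a past excitation with pitch period 50 samples.
rng = np.random.default_rng(4)
base = rng.standard_normal(50)
hist = np.tile(base, 4)                   # past excitation, period 50
tgt = 0.8 * base[:46]                     # current 46-sample subframe target
print(adaptive_codebook_search(tgt, hist, range(20, 147)))  # -> lag 50, gain ~0.8
```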
'"" This results in the two best wait candidates and the two associated adaptive cipher code wins for the sub-frame one and the four best standby candidates and the associated four adaptive codebook winnings for sub-frames two to six at the end of the search process. In each case, the target vector for the fixed encrypted code is derived by subtracting the adaptive encrypted code vector graduated from the target for the search of the adaptive encrypted code ie, tsc = t ^. - μ ^ ropt, where ropt is the selected adaptive cipher code vector and μopt is the gain of the associated adaptive encrypted code. In mode A, the fixed coded code consists of the general excitation impulse forms constructed from the discrete sync and cose functions. The sinc function is defined as sinc (n) = sin (7rn), n = 0 pn sinc (0) = 1 n = 0 and the cose function is defined as cosc (n) = 1-cos (pn), n = 0 pn cose (0) = 0 n = 0 With these definitions in mind, the generalized excitation pulse forms are constructed as follows: zx (n) = A sinc (n) + B cosc (n + l) zj (n ) = A sinc (n) - B cose (nl) Weights A and B are chosen to be 0.866 and 0.5 respectively. With the sinc and cose functions aligned in time, they correspond to what is known as the zinc base functions z0 (n). Informal listening tests show that the forms of the impulses changed in time improve the vocal quality of the synthesized voice. The fixed encrypted code for mode A consists of 2 parts each having 45 vectors. The first part consists of the pulse form z.j (n = 45) and has 90 samples in length. The vector is simply the vector that starts from the iés? ma entry of the encrypted code. The second part consists of the pulse form z (n = 45) and has 90 samples in length. Here again, the vector is simply the vector that starts from the beginning of the encrypted code. Both encrypted codes are also trimmed to reduce all small values, especially near the start and end of both coded codes to zero. In addition, we note that each even sample in any encrypted code is identical to zero by definition. All this contributes to making the encrypted codes very dense. In addition, we note that both encrypted codes overlap with adjacent vectors all having one entry in common. The nature of overlap and the low density of the encrypted codes are made explicit in the search for the encrypted code that uses the same measure of distortion as in the search for the adaptive encrypted code. This measure calculates the distance between the target vector of the fixed encrypted code tsc and each candidate to fixed encrypted code vector ci as Ei = (tsc -? ICi) tW (tsc -? Ci) Where W is the same spectral weighting matrix used in the search for adaptive cipher code and j is the optimal value of the gain for that i? mo vector of encrypted code. Once the optimal vector has been selected for each encrypted code, the magnitude of the gain of the encrypted code outside the search cycle is quantified quantifying the relation of the optimal correlation with respect to the optimal energy by means of a non-uniform differential quantizer of 4 bits in odd sub-frames and a differential quantizer of 3 bits in even sub-frames. Both quantifiers have zero gain as one of their inputs. Then the optimal distortion is calculated for each encrypted code and the optimal encrypted code is selected. The fixed encryption code index for each subframe is in the range of 0-44 if the optimal encrypted code is from z "1 (n = 45) but maps to the range 45-89 if the optimal encrypted code is from Zj ( n = 45). 
Continuing with the index encoding: by combining the fixed codebook indices of two consecutive subframes I and J as 90·I + J, the resulting index can be encoded using 13 bits. This is done for subframes 1 and 2, 3 and 4, and 5 and 6. For subframe 7, the fixed codebook index is encoded directly using 7 bits. The fixed codebook gain is encoded using 4 bits in subframes 1, 3, 5, and 7 and using 3 bits in subframes 2, 4, and 6. Because of the delayed decision, there are two target vectors t_ac for the codebook search in the first subframe, corresponding to the two best lag candidates and their corresponding gains provided by the closed-loop adaptive codebook search. For subframes two to seven, there are four target vectors, corresponding to the two best sets of excitation model parameters determined for the previous subframes combined with the two best lag candidates and their gains provided by the adaptive codebook search in the current subframe. The codebook search, therefore, is carried out twice in subframe one and four times in subframes two to seven. But the complexity does not increase proportionately because, in each subframe, the energy terms c_i^T W c_i are the same; only the correlation terms t_sc^T W c_i differ in each of the two searches for subframe one and in each of the four searches in subframes two to seven. The delayed decision search helps to smooth the pitch and gain contours in a code excited linear prediction coder. The delayed decision is employed in the present invention in such a way that the total encoding/decoding delay is not increased. Thus, in each subframe, the closed-loop pitch search produces the M best estimates. For each of these M best estimates and the N best previous subframe parameters, MN optimal pitch gain indices, fixed codebook indices, fixed codebook gain indices, and fixed codebook gain signs are derived. At the end of the subframe, these MN solutions are pruned to the best L using the cumulative SNR for the current 40 millisecond frame as the criterion. For the first subframe, M = 2, N = 1, and L = 2 are used. For the last subframe, M = 2, N = 2, and L = 1 are used. For the other subframes, M = 2, N = 2, and L = 2 are used. The delayed decision approach is particularly effective for the transition regions from voiced to unvoiced and from unvoiced to voiced. This delayed decision approach results in N times the complexity of the closed-loop pitch search, but much less than MN times the complexity of the fixed codebook search, in each subframe. This is because the correlation terms for the fixed codebook need to be calculated MN times in each subframe, but the energy terms are calculated only once. The optimal parameters for each subframe are determined only at the end of the 40 millisecond frame, using traceback. The pruning of the MN solutions down to L solutions is stored for each subframe to allow the traceback. An example of how the traceback is carried out is shown in Figure 20. The dark, thick line indicates the optimal trajectory obtained by tracing backward after the last subframe.
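The delayed-decision pruning and traceback can be sketched as a small beam search; the per-candidate SNR contributions below are random stand-ins for the cumulative SNR criterion of the text.

```python
import numpy as np

def delayed_decision(frame_subframes, M=2, N=2, L=2):
    """Delayed-decision search with traceback (illustrative sketch).

    Each subframe offers M candidate excitation choices with an SNR
    contribution (random here). At each subframe the N surviving paths
    are extended by the M candidates, then the M*N extensions are pruned
    back to the best L by cumulative SNR. The optimal parameter sequence
    is recovered by tracing back from the best final path.
    """
    rng = np.random.default_rng(5)
    paths = [([], 0.0)]                       # (choices so far, cumulative SNR)
    for sf in range(frame_subframes):
        snr = rng.uniform(0.0, 10.0, M)       # stand-in per-candidate SNRs
        extended = [(choices + [c], total + snr[c])
                    for choices, total in paths[:N] for c in range(M)]
        extended.sort(key=lambda p: -p[1])    # prune M*N extensions
        keep = 1 if sf == frame_subframes - 1 else L
        paths = extended[:keep]               # L = 1 on the last subframe
    best_choices, best_snr = paths[0]         # traceback: stored choice list
    return best_choices, best_snr

print(delayed_decision(frame_subframes=7))    # mode A: seven subframes
```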
In mode B, the quantization indices of both sets of short-term predictor parameters are transmitted, but not the open-loop spacing estimates. The 40-millisecond speech frame is divided into five subframes, each 8 milliseconds, or sixty-four samples, long. As in mode A, an interpolated set of filter coefficients is used to derive a spacing index, a spacing gain index, a fixed codebook index, and a fixed codebook gain index in a closed-loop, analysis-by-synthesis manner. The closed-loop spacing search has an unrestricted range, and only integer spacing delays are searched. The fixed codebook is a multi-innovations codebook with zinc pulse sections as well as Hadamard sections: the zinc pulse sections are well suited to transient segments, while the Hadamard sections are better suited to unvoiced segments, and the fixed codebook search procedure is modified to take advantage of this. The greater complexity involved, as well as the higher bit rate of the short-term predictor parameters in mode B, is compensated by a slower update of the excitation model parameters. The excitation model parameters in each subframe are the adaptive codebook index, the adaptive codebook gain, the fixed codebook index, and the fixed codebook gain; there is no fixed codebook gain sign, since this gain is always positive. The best estimates of these parameters are determined using analysis-by-synthesis in each subframe, and the best overall estimate is determined at the end of the 40-millisecond frame using a delayed decision approach similar to that of mode A. The short-term predictor parameters, or linear prediction filter parameters, are interpolated from subframe to subframe in the autocorrelation lag domain. The normalized autocorrelation lags derived from the quantized filter coefficients for the second linear prediction analysis window of the previous 40-millisecond frame are denoted {ρ_-1(i)}. The corresponding lags for the first and second linear prediction analysis windows of the current 40-millisecond frame are denoted {ρ_1(i)} and {ρ_2(i)}, respectively. Normalization ensures that ρ_-1(0) = ρ_1(0) = ρ_2(0) = 1.0. The interpolated autocorrelation lags {ρ'_m(i)} are given by

ρ'_m(i) = α_m·ρ_-1(i) + β_m·ρ_1(i) + [1 - α_m - β_m]·ρ_2(i), 1 ≤ m ≤ 5, 0 ≤ i ≤ 10,

or, in vector notation,

ρ'_m = α_m·ρ_-1 + β_m·ρ_1 + [1 - α_m - β_m]·ρ_2, 1 ≤ m ≤ 5.

Here, α_m and β_m are the interpolation weights for subframe m. The interpolated lags {ρ'_m(i)} are subsequently converted into the short-term predictor filter coefficients {a_m(i)}. The selection of the interpolation weights is not as critical in this mode as it is in mode A; nevertheless, they have been determined using the same objective criterion as in mode A and fine-tuned by listening tests. The values of α_m and β_m that minimize the objective criterion E_m can be shown to be

α_m = (Y_m·C - X_m·B) / (C² - A·B),
β_m = (X_m·C - Y_m·A) / (C² - A·B),

where

A = Σ_J ⟨ρ_-1,J - ρ_2,J , ρ_-1,J - ρ_2,J⟩,
B = Σ_J ⟨ρ_1,J - ρ_2,J , ρ_1,J - ρ_2,J⟩,
C = Σ_J ⟨ρ_-1,J - ρ_2,J , ρ_1,J - ρ_2,J⟩,
X_m = Σ_J ⟨ρ_-1,J - ρ_2,J , ρ_m,J - ρ_2,J⟩,
Y_m = Σ_J ⟨ρ_1,J - ρ_2,J , ρ_m,J - ρ_2,J⟩.

As before, ρ_-1,J denotes the autocorrelation lag vector derived from the quantized filter coefficients of the second linear prediction analysis window of frame J−1, ρ_1,J denotes the autocorrelation lag vector derived from the quantized filter coefficients of the first linear prediction analysis window of frame J, ρ_2,J denotes the autocorrelation lag vector derived from the quantized filter coefficients of the second linear prediction analysis window of frame J, and ρ_m,J denotes the actual autocorrelation lag vector derived from the speech samples in subframe m of frame J.
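This closed-form solution is an ordinary two-variable least-squares fit over the training frames. The following Python sketch (array shapes and names are ours; the training lag vectors are assumed to be given) computes α_m and β_m and applies the interpolation.

```python
import numpy as np

def interpolation_weights(rho_prev2, rho_1, rho_2, rho_m):
    """Each argument has shape (num_training_frames, 11) and holds the lag
    vectors rho_{-1,J}, rho_{1,J}, rho_{2,J} and the actual lags rho_{m,J}
    for subframe m.  Returns (alpha_m, beta_m) minimizing
    E_m = sum_J || rho'_{m,J} - rho_{m,J} ||^2."""
    a = rho_prev2 - rho_2
    b = rho_1 - rho_2
    d = rho_m - rho_2
    A, B_, C = np.sum(a * a), np.sum(b * b), np.sum(a * b)
    Xm, Ym = np.sum(a * d), np.sum(b * d)
    den = C * C - A * B_
    return (Ym * C - Xm * B_) / den, (Xm * C - Ym * A) / den

def interpolate_lags(alpha, beta, rho_prev2, rho_1, rho_2):
    # rho'_m = alpha*rho_{-1} + beta*rho_1 + (1 - alpha - beta)*rho_2
    return alpha * rho_prev2 + beta * rho_1 + (1.0 - alpha - beta) * rho_2
```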
The adaptive codebook search in mode B is similar to that of mode A, in that the target vector for the search is derived in the same way and the distortion measure used in the search is the same. There are, however, some differences. Only integer spacing delays in the range [20, 146] are searched; fractional delays are not searched. As in mode A, only positive correlations are considered in the search, and the index corresponding to the all-zero vector is assigned if no positive correlations are found. The optimal adaptive codebook index is encoded using 7 bits. The adaptive codebook gain, which is guaranteed to be positive, is quantized outside the search loop using a non-uniform 3-bit quantizer; this quantizer is different from the one used in mode A. As in mode A, the delayed decision is used, so that the adaptive codebook search produces the two best spacing delay candidates in all subframes. In addition, in subframes two to five, this has to be repeated for the two best target vectors produced by the two best sets of excitation model parameters derived for the previous subframes, resulting in four sets of adaptive codebook indices and associated gains at the end of the subframe. In each case, the target vector for the fixed codebook search is derived by subtracting the scaled adaptive codebook vector from the target of the adaptive codebook search. The fixed codebook in mode B is a 9-bit multi-innovations codebook with three sections. The first is a Hadamard vector-sum section, and the second and third sections are related to the generalized excitation pulse shapes z_-1(n) and z_1(n), respectively; these pulse shapes have been defined above. The first section of this codebook, and the associated search procedure, are based on the publication of D. Lin, "Ultra-Fast CELP Coding Using Multi-Codebook Innovations", ICASSP 1992. We note that in this section there are 256 innovation vectors, and the search procedure guarantees a positive gain. The second and third sections have 64 innovation vectors each, and their search procedure can produce both positive and negative gains. One component of the multi-innovations codebook is the deterministic vector-sum code constructed from the Hadamard matrix. The codevectors of the vector-sum code, as used in the present invention, are expressed as

u_i(n) = Σ_{m=1..4} θ_im·v_m(n), 0 ≤ i ≤ 15,

where the basis vectors v_m(n) are obtained from the rows of the Hadamard-Sylvester matrix and θ_im = ±1. The basis vectors are selected based on a partition of sequences from the Hadamard matrix. The codevectors of the Hadamard vector-sum codebooks are binary-valued code sequences.
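The following Python sketch makes the vector-sum construction concrete: it builds a Sylvester-type Hadamard matrix, orders its rows by sequency, and forms the sixteen codevectors u_i = Σ_m θ_im·v_m given by the formula above. The particular partitioning used here — one sequency-ordered row per disjoint segment of the subframe, which keeps the codevectors binary-valued — is our assumption for illustration, since the text states only that the basis vectors come from a partition of sequences; a 256-vector section would use eight basis vectors in the same way.

```python
import numpy as np

def sylvester_hadamard(n):
    # Sylvester construction: H_{2k} = [[H_k, H_k], [H_k, -H_k]]
    H = np.array([[1]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def sequency_order(H):
    # Order rows by sequency, i.e. by the number of sign changes per row.
    changes = (np.diff(H, axis=1) != 0).sum(axis=1)
    return H[np.argsort(changes)]

def vector_sum_codebook(dim=64, n_basis=4):
    H = sequency_order(sylvester_hadamard(dim))
    seg = dim // n_basis
    basis = np.zeros((n_basis, dim))
    for m in range(n_basis):
        # assumed partition: row m+1, masked to a disjoint segment
        basis[m, m * seg:(m + 1) * seg] = H[m + 1, m * seg:(m + 1) * seg]
    codebook = []
    for i in range(2 ** n_basis):        # theta_im = +/-1 from the bits of i
        theta = np.where([(i >> m) & 1 for m in range(n_basis)], -1.0, 1.0)
        codebook.append(theta @ basis)   # u_i(n) = sum_m theta_im * v_m(n)
    return np.array(codebook)

cb = vector_sum_codebook()
assert set(np.unique(cb)) <= {-1.0, 1.0}  # codevectors stay binary-valued
```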
Compared with the algebraic codes considered previously, the Hadamard vector-sum codes are constructed to have more nearly ideal frequency and phase characteristics. This is because of the basis vector partitioning scheme used in the present invention for the Hadamard matrix, which can be interpreted as a uniform sampling of the sequency-ordered row vectors of the Hadamard matrix; by contrast, non-uniform sampling methods have produced inferior results. The second section of the multi-innovations codebook consists of the pulse shape z_-1(n) (N = 63) and is 127 samples in length; the i-th vector of this section is simply the vector that starts from the i-th entry of the section. The third section consists of the pulse shape z_1(n) (N = 63) and is likewise 127 samples in length; here again, the i-th vector of this section is simply the vector that starts from the i-th entry of the section. Both the second and the third sections have the overlapping nature and low density whose advantages can be exploited using the same search procedure as for the mode A fixed codebook. As indicated above, this search procedure is not restricted to positive correlations, and therefore both positive and negative gains can result in the second and third sections. Once the optimal vector for each section has been selected, the magnitude of the codebook gain is quantized outside the search loop, by quantizing the ratio of the optimal correlation to the optimal energy by means of a 4-bit quantizer in all subframes. This quantizer is different for the first section, while the second and third sections use a common quantizer; all the quantizers have zero gain as one of their entries. The optimal distortion for each section is then computed, and finally the optimal section is selected. The fixed codebook index for each subframe is in the range 0-255 if the optimal codebook vector is from the Hadamard section. If it is from the z_-1(n) section and the gain sign is positive, it is mapped into the range 256-319; if it is from the z_-1(n) section and the gain sign is negative, into the range 320-383; if it is from the z_1(n) section and the gain sign is positive, into the range 384-447; and if it is from the z_1(n) section and the gain sign is negative, into the range 448-511. The resulting index can be encoded using 9 bits. The magnitude of the codebook gain is encoded using 4 bits in all subframes.
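Since the mapping of section, within-section index, and gain sign onto the 9-bit index is purely mechanical, it can be captured in a few lines. The following Python sketch uses our own function names and string labels for the three sections.

```python
def encode_fixed_index(section, index, gain_positive=True):
    # Map (section, index, gain sign) onto the 9-bit mode B fixed index.
    if section == "hadamard":           # 256 vectors, gain always positive
        assert 0 <= index < 256
        return index
    if section == "z-1":                # 64 vectors, signed gain
        assert 0 <= index < 64
        return (256 if gain_positive else 320) + index
    if section == "z+1":                # 64 vectors, signed gain
        assert 0 <= index < 64
        return (384 if gain_positive else 448) + index
    raise ValueError(section)

def decode_fixed_index(code):
    if code < 256: return ("hadamard", code, True)
    if code < 320: return ("z-1", code - 256, True)
    if code < 384: return ("z-1", code - 320, False)
    if code < 448: return ("z+1", code - 384, True)
    if code < 512: return ("z+1", code - 448, False)
    raise ValueError(code)

assert decode_fixed_index(encode_fixed_index("z+1", 10, False)) == ("z+1", 10, False)
```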
For mode C, the frame is divided into five subframes as in mode B; each subframe has a length of 8 milliseconds, or 64 samples. The excitation model parameters in each subframe are the adaptive codebook index, the adaptive codebook gain, the fixed codebook index, and two fixed codebook gains, with one codebook gain associated with each half of the subframe. Both gains are guaranteed to be positive, and therefore no sign information is associated with them. As in modes A and B, the best estimates of these parameters are determined using analysis-by-synthesis in each subframe, and the best overall estimate is determined at the end of the 40-millisecond frame using a delayed decision method identical to that used in modes A and B. The short-term predictor parameters, or linear prediction filter parameters, are interpolated subframe by subframe in the autocorrelation lag domain, in exactly the same way as in mode B. However, the interpolation weights α_m and β_m are different from those used in mode B; they are obtained using the procedure described for mode B, but using various sources of background noise as the training material. The adaptive codebook search in mode C is identical to that of mode B, except that both positive and negative correlations are allowed in the search. The optimal adaptive codebook index is encoded using 7 bits. The adaptive codebook gain, which may be positive or negative, is quantized outside the search loop using a non-uniform 3-bit quantizer. This quantizer is different from those used in mode A and in mode B, because it has a more restricted range and can also take negative values. By allowing both positive and negative correlations in the search loop, and by having a quantizer with a restricted dynamic range, the periodic artifacts in the synthesized background noise due to the adaptive codebook are considerably reduced; in effect, the adaptive codebook now behaves more like another fixed codebook. As in modes A and B, the delayed decision is used, and the adaptive codebook search produces the two best candidates in all subframes. Also, in subframes two to five, this has to be repeated for the two target vectors produced by the two best sets of excitation model parameters derived for the previous subframes, resulting in four sets of adaptive codebook indices and associated gains at the end of the subframe. In each case, the target vector for the fixed codebook search is derived by subtracting the scaled adaptive codebook vector from the target of the adaptive codebook search. The fixed codebook in mode C is an 8-bit multi-innovations codebook and is identical to the Hadamard vector-sum section of the mode B multi-innovations fixed codebook; the same search procedure described in D. Lin's publication, "Ultra-Fast CELP Coding Using Multi-Codebook Innovations", ICASSP 1992, is used here. There are 256 codebook vectors, and the search procedure guarantees a positive gain. The fixed codebook index is encoded using 8 bits. Once the optimal codebook vector has been selected, the optimal correlation and the optimal energy are computed separately for the first half and for the second half of the subframe, and the ratio of correlation to energy in each half is quantized independently, using a quantizer that has zero gain as one of its entries (see the sketch below). The use of two gains per subframe ensures smooth reproduction of background noise. Owing to the delayed decision, there are two sets of optimal fixed codebook indices and gains in subframe one and four sets in subframes two to five. The delayed decision approach in mode C is identical to that used in modes A and B, and the optimal parameters for each subframe are determined at the end of the 40-millisecond frame using an identical backward tracing procedure.
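The per-half gain computation is straightforward; the following is a minimal Python sketch in the unweighted case, with names of our choosing. The floor at zero reflects the fact that the mode C fixed codebook gains are always positive and that the gain quantizer includes a zero entry.

```python
import numpy as np

def half_subframe_gain_ratios(target, codevector):
    """One gain per half of the 64-sample subframe: the ratio of the
    correlation to the energy in that half, floored at zero."""
    half = len(codevector) // 2
    gains = []
    for lo, hi in ((0, half), (half, 2 * half)):
        c, t = codevector[lo:hi], target[lo:hi]
        energy = float(c @ c)
        gains.append(max(0.0, float(t @ c) / energy) if energy > 0 else 0.0)
    return gains  # each ratio is quantized independently, as described above
```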
The allocation of bits among the various parameters is summarized in Figures 21A and 21B for mode A, Figure 22 for mode B, and Figure 23 for mode C. These parameters are packed by the packing circuits 36 of Figure 3, in the same sequence in which they are tabulated in those figures. Thus, for mode A, using the same notation as in Figures 21A and 21B, the parameters are packed into a packet of 168 bits every 40 milliseconds in the following sequence: MODE1, LSP2, ACG1, ACG3, ACG4, ACG5, ACG7, ACG2, ACG6, PITCH1, PITCH2, ACI1, SIGN1, FCG1, ACI2, SIGN2, FCG2, ACI3, SIGN3, FCG3, ACI4, SIGN4, FCG4, ACI5, SIGN5, FCG5, ACI6, SIGN6, FCG6, ACI7, SIGN7, FCG7, FCI12, FCI34, FCI56, and FCI7. For mode B, using the same notation as in Figures 21A and 21B, the parameters are packed into a packet of 168 bits every 40 milliseconds in the following sequence: MODE1, LSP2, ACG1, ACG2, ACG3, ACG4, ACG5, ACI1, FCG1, FCI1, ACI2, FCG2, FCI2, ACI3, FCG3, FCI3, ACI4, FCG4, FCI4, ACI5, FCG5, FCI5, LSP1, and MODE2. For mode C, using the same notation as in Figures 21A and 21B, the parameters are packed into a packet of 168 bits every 40 milliseconds in the following sequence: MODE1, LSP2, ACG1, ACG2, ACG3, ACG4, ACG5, ACI1, FCG2_1, FCI1, ACI2, FCG2_2, FCI2, ACI3, FCG2_3, FCI3, ACI4, FCG2_4, FCI4, ACI5, FCG2_5, FCI5, FCG1_1, FCG1_2, FCG1_3, FCG1_4, FCG1_5, and MODE2. The packing sequence in all three modes is designed to reduce the sensitivity to an error in the MODE1 and MODE2 mode bits. The packing is done from the MSB, or bit 7, to the LSB, or bit 0, from byte 1 to byte 21. MODE1 occupies the MSB, or bit 7, of byte 1; by testing this bit, we can determine whether the compressed speech belongs to mode A or not. If it is not mode A, we test MODE2, which occupies the LSB, or bit 0, of byte 21, to decide between mode B and mode C. The decoder 46 (Figure 4), shown in detail in Figure 24, receives the compressed speech bit stream exactly as it is output by the speech encoder of Figure 3. The parameters are unpacked after determining whether the received mode bits indicate a first mode (mode C), a second mode (mode B), or a third mode (mode A), and these parameters are then used to synthesize the speech. The speech decoder 46 synthesizes the part of the signal corresponding to the frame depending on the second set of filter coefficients, independently of the first set of filter coefficients and of the first and second spacing estimates, when the frame is determined to be of the first mode (mode C); it synthesizes the part of the signal corresponding to the frame depending on the first and second sets of filter coefficients, independently of the first and second spacing estimates, when the frame is determined to be of the second mode (mode B); and it synthesizes the part of the signal corresponding to the frame depending on the second set of filter coefficients and on the first and second spacing estimates, independently of the first set of filter coefficients, when the frame is determined to be of the third mode (mode A). In addition, the speech decoder receives a cyclic redundancy check (CRC) based bad frame indicator from the channel decoder 45 (Figure 1). This bad frame flag is used to trigger the bad frame error masking and error recovery sections (not shown) of the decoder; these can also be triggered by certain built-in error detection schemes. The speech decoder 46 tests the MSB, or bit 7, of byte 1 to see whether the compressed speech packet corresponds to mode A; otherwise, the LSB, or bit 0, of byte 21 is tested to see whether the packet corresponds to mode B or to mode C. Once the correct mode of the received compressed speech packet is determined, the parameters of the received speech frame are unpacked and used to synthesize the speech.
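As a small illustration of the mode bit layout, here is a Python sketch of the mode test. Note that the polarities of MODE1 and MODE2 — that is, which bit value denotes which mode — are not stated above and are assumed here, as are the function name and the string labels.

```python
def classify_packet(packet: bytes) -> str:
    """packet: one 21-byte (168-bit) compressed speech frame."""
    assert len(packet) == 21
    mode1 = (packet[0] >> 7) & 1     # MODE1: MSB (bit 7) of byte 1
    mode2 = packet[20] & 1           # MODE2: LSB (bit 0) of byte 21
    if mode1:                        # polarity assumed: MODE1 set -> mode A
        return "A"
    return "B" if mode2 else "C"     # MODE2 polarity likewise assumed
```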
In mode A, the second set of received line spectral frequency indices is used to reconstruct the quantized filter coefficients, which are then converted into autocorrelation lags. In each subframe, the autocorrelation lags are interpolated using the same weights as those used in the encoder for mode A and then converted into short-term predictor filter coefficients. The open-loop spacing indices are converted into quantized open-loop spacing values, and in each subframe these open-loop values are used in conjunction with the received 5-bit adaptive codebook index to determine the spacing delay candidate. The adaptive codebook vector corresponding to this delay is determined from the adaptive codebook 103 in Figure 24. The adaptive codebook gain index for each subframe is used to obtain the adaptive codebook gain, which is then applied by multiplier 104 to scale the adaptive codebook vector. The fixed codebook vector for each subframe is obtained from the fixed codebook 101 using the received fixed codebook index associated with that subframe, and is scaled by the fixed codebook gain, obtained from the received fixed codebook gain index and sign index for that subframe, by multiplier 102. The scaled adaptive codebook vector and the scaled fixed codebook vector are summed by adder 105 to produce an excitation signal, which is enhanced by a spacing prefilter 106 as described in the publication of I. A. Gerson and M. A. Jasiuk cited above. This excitation signal is used to drive the short-term predictor 107, and the synthesized speech is subsequently further enhanced by a global pole-zero postfilter 109 with spectral tilt correction and energy normalization. At the end of each subframe, the adaptive codebook is updated with the excitation signal, as indicated by the dotted line in Figure 24. In mode B, both sets of line spectral frequency indices are used to reconstruct both the first and the second sets of quantized filter coefficients, which are subsequently converted into autocorrelation lags. In each subframe, these autocorrelation lags are interpolated using exactly the same weights as those used in the encoder in mode B and then converted into short-term predictor coefficients. In each subframe, the received adaptive codebook index is used to derive the adaptive codebook vector from the adaptive codebook 103, and the received fixed codebook index is used to derive the fixed codebook vector from the fixed codebook 101; the adaptive codebook gain index and the fixed codebook gain index are used in each subframe to recover the adaptive codebook gain and the fixed codebook gain. The excitation vector is reconstructed by scaling the adaptive codebook vector by the adaptive codebook gain using multiplier 104, scaling the fixed codebook vector by the fixed codebook gain using multiplier 102, and summing the two using adder 105. As in mode A, this excitation is enhanced by the spacing prefilter 106 prior to synthesis by the short-term predictor 107, and the synthesized speech is further enhanced by the global pole-zero postfilter 108. At the end of each subframe, the adaptive codebook is updated with the excitation signal, as indicated by the dotted line in Figure 24.
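The per-subframe decoding operations common to the three modes — scaling the two codebook vectors, summing them, and filtering through the short-term predictor — can be sketched as follows. This is a simplified illustration: the spacing prefilter is reduced to an optional hook, the postfilter is omitted, and the filter sign convention A(z) = 1 + Σ_k a_k·z^-k is our assumption.

```python
import numpy as np

def decode_subframe(r, c, acb_gain, fcb_gain, lpc, prefilter=None):
    """r: adaptive codebook vector; c: fixed codebook vector (same length);
    lpc: short-term predictor coefficients a_1..a_10."""
    excitation = acb_gain * r + fcb_gain * c
    if prefilter is not None:            # optional spacing prefilter hook
        excitation = prefilter(excitation)
    speech = np.zeros_like(excitation)
    for n in range(len(excitation)):     # s[n] = e[n] - sum_k a_k * s[n-k]
        acc = excitation[n]
        for k, ak in enumerate(lpc, start=1):
            if n >= k:
                acc -= ak * speech[n - k]
        speech[n] = acc
    # the excitation (not the synthesized speech) is what is fed back to
    # update the adaptive codebook at the end of the subframe
    return excitation, speech
```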
In mode C, the second set of received line spectral frequency indices is used to reconstruct the quantized filter coefficients, which are then converted into autocorrelation lags. In each subframe, the autocorrelation lags are interpolated using the same weights as those used in the encoder for mode C and then converted into short-term predictor filter coefficients. In each subframe, the received adaptive codebook index is used to derive the adaptive codebook vector from the adaptive codebook 103, and the fixed codebook index is used to derive the fixed codebook vector from the fixed codebook 101. The adaptive codebook gain index and the fixed codebook gain indices are used in each subframe to recover the adaptive codebook gain and the fixed codebook gains for the two halves of the subframe. The excitation vector is reconstructed by scaling the adaptive codebook vector by the adaptive codebook gain using multiplier 104, scaling the first half of the fixed codebook vector by the first fixed codebook gain and the second half of the fixed codebook vector by the second fixed codebook gain using multiplier 102, and adding the scaled adaptive and fixed codebook vectors using adder 105. As in modes A and B, this excitation is enhanced by the spacing prefilter 106 before synthesis by the short-term predictor 107, and the synthesized speech is further enhanced by the pole-zero postfilter 108. The parameters of the spacing prefilter and of the global postfilter used in each mode are different, being tailored to each mode. At the end of each subframe, the adaptive codebook is updated with the excitation signal, as indicated by the dotted line in Figure 24. As an alternative to the illustrated embodiment, the invention can be practiced with a shorter frame, such as a 22.5-millisecond frame, as shown in Figure 25. With that type of frame, it may be convenient to process only one linear prediction analysis window per frame, instead of the two linear prediction analysis windows illustrated. The analysis window could start after a duration Tb relative to the beginning of the current frame and extend into the next frame, ending after a duration Te relative to the beginning of the next frame, where Te > Tb. In other words, the total duration of an analysis window could be longer than the duration of a frame, and two consecutive windows could therefore span a particular frame; a current frame could thus be analyzed by processing the analysis window for the current frame along with the analysis window for the previous frame. In summary, the preferred communication system detects when noise is the predominant component of a signal frame and encodes a noise-dominated frame differently from a speech-dominated frame. This special coding for noise avoids some of the typical artifacts produced when noise is encoded with a scheme optimized for speech, and it permits improved speech quality in a low-bit-rate coding/decoding system. Additional advantages and modifications will readily occur to those skilled in the art.
The invention in its broader aspects is therefore not limited to the specific details, representative apparatus, and illustrative examples shown and described. Various modifications and variations may be made to the present invention without departing from the scope or spirit of the invention, and it is intended that the present invention cover those modifications and variations that fall within the scope of the appended claims and their equivalents.


NOVELTY OF THE INVENTION

Having described the foregoing invention, it is considered a novelty and, therefore, the content of the following claims is claimed as property.

CLAIMS

1. A method for encoding a signal having a voice component, the signal being organized as a plurality of frames, which includes the steps of: measuring at least one voice characteristic of a frame, wherein the characteristics include stationarity of the spectrum, spacing stationarity, high frequency content, and energy; determining at least two flags for each measured voice characteristic, to indicate a degree of that characteristic; determining whether the frame contains a low voice component based on the determined flags; classifying the frame in a voiceless mode if the frame has a low voice component, and in a voice mode otherwise; and generating a coded frame in accordance with a voiceless coding scheme when the frame is classified in the voiceless mode, and in accordance with a voice coding scheme when the frame is classified in the voice mode.
2. The coding method, in accordance with claim 1, wherein the measured voice characteristic is energy, and the flag determining step further includes the steps of: comparing the measured energy with at least two thresholds, including a high threshold representing a high energy value and a low threshold representing a low energy value; setting a first energy flag if the measured energy exceeds the high threshold; and setting a second energy flag if the measured energy falls below the low threshold, wherein the frame is determined to contain a low voice component if the second energy flag is set, and is determined not to contain a low voice component if the first energy flag is set.
3. The coding method, in accordance with claim 2, further including the step of updating at least one of the thresholds if the frame is classified in the voiceless mode.
4. The coding method, according to claim 2, wherein a second measured voice characteristic is the stationarity of the spectrum, and the flag determining step further includes: comparing the measured energy with at least two intermediate thresholds representing energy values that fall between the high energy value and the low energy value, the first intermediate threshold representing a higher energy value than the energy value represented by the second intermediate threshold; setting a third energy flag if the measured energy falls below the first intermediate threshold; setting a fourth energy flag if the measured energy falls below the second intermediate threshold; measuring a stationarity of the spectrum for the frame; setting a first spectrum stationarity flag if the spectrum stationarity measurement strongly indicates stationarity of the spectrum; and setting a second spectrum stationarity flag if the spectrum stationarity measurement weakly indicates stationarity of the spectrum, wherein it is determined that the frame contains a low voice component if: the first spectrum stationarity flag is set and the third energy flag is set; or the second spectrum stationarity flag is set and the fourth energy flag is set.
5. The coding method, according to claim 4, wherein the step of measuring a stationarity of the frame spectrum further includes the steps of: determining a first set of filter coefficients corresponding to the frame and a second set of filter coefficients corresponding to a previous frame; and determining a cepstral distortion and a residual energy for the frame based on the first and second sets of filter coefficients, wherein the measurement of the stationarity of the spectrum is based on the cepstral distortion and residual energy determinations.
6. The coding method, according to claim 1, wherein a first measured voice characteristic is the stationarity of the spectrum, a second measured voice characteristic is the spacing stationarity, and a third measured voice characteristic is the high frequency content, wherein the flag determining step includes the steps of: measuring a stationarity of the spectrum for the frame; setting a first spectrum stationarity flag if the spectrum stationarity measurement strongly indicates stationarity of the spectrum; setting a second spectrum stationarity flag if the spectrum stationarity measurement weakly indicates stationarity of the spectrum; measuring a spacing stationarity for the frame; setting a first spacing stationarity flag if the spacing stationarity measurement strongly indicates spacing stationarity; setting a second spacing stationarity flag if the spacing stationarity measurement weakly indicates spacing stationarity; measuring a high frequency content of the frame; setting a first high frequency flag if the high frequency measurement strongly indicates a high frequency content; and setting a second high frequency flag if the high frequency measurement indicates a lack of high frequency content.
7. The coding method, according to claim 6, wherein it is determined that the frame contains a low voice component if the first spectrum stationarity flag is set, the first spacing stationarity flag is not set, and the first high frequency flag is set.
8. The coding method, according to claim 6, wherein it is determined that the frame has a low voice component if the second spectrum stationarity flag is set, the first spacing stationarity flag is not set, the second spacing stationarity flag is not set, and the first high frequency flag is set.
9. An encoder for encoding a signal having a speech component, the signal being organized as a plurality of frames, including: an element for measuring at least one voice characteristic of a frame, wherein the characteristics include stationarity of the spectrum, spacing stationarity, high frequency content, and energy, and for determining at least two flags for each measured voice characteristic, to indicate a degree of that characteristic; an element for determining whether the frame contains a low voice component based on an evaluation of the determined flags; a mode classifier for classifying the frame in a voiceless mode if the frame contains a low voice component, and otherwise in a voice mode; and a frame encoder for generating a frame encoded in accordance with a voiceless coding scheme when the frame is classified in the voiceless mode, and in accordance with a voice coding scheme when the frame is classified in the voice mode.
10. The encoder, according to claim 9, wherein a first measured characteristic is the energy, and the measuring element further includes: an energy meter for comparing the measured energy with at least two thresholds, including a high threshold representing a high energy value and a low threshold representing a low energy value, setting a first energy flag if the measured energy exceeds the high threshold, and setting a second energy flag if the measured energy falls below the low threshold, wherein the frame is determined to contain a low voice component if the second energy flag is set, and is determined not to contain a low voice component if the first energy flag is set.
11. The encoder, according to claim 10, further including a controller to update at least one of the thresholds if the frame is classified in the voiceless mode.
12. The encoder, according to claim 10, further including: a spectrum stationarity meter for measuring a stationarity of the spectrum for the frame, setting a first spectrum stationarity flag if the spectrum stationarity measurement strongly indicates stationarity of the spectrum, and setting a second spectrum stationarity flag if the spectrum stationarity measurement weakly indicates stationarity of the spectrum, wherein the energy meter further compares the measured energy with at least two intermediate thresholds representing energy values that fall between the high energy value and the low energy value, the first intermediate threshold representing a higher energy value than the energy value represented by the second intermediate threshold, setting a third energy flag if the measured energy falls below the first intermediate threshold and a fourth energy flag if the measured energy falls below the second intermediate threshold, and wherein it is determined that the frame contains a low voice component if: the first spectrum stationarity flag is set and the third energy flag is set; or the second spectrum stationarity flag is set and the fourth energy flag is set.
13. The encoder, according to claim 12, wherein the spectrum stationarity meter determines a first set of filter coefficients corresponding to the frame and a second set of filter coefficients corresponding to a previous frame of the signal, and determines a cepstral distortion and a residual energy for the frame based on the first and second sets of determined filter coefficients, wherein the spectrum stationarity measurement is based on the cepstral distortion and residual energy determinations.
14. The encoder, according to claim 9, wherein a first measured characteristic is the stationarity of the spectrum, a second measured characteristic is the spacing stationarity, and a third measured characteristic is the high frequency content, wherein the measuring element further includes: a spectrum stationarity meter for measuring a stationarity of the spectrum for the frame, setting a first spectrum stationarity flag if the spectrum stationarity measurement strongly indicates stationarity of the spectrum, and setting a second spectrum stationarity flag if the spectrum stationarity measurement weakly indicates stationarity of the spectrum; a spacing stationarity meter for measuring a spacing stationarity for the frame, setting a first spacing stationarity flag if the spacing stationarity measurement strongly indicates spacing stationarity, and setting a second spacing stationarity flag if the spacing stationarity measurement weakly indicates spacing stationarity; and a high frequency content meter for measuring a high frequency content of the frame, setting a first high frequency flag if the high frequency measurement strongly indicates a high frequency content, and setting a second high frequency flag if the high frequency measurement indicates a lack of high frequency content.
15. A method for encoding a signal having a speech component, the signal being organized as a plurality of frames, the method including the steps, executed for each frame, of: analyzing a first linear prediction window to generate a first set of filter coefficients for the frame; analyzing a second linear prediction window to generate a second set of filter coefficients for the frame; analyzing a first spacing analysis window to generate a first spacing estimate for the frame; analyzing a second spacing analysis window to generate a second spacing estimate for the frame; determining whether the frame is one of a first mode, a second mode, and a third mode, depending on measurements of the energy content and of the spectral content of the frame; encoding the frame depending on the second set of filter coefficients and the first and second spacing estimates, independently of the first set of filter coefficients, when the frame is determined to be of the third mode; encoding the frame depending on the first and second sets of filter coefficients, independently of the first and second spacing estimates, when the frame is determined to be of the second mode; and encoding the frame depending on the second set of filter coefficients, independently of the first set of filter coefficients and of the first and second spacing estimates, when the frame is determined to be of the first mode.
  16. The method, according to claim 15, wherein the determination step further depends on the measures of spacing stationarity between the frame and a previous frame.
17. The method, according to claim 15, wherein the determination step further depends on the short-term level gradient measurements within the frame.
18. The method, according to claim 15, wherein the determination step further depends on the measurements of a zero crossing index within the frame.
19. The coding method, according to claim 15, wherein the first linear prediction window is contained within the frame and the second linear prediction window begins during the frame and extends into the next frame.
20. The coding method, according to claim 15, wherein the first spacing estimation window begins during the frame and extends into the next frame.
21. The coding method, according to claim 15, wherein a frame determined to be of a third mode contains a signal with a voice component composed primarily of voiced speech.
22. The coding method, according to claim 15, wherein a frame determined to be of a second mode contains a signal with a voice component composed primarily of unvoiced speech.
23. The coding method, according to claim 15, wherein a frame determined to be of a first mode contains a signal with a low voice component.
24. An encoder for encoding a signal having a speech component, the signal being organized as a plurality of frames, including: a filter coefficient generator for analyzing a first linear prediction window to generate a first set of filter coefficients for a frame and for analyzing a second linear prediction window to generate a second set of filter coefficients for the frame; a spacing estimator for analyzing a first spacing analysis window to generate a first spacing estimate for the frame and for analyzing a second spacing analysis window to generate a second spacing estimate for the frame; a mode determiner for determining whether the frame is one of a first mode, a second mode, and a third mode, depending on measurements of the energy content of the frame; and a frame encoder for encoding the frame depending on the determined mode of the frame, wherein a frame determined to be of a third mode is encoded depending on the second set of filter coefficients and the first and second spacing estimates, independently of the first set of filter coefficients; a frame determined to be of a second mode is encoded depending on the first and second sets of filter coefficients, independently of the first and second spacing estimates; and a frame determined to be of a first mode is encoded depending on the second set of filter coefficients, independently of the first set of filter coefficients and of the first and second spacing estimates.
25. The encoder, according to claim 24, wherein the mode determiner determines the mode depending on a particular mode of a prior frame.
26. The encoder, according to claim 24, wherein the mode determiner determines that the frame is of the first mode only when the determined mode of a previous frame is either the first mode or the second mode.
27. The encoder, according to claim 24, wherein the mode determiner determines that the frame is of the third mode only when the determined mode of a previous frame is either the third mode or the second mode.
28. The encoder, according to claim 24, wherein the mode determiner further depends on the measures of spacing stationarity between the frame and a previous frame.
29. The encoder, in accordance with claim 24, wherein the mode determiner further depends on the short-term level gradient measurements within the frame.
30. The encoder, in accordance with claim 24, wherein the mode determiner further depends on the measures of the zero crossing index within the frame.
31. The encoder, according to claim 24, wherein the first linear prediction window is contained within the frame and the second linear prediction window begins during the frame and extends into the next frame.
32. The encoder, according to claim 24, wherein the first spacing estimation window is contained within the frame and the second spacing estimation window begins during the frame and extends into the next frame.
33. The encoder, according to claim 24, wherein a frame that is determined to be of a third mode contains a signal with a voice component composed primarily of voiced speech.
34. The encoder, according to claim 24, wherein a frame that is determined to be of a second mode contains a signal with a voice component composed primarily of unvoiced speech.
35. The encoder, according to claim 24, wherein a frame determined to be of a first mode contains a signal with a low voice component.