CN101496099A - Systems, methods, and apparatus for wideband encoding and decoding of active frames - Google Patents

Systems, methods, and apparatus for wideband encoding and decoding of active frames

Info

Publication number
CN101496099A
CN101496099A · CNA2007800280941A · CN200780028094A
Authority
CN
China
Prior art keywords
frame
voice
frequency band
description
signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2007800280941A
Other languages
Chinese (zh)
Other versions
CN101496099B (en
Inventor
Ananthapadmanabhan A. Kandhadai
Vivek Rajendran
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qualcomm Inc
Original Assignee
Qualcomm Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US11/830,842 external-priority patent/US8532984B2/en
Application filed by Qualcomm Inc filed Critical Qualcomm Inc
Publication of CN101496099A publication Critical patent/CN101496099A/en
Application granted granted Critical
Publication of CN101496099B publication Critical patent/CN101496099B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Telephonic Communication Services (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

Applications of dim-and-burst techniques to coding of wideband speech signals are described. Reconstruction of a highband portion of a frame of a wideband speech signal using information from a previous frame is also described.

Description

Systems, methods, and apparatus for wideband encoding and decoding of active frames
Related application
This application claims priority to U.S. Provisional Patent Application No. 60/834,683, filed July 31, 2006 and entitled "DIM AND BURST SIGNALLING FOR 4GV WIDEBAND." This application is also related to U.S. Patent Application No. 11/830,842 (Attorney Docket No. 061658), filed July 30, 2007 and entitled "SYSTEMS, METHODS, AND APPARATUS FOR WIDEBAND ENCODING AND DECODING OF INACTIVE FRAMES."
Technical field
The present invention relates to the processing of speech signals.
Background
Transmission of voice by digital techniques has become widespread, particularly in long-distance telephony, packet-switched telephony such as Voice over IP (also called VoIP, where IP denotes Internet Protocol), and digital radio telephony such as cellular telephony. Such proliferation has created interest in reducing the amount of information used to transfer a voice communication over a transmission channel while maintaining the perceived quality of the reconstructed speech.
Devices that are configured to compress speech by extracting parameters that relate to a model of human speech generation are called "speech coders." A speech coder (also called a speech codec or vocoder) generally includes a speech encoder and a speech decoder. The speech encoder typically divides the incoming speech signal (a digital signal representing audio information) into segments of time called "frames," analyzes each frame to extract certain relevant parameters, and quantizes the parameters into an encoded frame. The encoded frames are transmitted over a transmission channel (i.e., a wired or wireless network connection) to a receiver that includes a speech decoder. The speech decoder receives and processes encoded frames, dequantizes them to produce the parameters, and recreates the speech frames using the dequantized parameters.
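The framing step described above can be sketched as follows. This is a minimal illustration assuming fixed-length, nonoverlapping frames; the function name and interface are hypothetical and are not prescribed by the patent.

```python
def split_into_frames(samples, frame_len):
    """Split a digitized speech signal into fixed-length,
    nonoverlapping frames, as a speech encoder typically does
    before analyzing and quantizing each frame."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]
```

An overlapping-frame scheme, as mentioned later in this description, would step by less than `frame_len`.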
A speech encoder is typically configured to distinguish frames of the speech signal that contain speech ("active frames") from frames that contain only silence or background noise ("inactive frames"). The encoder may be configured to use different coding modes and/or rates to encode active and inactive frames. For example, a speech encoder is typically configured to use fewer bits to encode an inactive frame than it uses to encode an active frame. A speech coder may use a lower bit rate for inactive frames, and/or different bit rates for different types of active frames, to support transfer of the speech signal at a lower average bit rate with little to no perceived loss of quality.
Voice communications over the public switched telephone network (PSTN) have traditionally been limited in bandwidth to the frequency range of 300 to 3400 hertz (Hz). More recent networks for voice communications, such as networks that use cellular telephony and/or VoIP, may not have the same bandwidth limits, and it may be desirable for apparatus using such networks to have the ability to transmit and receive voice communications that include a wideband frequency range. For example, it may be desirable for such apparatus to support an audio frequency range that extends down to 50 Hz and/or up to 7 or 8 kHz. It may also be desirable for such apparatus to support other applications, such as high-quality audio or audio/video conferencing, or delivery of multimedia services such as music and/or television, that may have audio speech content in ranges outside the traditional PSTN limits.
Extension of the range supported by a speech coder into higher frequencies may improve intelligibility. For example, information that differentiates fricatives such as "s" and "f" in a speech signal lies largely in the higher frequencies. Highband extension may also improve other qualities of the decoded speech signal, such as presence. For example, even a voiced vowel may have spectral energy far above the PSTN frequency range.
Summary of the invention
A method of processing a speech signal according to one configuration includes producing a first speech packet based on a first active frame of the speech signal, where the first speech packet includes a description of a spectral envelope, over (A) a first frequency band and (B) a second frequency band that extends above the first frequency band, of a portion of the speech signal that includes the first active frame. The method also includes producing a second speech packet based on a second active frame of the speech signal, where the second speech packet includes a description of a spectral envelope, over the first frequency band, of a portion of the speech signal that includes the second active frame. In this method, the second speech packet does not include a description of a spectral envelope over the second frequency band.
A speech encoder according to another configuration includes a packet encoder and a frame formatter. The packet encoder is configured to produce a first speech packet, based on a first active frame of the speech signal and in response to a first state of a rate control signal, that includes a description of a spectral envelope over (1) a first frequency band and (2) a second frequency band that extends above the first frequency band. The packet encoder is also configured to produce a second speech packet, based on a second active frame of the speech signal and in response to a second state of the rate control signal different than the first state, that includes a description of a spectral envelope over the first frequency band. The frame formatter is arranged to receive the first and second speech packets. The frame formatter is configured to produce, in response to a first state of a dimming control signal, a first encoded frame that contains the first speech packet, and to produce, in response to a second state of the dimming control signal different than the first state, a second encoded frame that contains the second speech packet and a burst of an information signal that is separate from the speech signal. In this encoder, the first and second encoded frames have the same length, the first speech packet occupies at least eighty percent of the first encoded frame, the second speech packet occupies not more than half of the second encoded frame, and the second active frame occurs immediately after the first active frame in the speech signal.
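As a rough sketch of the frame-formatter behavior described above, assuming the 192-bit encoded-frame length shown in FIGS. 7A to 7C; the packing layout, fill bits, and helper name are hypothetical, not the patent's format.

```python
FRAME_BITS = 192  # encoded-frame length; FIGS. 7A-7C show 192-bit formats

def format_encoded_frame(speech_packet_bits, burst_bits=None):
    """Pack one speech packet (and, when dimming is requested, a
    burst of a separate information signal) into a fixed-length
    encoded frame, padding the remainder with fill bits."""
    used = len(speech_packet_bits) + (len(burst_bits) if burst_bits else 0)
    if used > FRAME_BITS:
        raise ValueError("packet and burst exceed the frame length")
    frame = list(speech_packet_bits)
    if burst_bits:
        frame += list(burst_bits)
    frame += [0] * (FRAME_BITS - used)  # fill bits (layout hypothetical)
    return frame

# First encoded frame: a 171-bit wideband packet (at least 80% of 192 bits).
f1 = format_encoded_frame([1] * 171)
# Second encoded frame: an 80-bit packet (not more than half of 192 bits)
# plus an 80-bit burst of a signal separate from the speech signal.
f2 = format_encoded_frame([1] * 80, burst_bits=[0, 1] * 40)
```

The 171-bit and 80-bit packet sizes correspond to the full-rate and half-rate packets discussed later in the description.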
A method of processing speech packets according to another configuration includes obtaining, based on information from a first speech packet from an encoded speech signal, a description of a spectral envelope of a first frame of a speech signal over (A) a first frequency band and (B) a second frequency band different than the first frequency band. The method also includes obtaining, based on information from a second speech packet from the encoded speech signal, a description of a spectral envelope of a second frame of the speech signal over the first frequency band. The method also includes obtaining, based on information from the first speech packet, a description of a spectral envelope of the second frame over the second frequency band. The method also includes obtaining, based on information from the second speech packet, information relating to a pitch component of the second frame for the first frequency band.
A speech decoder according to another configuration is configured to calculate a decoded speech signal based on an encoded speech signal. This speech decoder includes control logic and a packet decoder. The control logic is configured to generate a control signal comprising a sequence of values that is based on coding indices of speech packets from the encoded speech signal, each value in the sequence corresponding to a frame period of the decoded speech signal. The packet decoder is configured, in response to a value of the control signal having a first state, to calculate a corresponding decoded frame based on a description of a spectral envelope of the decoded frame over (1) a first frequency band and (2) a second frequency band that extends above the first frequency band, where the description is based on information from a speech packet from the encoded speech signal. The packet decoder is also configured, in response to a value of the control signal having a second state different than the first state, to calculate a corresponding decoded frame based on (1) a description of a spectral envelope of the decoded frame over the first frequency band, where the description is based on information from a speech packet from the encoded speech signal, and (2) a description of a spectral envelope of the decoded frame over the second frequency band, where the description is based on information from at least one speech packet that occurs in the encoded speech signal before that speech packet.
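The two decoding branches described above can be sketched as follows. The packet fields and the state labels are hypothetical stand-ins for the coding-index states; the sketch assumes that the highband description for a dimmed packet is carried over from an earlier packet, as the configuration describes.

```python
def decode_frames(packets, rate_index):
    """For each packet: a 'full' packet supplies both band
    descriptions; a 'dim' packet supplies only the lowband
    description, and the highband envelope is taken from
    information in an earlier packet."""
    decoded, last_highband = [], None
    for pkt, state in zip(packets, rate_index):
        if state == "full":
            lowband, highband = pkt["lowband"], pkt["highband"]
            last_highband = highband  # remembered for later dimmed frames
        else:  # "dim": no highband description in this packet
            lowband, highband = pkt["lowband"], last_highband
        decoded.append((lowband, highband))
    return decoded
```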
Description of drawings
FIG. 1 shows a diagram of a wireless telephone system that interfaces with the PSTN.
FIG. 2 shows a diagram of a wireless telephone system that interfaces with the Internet.
FIG. 3 shows block diagrams of two speech encoder/decoder pairs.
FIG. 4 shows an example of a decision tree that a speech encoder or method of speech encoding may use to select a bit rate.
FIG. 5A shows a plot of a trapezoidal windowing function that may be used to calculate gain shape values.
FIG. 5B shows an application of the windowing function of FIG. 5A to each of five subframes of a frame.
FIG. 6A shows one example of a nonoverlapping frequency band scheme that a split-band encoder may use to encode wideband speech content.
FIG. 6B shows one example of an overlapping frequency band scheme that a split-band encoder may use to encode wideband speech content.
FIGS. 7A to 7C show three different formats for a 192-bit encoded frame.
FIG. 8A shows a flowchart of a method M100 according to a general configuration.
FIG. 8B shows a flowchart of an implementation M110 of method M100.
FIG. 9 illustrates an operation of encoding two successive active frames of a speech signal using an implementation of method M100.
FIG. 10 illustrates an operation of tasks T110 and T120 of method M100.
FIG. 11 illustrates an operation of task T112 and an implementation of task T120 of method M110.
FIG. 12 shows a table of a set of four different coding schemes that a speech encoder configured to perform an implementation of method M100 may use.
FIG. 13 shows a table that describes a bit allocation of a 171-bit wideband FCELP packet.
FIG. 14 shows a table that describes a bit allocation of an 80-bit narrowband HCELP packet.
FIG. 15A shows a block diagram of a speech encoder 100 according to a general configuration.
FIG. 15B shows a block diagram of an implementation 122 of packet encoder 120.
FIG. 15C shows a block diagram of an implementation 142 of spectral envelope description calculator 140.
FIG. 16A shows a block diagram of an implementation 124 of packet encoder 122.
FIG. 16B shows a block diagram of an implementation 154 of temporal information description calculator 152.
FIG. 17A shows a block diagram of an implementation 102 of speech encoder 100 that is configured to encode a wideband speech signal according to a split-band coding scheme.
FIG. 17B shows a block diagram of an implementation 128 of packet encoder 126.
FIG. 18A shows a block diagram of an implementation 129 of packet encoder 126.
FIG. 18B shows a block diagram of an implementation 158 of temporal description calculator 156.
FIG. 19A shows a flowchart of a method M200 according to a general configuration.
FIG. 19B shows a flowchart of an implementation M220 of method M200.
FIG. 19C shows a flowchart of an implementation M230 of method M200.
FIG. 20 shows an application of method M200.
FIG. 21 illustrates a relation between methods M100 and M200.
FIG. 22 shows an application of an implementation M210 of method M200.
FIG. 23 shows an application of method M220.
FIG. 24 shows an application of method M230.
FIG. 25 shows an application of an implementation M240 of method M200.
FIG. 26A shows a block diagram of a speech decoder 200 according to a general configuration.
FIG. 26B shows a block diagram of an implementation 202 of speech decoder 200.
FIG. 26C shows a block diagram of an implementation 204 of speech decoder 200.
FIG. 27A shows a block diagram of an implementation 232 of first module 230.
FIG. 27B shows a block diagram of an implementation 272 of spectral envelope description decoder 270.
FIG. 28A shows a block diagram of an implementation 242 of second module 240.
FIG. 28B shows a block diagram of an implementation 244 of second module 240.
FIG. 28C shows a block diagram of an implementation 246 of second module 242.
In these drawings and the accompanying description, the same reference labels refer to the same or analogous elements or signals.
Detailed description
The configurations described below reside in a wireless telephony communication system configured to employ a CDMA over-the-air interface. Nevertheless, it would be understood by those skilled in the art that a method and apparatus having features as described herein may reside in any of the various communication systems employing a wide range of technologies known to those of skill in the art, such as systems employing Voice over IP (VoIP) over wired and/or wireless (e.g., CDMA, TDMA, FDMA, and/or TD-SCDMA) transmission channels. It is expressly contemplated and hereby disclosed that such configurations may be adapted for use in networks that are packet-switched (for example, wired and/or wireless networks arranged to carry voice transmissions according to protocols such as VoIP) and/or circuit-switched.
Configurations described herein may be applied in a wideband speech coding system to support dimming of active frames. For example, such configurations may be applied to support the use of dim-and-burst techniques for transfer of signaling and/or secondary traffic information in a wideband speech coding system.
Unless expressly limited by its context, the term "calculating" is used herein to indicate any of its ordinary meanings, such as computing, evaluating, generating, and/or selecting from a set of values. Unless expressly limited by its context, the term "obtaining" is used to indicate any of its ordinary meanings, such as calculating, deriving, receiving (e.g., from an external device), and/or retrieving (e.g., from an array of storage elements). Where the term "comprising" is used in the present description and claims, it does not exclude other elements or operations. The term "A is based on B" is used to indicate any of its ordinary meanings, including the cases (i) "A is based on at least B" and (ii) "A is equal to B" (if appropriate in the particular context).
Unless indicated otherwise, any disclosure of a speech encoder having a particular feature is also expressly intended to disclose a method of speech encoding having an analogous feature (and vice versa), and any disclosure of a speech encoder according to a particular configuration is also expressly intended to disclose a method of speech encoding according to an analogous configuration (and vice versa). Unless indicated otherwise, any disclosure of a speech decoder having a particular feature is also expressly intended to disclose a method of speech decoding having an analogous feature (and vice versa), and any disclosure of a speech decoder according to a particular configuration is also expressly intended to disclose a method of speech decoding according to an analogous configuration (and vice versa).
As illustrated in FIG. 1, a CDMA wireless telephone system generally includes a plurality of mobile subscriber units 10 configured to communicate wirelessly with a radio access network that includes a plurality of base stations 12 and one or more base station controllers (BSCs) 14. Such a system also generally includes a mobile switching center (MSC) 16, coupled to the BSCs 14, that is configured to interface the radio access network with a conventional public switched telephone network (PSTN) 18 (possibly via a media gateway). The BSCs 14 are coupled to the base stations 12 via backhaul lines. The backhaul lines may be configured to support any of several known interfaces including, for example, E1/T1, ATM, IP, PPP, Frame Relay, HDSL, ADSL, or xDSL.
Each base station 12 advantageously includes at least one sector (not shown), each sector comprising an omnidirectional antenna or an antenna pointed in a particular direction radially away from the base station 12. Alternatively, each sector may comprise two antennas for diversity reception. Each base station 12 may advantageously be designed to support a plurality of frequency assignments. The intersection of a sector and a frequency assignment may be referred to as a CDMA channel. The base stations 12 may also be known as base station transceiver subsystems (BTSs) 12. Alternatively, "base station" may be used in the industry to refer collectively to a BSC 14 and one or more BTSs 12. The BTSs 12 may also be denoted "cell sites" 12. Alternatively, individual sectors of a given BTS 12 may be referred to as cell sites. The mobile subscriber units 10 are typically cellular or PCS telephones 10. Such a system may be configured for use in accordance with one or more versions of the IS-95 standard (e.g., IS-95, IS-95A, IS-95B, cdma2000, as published by the Telecommunications Industry Association, Arlington, VA).
During typical operation of the cellular telephone system, the base stations 12 receive sets of reverse link signals from sets of mobile subscriber units 10. The mobile subscriber units 10 are conducting telephone calls or other communications. Each reverse link signal received by a given base station 12 is processed within that base station 12, and the resulting data is forwarded to a BSC 14. The BSC 14 provides call resource allocation and mobility management functionality, including the orchestration of soft handoffs between base stations 12. The BSC 14 also routes the received data to the MSC 16, which provides additional routing services for interface with the PSTN 18. Similarly, the PSTN 18 interfaces with the MSC 16, and the MSC 16 interfaces with the BSCs 14, which in turn control the base stations 12 to transmit sets of forward link signals to sets of mobile subscriber units 10.
Elements of a cellular telephone system as shown in FIG. 1 may also be configured to support packet-switched data communications. As shown in FIG. 2, packet data traffic is generally routed between mobile subscriber units 10 and an external packet data network (for example, a public network such as the Internet) using a packet data serving node (PDSN) that is coupled to a gateway router connected to the packet data network. The PDSN in turn routes data to one or more packet control functions (PCFs), which each serve one or more BSCs and act as a link between the packet data network and the radio access network. Such a system may be configured to carry a telephone call or other communication as packet data traffic between mobile subscriber units on different radio access networks (e.g., via one or more protocols such as VoIP) without ever entering the PSTN.
FIG. 3A shows a first speech encoder 30a that is arranged to receive a digitized speech signal s1(n) and to encode the signal for transmission on a communication channel 50 (e.g., via a transmission medium) to a first speech decoder 40a. The first speech decoder 40a is arranged to decode the encoded speech signal and to synthesize an output speech signal sSYNTH1(n). FIG. 3B shows a second speech encoder 30b that is arranged to encode a digitized speech signal s2(n) for transmission in the opposite direction on a communication channel 60 (e.g., via the same or a different transmission medium) to a second speech decoder 40b. Speech decoder 40b is arranged to decode this encoded speech signal, generating a synthesized output speech signal sSYNTH2(n). The first speech encoder 30a and the second speech decoder 40b (and likewise the second speech encoder 30b and the first speech decoder 40a) may be used together in any communication device for transmitting and receiving speech signals, including, for example, the subscriber units, BTSs, or BSCs described above with reference to FIGS. 1 and 2.
The speech signals s1(n) and s2(n) represent analog signals that have been digitized and quantized in accordance with any of various methods known in the art, such as pulse code modulation (PCM), companded mu-law, or A-law. As known in the art, a speech encoder receives digital samples of the speech signal as frames of input data, each frame comprising a predetermined number of samples. The frames of a speech signal are typically short enough that the spectral envelope of the signal may be expected to remain relatively stationary over the entire frame. One typical frame length is twenty milliseconds, although any frame length deemed suitable for the particular application may be used. A frame length of twenty milliseconds corresponds to 140 samples at a sampling rate of seven kilohertz (kHz), 160 samples at a sampling rate of 8 kHz, and 320 samples at a sampling rate of 16 kHz, although any sampling rate deemed suitable for the particular application may be used. Another example of a sampling rate that may be used for speech coding is 12.8 kHz, and further examples include other rates in the range of from 12.8 kHz to 38.4 kHz.
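The frame-length arithmetic quoted above is simply samples = rate × duration; the following trivial helper (not from the patent) reproduces the figures in the text.

```python
def samples_per_frame(sampling_rate_hz, frame_ms=20):
    """Number of samples in one frame of the given duration
    (twenty milliseconds by default)."""
    return sampling_rate_hz * frame_ms // 1000

# Rates quoted above: 7 kHz -> 140, 8 kHz -> 160, 16 kHz -> 320 samples.
```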
Typically all frames of a speech signal have the same length, and a uniform frame length is assumed in the particular examples described herein. However, it is also expressly contemplated and hereby disclosed that nonuniform frame lengths may be used. In some applications, the frames are nonoverlapping, while in other applications, an overlapping frame scheme is used. For example, it is common for a speech coder to use an overlapping frame scheme at the encoder and a nonoverlapping frame scheme at the decoder. It is also possible for an encoder to use different frame schemes for different tasks. For example, a speech encoder or method of speech encoding may use one overlapping frame scheme for encoding a description of a spectral envelope of a frame and a different overlapping frame scheme for encoding a description of temporal information of the frame.
It may be desirable to configure a speech encoder to use different bit rates to encode active frames and inactive frames. It may also be desirable for the speech encoder to use different bit rates to encode different types of active frames. In such cases, lower bit rates may be selectively employed for frames containing relatively less speech information. Examples of bit rates commonly used to encode active frames include 171 bits per frame, eighty bits per frame, and forty bits per frame; and examples of bit rates commonly used to encode inactive frames include sixteen bits per frame. In the context of cellular telephony systems (especially systems that are compliant with Interim Standard (IS)-95 as promulgated by the Telecommunications Industry Association, Arlington, VA, or a similar industry standard), these four bit rates are also referred to as "full rate," "half rate," "quarter rate," and "eighth rate," respectively.
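With the twenty-millisecond frame length discussed earlier, the four packet sizes above imply the following bit rates. This is an illustrative sketch; the rate names are the IS-95-style labels from the text, and the helper names are not from the patent.

```python
BITS_PER_FRAME = {"full": 171, "half": 80, "quarter": 40, "eighth": 16}

def rate_kbps(rate_name, frame_ms=20):
    """Bit rate implied by sending one packet of this size per frame.
    Bits per millisecond is numerically equal to kbps."""
    return BITS_PER_FRAME[rate_name] / frame_ms

def average_rate_kbps(rate_sequence):
    """Average bit rate over a sequence of per-frame rate decisions,
    illustrating how a mix of rates lowers the average bit rate."""
    return sum(rate_kbps(r) for r in rate_sequence) / len(rate_sequence)
```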
It may be desirable to classify each of the active frames of a speech signal as one of several different types. These different types may include frames of voiced speech (e.g., speech representing a vowel sound), transitional frames (e.g., frames that represent the beginning or end of a word), and frames of unvoiced speech (e.g., speech representing a fricative sound). It may be desirable to configure a speech encoder to use different coding modes to encode different types of speech frames. For example, frames of voiced speech tend to have a periodic structure that is long-term (i.e., that continues for more than one frame period) and is related to pitch, and it is typically more efficient to encode a voiced frame (or a sequence of voiced frames) using a coding mode that encodes a description of this long-term spectral feature. Examples of such coding modes include code-excited linear prediction (CELP) and prototype pitch period (PPP). Unvoiced frames and inactive frames, on the other hand, usually lack any significant long-term spectral feature, and a speech encoder may be configured to encode these frames using a coding mode that does not attempt to describe such a feature. Noise-excited linear prediction (NELP) is one example of such a coding mode.
A speech encoder or method of speech encoding may be configured to select among different combinations of bit rates and coding modes (also called "coding schemes"). For example, a speech encoder may be configured to use a full-rate CELP scheme for frames containing voiced speech and for transitional frames, a half-rate NELP scheme for frames containing unvoiced speech, and an eighth-rate NELP scheme for inactive frames. Alternatively, such a speech encoder may be configured to use a full-rate PPP scheme for frames containing voiced speech.
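The example scheme assignment in the paragraph above can be written as a small lookup table. This is an illustration of that one example, not a normative mapping; the alternative full-rate PPP choice is noted in a comment.

```python
CODING_SCHEME = {
    # frame type       (bit rate, coding mode)
    "voiced":       ("full", "CELP"),    # alternatively ("full", "PPP")
    "transitional": ("full", "CELP"),
    "unvoiced":     ("half", "NELP"),
    "inactive":     ("eighth", "NELP"),
}

def select_scheme(frame_type):
    """Look up the (rate, mode) pair for a classified frame."""
    return CODING_SCHEME[frame_type]
```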
A speech encoder may also be configured to support multiple coding rates for one or more coding schemes, such as full-rate and half-rate CELP schemes and/or full-rate and quarter-rate PPP schemes. For example, frames in a series that includes a period of stable voiced speech tend to be largely redundant, such that at least some of them may be encoded at less than full rate without a noticeable loss of perceptual quality.
Multi-scheme speech coders (including speech coders that support multiple coding rates and/or coding modes) typically provide efficient speech coding at low bit rates. Skilled artisans will recognize that increasing the number of coding schemes will allow greater flexibility when choosing a coding scheme, which can result in a lower average bit rate. An increase in the number of coding schemes will correspondingly increase complexity within the overall system, however. The particular combination of available schemes used in any given system will be dictated by the available system resources and the specific signal environment. Examples of multi-scheme coding techniques are described in, for example, U.S. Patent No. 6,691,084, entitled "VARIABLE RATE SPEECH CODING," and U.S. Patent Application No. 11/625,788 (Manjunath et al.), entitled "ARBITRARY AVERAGE DATA RATES FOR VARIABLE RATE CODERS."
A multi-scheme speech encoder typically includes an open-loop decision module that examines the input speech frame and makes a decision regarding which coding scheme to apply to the frame. This module is typically configured to classify frames as active or inactive and may also be configured to classify an active frame as one of two or more different types, such as voiced, unvoiced, or transitional. The frame classification may be based on one or more features of the current frame, and/or of one or more previous frames, such as overall frame energy, frame energy in each of two or more different frequency bands, signal-to-noise ratio (SNR), periodicity, and zero-crossing rate. Such classification may include comparing a value or magnitude of such a factor to a threshold value and/or comparing the magnitude of a change in such a factor to a threshold value.
FIG. 4 shows an example of a decision tree that an open-loop decision module may use to select a bit rate at which to encode a particular frame according to the type of speech the frame contains. In other cases, the bit rate selected for a particular frame may also depend on such criteria as a desired average bit rate, a desired pattern of bit rates over a series of frames (which may be used to support a desired average bit rate), and/or the bit rate selected for a previous frame.
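A hypothetical open-loop classifier in the spirit of the decision tree of FIG. 4 (which is not reproduced here) might threshold the features listed above. The thresholds and ordering below are illustrative assumptions, not the patent's decision tree.

```python
def open_loop_classify(frame_energy, zero_crossing_rate, periodicity,
                       energy_threshold=1e-4):
    """Classify a frame from a few features: low energy suggests an
    inactive frame, strong periodicity suggests voiced speech, and a
    high zero-crossing rate suggests unvoiced (noise-like) speech.
    All threshold values here are made up for illustration."""
    if frame_energy < energy_threshold:
        return "inactive"
    if periodicity > 0.5:
        return "voiced"
    if zero_crossing_rate > 0.3:
        return "unvoiced"
    return "transitional"
```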
A multi-scheme speech encoder may also perform a closed-loop coding decision, in which one or more measures of encoding performance are obtained after full or partial encoding using the bit rate selected by the open-loop decision. Performance measures that may be considered in the closed-loop test include, for example, SNR, SNR prediction in coding schemes such as the PPP speech coder, prediction error quantization SNR, phase quantization SNR, amplitude quantization SNR, perceptual SNR, and normalized cross-correlation between current and past frames as a measure of stationarity. If the performance measure falls below a threshold value, the coding rate and/or mode may be changed to one that is expected to give better quality. Examples of closed-loop classification schemes that may be used to maintain the quality of a variable-rate speech coder are described in U.S. Application No. 09/191,643, filed November 13, 1998 and entitled "CLOSED-LOOP VARIABLE-RATE MULTIMODE PREDICTIVE SPEECH CODER," and in U.S. Patent No. 6,330,532.
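The closed-loop re-decision described above amounts to an encode, measure, and retry loop. In the following sketch, the `encode` and `measure` callables are caller-supplied stand-ins for the actual coder and the performance measures listed above; the loop structure is an assumption, not the patent's implementation.

```python
def closed_loop_select(frame, encode, measure, schemes, threshold):
    """Try the candidate schemes in order (starting from the
    open-loop choice); keep the first whose performance measure
    meets the threshold, otherwise fall back to the last scheme
    tried (expected to give the best quality)."""
    for scheme in schemes:
        packet = encode(frame, scheme)
        if measure(frame, packet) >= threshold:
            return scheme, packet
    return scheme, packet
```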
A speech encoder is typically configured to encode a frame of the speech signal as a speech packet, where the size and format of the speech packet correspond to the particular coding scheme selected for that frame. A speech packet typically contains a set of speech parameters from which a corresponding frame of the speech signal may be reconstructed. This set of speech parameters generally includes spectral information, such as a description of the distribution of energy over a frequency spectrum within the frame. Such a distribution of energy is also called a "frequency envelope" or "spectral envelope" of the frame. A description of the spectral envelope of a frame may have a different form and/or length according to the particular coding scheme used to encode the corresponding frame.
A speech encoder is typically configured to calculate a description of the spectral envelope of a frame as an ordered sequence of values. In some cases, the speech encoder is configured to calculate the ordered sequence such that each value indicates an amplitude or magnitude of the signal at a corresponding frequency, or over a corresponding spectral region. One example of such a description is an ordered sequence of Fourier transform coefficients.
In other cases, the speech encoder is configured to calculate the description of the spectral envelope as an ordered sequence of values of parameters of a coding model, such as a set of coefficient values of a linear predictive coding (LPC) analysis. The ordered sequence of LPC coefficient values is typically arranged as one or more vectors, and the speech encoder may be implemented to calculate these values as filter coefficients or as reflection coefficients. The number of coefficient values in the set is also called the "order" of the LPC analysis, and examples of typical orders of an LPC analysis as performed by a speech encoder of a communications device (such as a cellular telephone) include 4, 6, 8, 10, 12, 16, 20, 24, 28, and 32.
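As a rough illustration of such an analysis (this sketch is not taken from the patent; the function names and the synthetic test frame are assumptions), a tenth-order LPC analysis can be computed from a frame's autocorrelation sequence with the Levinson-Durbin recursion, which yields both the filter coefficients and the reflection coefficients mentioned above:

```python
import math

def autocorrelation(frame, order):
    """Autocorrelation lags 0..order of a frame of samples."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(order + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations by the Levinson-Durbin recursion.

    Returns (filter_coeffs, reflection_coeffs): the filter coefficients
    a[1..order] and the reflection coefficients, either of which a
    coder may choose to quantize.
    """
    a = [0.0] * (order + 1)
    a[0] = 1.0
    refl = []
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        k = -acc / err
        refl.append(k)
        new_a = a[:]
        for j in range(1, m):
            new_a[j] = a[j] + k * a[m - j]
        new_a[m] = k
        a = new_a
        err *= (1.0 - k * k)  # prediction error shrinks each step
    return a[1:], refl

# 160-sample frame (20 ms at 8 kHz) of a decaying sinusoid.
frame = [math.sin(0.3 * n) * (0.99 ** n) for n in range(160)]
r = autocorrelation(frame, 10)
coeffs, refl = levinson_durbin(r, 10)
print(len(coeffs))  # 10 coefficient values: a tenth-order analysis
```

A stable analysis yields reflection coefficients with magnitude below one, which is one reason they quantize well.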
A speech encoder is typically configured to transmit the description of the spectral envelope across a transmission channel in quantized form (e.g., as one or more indices into corresponding lookup tables or "codebooks"). Accordingly, it may be desirable for the speech encoder to calculate the set of LPC coefficient values in a form that may be quantized efficiently, such as a set of values of line spectral pairs (LSPs), line spectral frequencies (LSFs), immittance spectral pairs (ISPs), immittance spectral frequencies (ISFs), cepstral coefficients, or log area ratios. The speech encoder may also be configured to perform other operations, such as perceptual weighting, on the ordered sequence of values before conversion and/or quantization.
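Quantization as a codebook index can be sketched as a nearest-neighbor search. The toy two-bit codebook below is an assumption for illustration only; a real codec uses trained codebooks of the kinds of parameter sets named above:

```python
def quantize(vector, codebook):
    """Return the index of the codebook entry nearest to vector."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: dist2(vector, codebook[i]))

def dequantize(index, codebook):
    """Look the index back up, as a decoder would."""
    return codebook[index]

# Toy 2-bit codebook of two-dimensional "LSF-like" vectors.
codebook = [(0.1, 0.3), (0.2, 0.5), (0.4, 0.6), (0.7, 0.9)]
idx = quantize((0.22, 0.48), codebook)
print(idx, dequantize(idx, codebook))  # index 1 -> (0.2, 0.5)
```

Only the index travels over the channel; encoder and decoder hold identical copies of the codebook.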
In some cases, the description of the spectral envelope of a frame also includes a description of temporal information of the frame (e.g., in the form of an ordered sequence of Fourier transform coefficients). In other cases, the set of speech parameters of the speech packet may also include a separate description of temporal information of the frame. The form of the description of temporal information may depend on the particular coding mode used to encode the frame. For some coding modes (e.g., for a CELP coding mode), the description of temporal information may include a description of an excitation signal to be used by the speech decoder to excite the LPC model (e.g., as defined by the description of the spectral envelope). A description of the excitation signal usually appears in the speech packet in quantized form (e.g., as one or more indices into corresponding codebooks). The description of temporal information may also include information relating to at least one pitch component of the excitation signal. For a PPP coding mode, for example, the encoded temporal information may include a description of a prototype to be used by the speech decoder to reproduce a pitch component of the excitation signal. A description of information relating to a pitch component usually appears in the speech packet in quantized form (e.g., as one or more indices into corresponding codebooks).
For other coding modes (e.g., for a NELP coding mode), the description of temporal information may include a description of a temporal envelope of the frame (also called an "energy envelope" or "gain envelope" of the frame). A description of a temporal envelope may include a value that is based on an average energy of the frame. Such a value is usually presented as a gain value to be applied to the frame during decoding, and is also called a "gain frame." In some cases, the gain frame is a normalization factor based on a ratio between (A) the energy E_original of the original frame and (B) the energy E_synthesized of a frame synthesized from other parameters of the speech packet (e.g., including the description of the spectral envelope). For example, the gain frame may be expressed as E_original/E_synthesized or as the square root of E_original/E_synthesized. Gain frames and other aspects of temporal envelopes are described in more detail in, for example, U.S. Patent Application Publication No. 2006/0282262 (Vos et al.), entitled "SYSTEMS, METHODS, AND APPARATUS FOR GAIN FACTOR ATTENUATION," published Dec. 14, 2006.
Alternatively or additionally, the description of a temporal envelope may include a relative energy value for each of a number of subframes of the frame. Such values are usually presented as gain values to be applied to the corresponding subframes during decoding, and are collectively called a "gain profile" or "gain shape." In some cases, the gain shape values are normalization factors, each based on a ratio between (A) the energy E_original,i of the original subframe i and (B) the energy E_synthesized,i of the corresponding subframe i of a frame synthesized from other parameters of the encoded frame (e.g., including the description of the spectral envelope). In such cases, the energy E_synthesized,i may be used to normalize the energy E_original,i. For example, a gain shape value may be expressed as E_original,i/E_synthesized,i or as the square root of E_original,i/E_synthesized,i. One example of a description of a temporal envelope includes a gain frame and a gain shape, where the gain shape includes a value for each of five 4-millisecond subframes of a 20-millisecond frame. Gain values may be expressed on a linear scale or on a logarithmic (e.g., decibel) scale. Such features are described in more detail in, for example, U.S. Patent Application Publication No. 2006/0282262 cited above.
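To make the normalization concrete, the following sketch (with assumed function names; it is not code from the patent) computes a gain frame and a five-value gain shape for a 20-millisecond frame divided into five 4-millisecond subframes, using the square-root form of the energy ratios described above:

```python
import math

def energy(samples):
    return sum(x * x for x in samples)

def gain_frame(original, synthesized):
    """Square-root form of E_original / E_synthesized for the whole frame."""
    return math.sqrt(energy(original) / energy(synthesized))

def gain_shape(original, synthesized, num_subframes=5):
    """One normalization factor per subframe."""
    sub_len = len(original) // num_subframes
    values = []
    for i in range(num_subframes):
        lo, hi = i * sub_len, (i + 1) * sub_len
        values.append(math.sqrt(energy(original[lo:hi]) /
                                energy(synthesized[lo:hi])))
    return values

# 20 ms frame at 8 kHz: 160 samples, i.e. five 32-sample (4 ms) subframes.
original = [math.sin(0.25 * n) for n in range(160)]
synthesized = [0.5 * x for x in original]  # synthesis at half amplitude
gf = gain_frame(original, synthesized)     # close to 2.0
gs = gain_shape(original, synthesized)     # each value close to 2.0
```

Applying each gain shape value to its subframe of the synthesized signal restores the original subframe energies, which is the decoder-side role of these factors.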
In calculating the value of a gain frame (or the values of a gain shape), it may be desirable to use a windowing function that overlaps adjacent frames (or subframes). Gain values produced in this manner are usually applied at the speech decoder in an overlap-add fashion, which may help to reduce or avoid discontinuities between frames or subframes. Fig. 5A shows a plot of a trapezoidal windowing function that may be used to calculate each of the gain shape values. In this example, the window overlaps each of the two adjacent subframes by one millisecond. Fig. 5B shows an application of this windowing function to each of the five subframes of a 20-millisecond frame. Other examples of windowing functions include functions having different overlap periods and/or different window shapes (e.g., rectangular or Hamming), which may be symmetrical or asymmetrical. It is also possible to calculate the values of a gain shape by applying different windowing functions to different subframes and/or by calculating different values of the gain shape over subframes of different lengths.
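A trapezoidal window of the kind shown in Fig. 5A can be sketched as follows; the sample counts and names below are assumptions for illustration. At an 8 kHz sampling rate a 4-millisecond subframe is 32 samples, and a one-millisecond overlap into each neighboring subframe adds an 8-sample linear ramp on each side, so each windowed segment spans 48 samples:

```python
def trapezoidal_window(flat_len, ramp_len):
    """Linear rise, flat top, linear fall."""
    rise = [(i + 1) / (ramp_len + 1) for i in range(ramp_len)]
    return rise + [1.0] * flat_len + rise[::-1]

# 20 ms frame at 8 kHz: five 4-ms (32-sample) subframes,
# 1-ms (8-sample) overlap into each neighboring subframe.
FRAME, SUB, RAMP = 160, 32, 8
window = trapezoidal_window(SUB, RAMP)  # 48 samples long

def windowed_subframe_energy(frame, sub_index):
    """Energy of one subframe under the trapezoidal window.

    The window extends RAMP samples past each subframe edge;
    samples that fall outside the frame are treated as zero.
    """
    start = sub_index * SUB - RAMP
    total = 0.0
    for j, w in enumerate(window):
        n = start + j
        if 0 <= n < len(frame):
            total += (w * frame[n]) ** 2
    return total

frame = [1.0] * FRAME  # constant test signal
energies = [windowed_subframe_energy(frame, i) for i in range(5)]
```

For a constant signal the three interior subframes see identical windowed energy, while the first and last see slightly less because one ramp falls outside the frame.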
A speech packet that includes a description of a temporal envelope typically includes such a description in quantized form, such as one or more indices into corresponding codebooks, although in some cases an algorithm may be used to quantize and/or dequantize the gain frame and/or gain shape without using a codebook. One example of a description of a temporal envelope includes a quantization index of eight to twelve bits that specifies five gain shape values for the frame (e.g., one gain shape value for each of five consecutive subframes). Such a description may also include another quantization index that specifies a gain frame value for the frame.
As noted above, it may be desirable to transmit and receive speech signals having a frequency range that exceeds the 300-3400 Hz PSTN frequency range. One approach to coding such a signal is to encode the entire extended frequency range as a single band. Such an approach may be implemented by scaling a narrowband speech coding technique (e.g., one configured to encode a PSTN-quality frequency range such as 0-4 kHz or 300-3400 Hz) to cover a wideband frequency range such as 0-8 kHz. For example, such an approach may include (A) sampling the speech signal at a higher rate so that high-frequency components are included and (B) reconfiguring the narrowband coding technique to represent this wideband signal to a desired degree of accuracy. One such method of reconfiguring a narrowband coding technique is to use a higher-order LPC analysis (i.e., to produce a coefficient vector having more values). A wideband speech coder that encodes the wideband signal as a single band is also called a "full-band" coder.
It may be desirable to implement a wideband speech coder such that at least a narrowband portion of the encoded signal may be sent through a narrowband channel (such as a PSTN channel) without the need to transcode or otherwise significantly modify the encoded signal. Such a feature may promote backward compatibility with networks and/or apparatus that only recognize narrowband signals. It may also be desirable to implement a wideband speech coder that uses different coding modes and/or rates for different bands of the speech signal. Such a feature may be used to support improved coding efficiency and/or perceptual quality. A wideband speech coder that is configured to produce speech packets having portions that represent different bands of the wideband speech signal (e.g., separate sets of speech parameters, each set representing a different band of the wideband speech signal) is also called a "split-band" coder.
Fig. 6A shows one example of a nonoverlapping band scheme that a split-band speech encoder may use to encode wideband speech content over a range spanning 0 Hz to 8 kHz. This scheme includes a first band (also called the narrowband range) that extends from 0 Hz to 4 kHz and a second band (also called the extended, upper, or highband range) that extends from 4 kHz to 8 kHz. Fig. 6B shows one example of an overlapping band scheme that a split-band speech encoder may use to encode wideband speech content over a range spanning 0 Hz to 7 kHz. This scheme includes a first band (narrowband range) that extends from 0 Hz to 4 kHz and a second band (extended, upper, or highband range) that extends from 3.5 kHz to 7 kHz.
Other examples of band schemes include those in which the narrowband range extends down only to about 300 Hz. Such a scheme may also include another band that covers a lowband range from about 0 Hz or 50 Hz up to about 300 Hz or 350 Hz. One particular example of a split-band speech encoder is configured to perform a tenth-order LPC analysis for the narrowband range and a sixth-order LPC analysis for the highband range.
A speech packet encoded using a full-band coding scheme contains a description of a single spectral envelope that extends over the entire wideband frequency range, while a speech packet encoded using a split-band coding scheme has two or more separate portions that represent information in different bands (e.g., a narrowband range and a highband range) of the wideband speech signal. Typically, for example, each of these separate portions of a split-band-encoded speech packet contains a description of a spectral envelope of the speech signal over the corresponding band. A split-band-encoded speech packet may contain one description of temporal information of the frame for the entire wideband frequency range, or each of the separate portions of the split-band-encoded speech packet may contain a description of temporal information of the speech signal for the corresponding band.
A speech encoder is typically configured to produce a series of encoded frames, each of which includes a speech packet and possibly one or more associated bits. Fig. 7A illustrates one example of a format for an encoded frame having a length of 192 bits. In this example, the encoded frame includes a full-rate speech packet of 171 bits that represents a frame of the speech signal (i.e., the primary traffic). An encoded frame may also include one or more check bits. In this example, the encoded frame includes a frame quality indicator F of 12 bits, which may include parity check bits or cyclic redundancy check (CRC) bits, and a set of tail bits T of 8 bits, which may be used to terminate and initialize a convolutional code that produces the CRC bits. The encoded frame may also include one or more bits that indicate the presence of data (e.g., an information burst) other than the speech packet. In this example, the encoded frame includes a mixed-mode bit MM, which in this case is cleared (i.e., has a value of zero).
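The bit budget of the 192-bit format of Fig. 7A can be tallied with a small sketch. The field breakdown follows the bit counts in the text above; the dictionary representation itself is just an assumed illustration:

```python
# Fields of the 192-bit full-rate encoded frame of Fig. 7A.
FULL_RATE_FRAME = {
    "speech_packet": 171,   # full-rate packet (primary traffic)
    "frame_quality_F": 12,  # parity / CRC bits
    "tail_T": 8,            # terminates the convolutional code
    "mixed_mode_MM": 1,     # cleared when no burst is present
}

total = sum(FULL_RATE_FRAME.values())
packet_fraction = FULL_RATE_FRAME["speech_packet"] / total
print(total, round(packet_fraction, 3))  # 192 bits; packet is ~89% of frame
```

The speech packet thus occupies well over eighty-five percent of the encoded frame, a fraction that becomes relevant to method M100 below.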
It may be desirable from time to time, or at intervals, to include within an encoded frame information that is not part of the speech signal. For example, it may be desirable for an encoded frame to carry a burst of signaling information between a mobile station and another entity in the network, such as a BTS, BSC, MSC, PCF, or PDSN. A signaling information burst may carry a request to perform an action, such as increasing transmit power or measuring a parameter (e.g., pilot strength), or at least part of a response to such a request (e.g., a measured parameter value). A signaling information burst relating to a handoff within a radio access network, or from one radio access network to another, may include updated network information, such as values of a network identifier (NID), a system identifier (SID), and/or a packet zone identifier (PZID). In some cases, the signaling information burst includes at least part of a system parameters message that contains one or more of these handoff parameter values.
Alternatively, it may be desirable for an encoded frame to carry a burst of secondary traffic. A secondary traffic burst may include occasionally updated information, such as at least part of an update of geographical position information (e.g., Global Positioning System or GPS information). In another case, a secondary traffic burst may include at least part of a low-bit-rate data transmission, such as a paging message, a short message service (SMS) message, or an e-mail message.
In such cases, it may be desirable for the speech encoder to configure the encoded frame so that some bits are available to carry the other information. For example, it may be desirable for the speech encoder to encode the frame into a smaller speech packet by using a bit rate that is lower than the bit rate indicated by the rate selection mechanism. Such an operation is called "dimming" or "source-level dimming." In one typical example of source-level dimming, the speech encoder is forced to use a half-rate scheme to encode a frame for which the full-rate scheme would otherwise have been selected, although source-level dimming may in general include any rate reduction. A variable-rate speech encoder may be configured to perform a dim-and-burst technique to produce an encoded frame that includes a dimmed speech packet and a burst of other information. Descriptions of such techniques may be found in, for example, U.S. Pat. No. 5,504,773 (Padovani et al.).
An encoded frame produced using a dim-and-burst technique may include one or more bits that indicate whether it includes signaling information or secondary traffic. Fig. 7B shows a format that a dim-and-burst technique may use for an encoded frame that includes a half-rate speech packet (80 bits) of primary traffic and a signaling information burst of 86 bits. This frame includes a burst format bit BF that indicates whether a dim-and-burst or a blank-and-burst format is used, a traffic type bit TT that indicates whether the burst contains signaling traffic or secondary traffic, and two traffic mode bits TM that may be used to indicate different numbers of bits available for primary traffic and/or for signaling or secondary traffic; in this case, all of these bits are cleared. The frame also includes a start-of-message bit SOM, which indicates whether the following bit is the first bit of a signaling message. Fig. 7C shows a format that a dim-and-burst technique may use for an encoded frame that includes a half-rate packet of the speech signal and a secondary traffic burst of 87 bits. In this case, the frame format does not include a start-of-message bit, and the traffic type bit TT is set.
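Under the same 192-bit budget, the dim-and-burst formats of Figs. 7B and 7C can be tallied the same way. The dictionaries below are illustrative assumptions built from the bit counts in the text; in particular, a mixed-mode bit is assumed present so that the fields sum to the full frame size:

```python
# Fig. 7B: half-rate packet plus an 86-bit signaling burst.
SIGNALING_FRAME = {
    "speech_packet": 80, "signaling_burst": 86,
    "frame_quality_F": 12, "tail_T": 8,
    "mixed_mode_MM": 1, "burst_format_BF": 1,
    "traffic_type_TT": 1, "traffic_mode_TM": 2,
    "start_of_message_SOM": 1,
}

# Fig. 7C: half-rate packet plus an 87-bit secondary-traffic burst
# (no start-of-message bit; TT is set rather than cleared).
SECONDARY_FRAME = {
    "speech_packet": 80, "secondary_burst": 87,
    "frame_quality_F": 12, "tail_T": 8,
    "mixed_mode_MM": 1, "burst_format_BF": 1,
    "traffic_type_TT": 1, "traffic_mode_TM": 2,
}

sizes = [sum(SIGNALING_FRAME.values()), sum(SECONDARY_FRAME.values())]
print(sizes)  # both layouts fill the same 192-bit frame
```

Keeping all three layouts at 192 bits is what lets a burst-carrying frame travel over the same channel as an ordinary full-rate frame.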
Excessive use of dimming may cause degradation in the quality of the encoded speech signal. In general, the use of dimming is limited to no more than five percent of the full-rate frames, although more typically no more than one percent or possibly two percent of such frames are dimmed. In some cases, the speech encoder is configured to select the frames to be dimmed according to a binary mask file, where each bit of the mask file corresponds to a frame and the state of the bit indicates whether that frame is to be dimmed. In other cases, the speech encoder is configured to avoid dimming, where possible, by waiting until a half-rate frame is scheduled.
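One way to respect such a dimming budget, shown here as a sketch rather than as the codec's actual scheduler, is to walk the frames with a counter and grant a dim request only while the running dimmed fraction stays at or below the cap. The result is a binary mask of the kind described above:

```python
def select_dimmed_frames(dim_requests, max_fraction=0.05):
    """Grant dim requests while staying within the dimming budget.

    dim_requests: iterable of booleans, one per full-rate frame,
    True where the system would like to dim that frame.
    Returns a binary mask (1 = dim this frame).
    """
    mask = []
    dimmed = 0
    for i, wanted in enumerate(dim_requests, start=1):
        if wanted and (dimmed + 1) / i <= max_fraction:
            mask.append(1)
            dimmed += 1
        else:
            mask.append(0)
    return mask

# Request dimming on every frame; the cap limits grants to 5%.
mask = select_dimmed_frames([True] * 100, max_fraction=0.05)
print(sum(mask))  # 5 of 100 frames dimmed
```

With the five-percent cap, a grant is possible at the earliest on the twentieth frame, so bursts of requests are spread out rather than honored back to back.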
It may be desirable to implement a wideband coding system as an upgrade to an existing narrowband coding system. For example, it may be desirable to minimize changes to the network by using the same bit rates and packet sizes, supporting additional wideband coding schemes through additional packet formats. One existing type of narrowband speech codec that uses IS-95-compliant frame formats as shown in Figs. 7A to 7C is the Enhanced Variable Rate Codec, Version B (EVRC-B), as described in Third Generation Partnership Project 2 (3GPP2) document C.S0014-B v1.0 (May 2006), available online at 3gpp2.org. It may be desirable to upgrade a system that supports EVRC-B so that it also supports the Enhanced Variable Rate Codec, Version C (EVRC-C, also called EVRC-WB), as described in 3GPP2 document C.S0014-C v1.0 (January 2007), also available online at 3gpp2.org.
As noted above, existing narrowband coding systems support dim-and-burst techniques. It may be desirable to support dim-and-burst techniques in a wideband coding system as well. One approach to dimming a wideband frame involves designing and implementing a lower-bit-rate (e.g., half-rate) wideband coding scheme to be used for the dimmed frames. A wideband speech encoder may be configured to encode a dimmed frame according to such a scheme, or alternatively to create a speech packet having the format of such a scheme from a speech packet that was encoded using the selected higher-bit-rate wideband coding scheme. In either case, however, designing a lower-bit-rate wideband coding scheme that has acceptable perceptual quality will be expensive. Implementing such a coding scheme may also consume more of the speech encoder's resources, such as processing cycles and storage space, and implementing an additional coding scheme will also increase system complexity.
Another approach to dimming a wideband frame is to encode the dimmed frame using a lower-bit-rate narrowband coding scheme. Although this approach involves the loss of highband information, it may be relatively easy to implement in a wideband upgrade to an existing narrowband installation, since it can be configured to use an existing narrowband coding scheme (e.g., half-rate CELP). A corresponding speech decoder may be configured to reconstruct the missing highband information from the highband signal of one or more previous frames.
Fig. 8A shows a flowchart of a method M100 according to a general configuration that includes tasks T110, T120, T130, and T140. Task T110 is configured to produce a first speech packet based on a first active frame of the speech signal. The first speech packet includes a description of a spectral envelope over (A) a first frequency band and (B) a second frequency band that extends above the first frequency band. This description may be a single description that extends over both frequency bands, or it may include separate descriptions that each extend over a corresponding one of the frequency bands. Task T110 may also be configured to produce the first speech packet to contain a description of a temporal envelope over the first and second frequency bands. This description may be a single description that extends over both frequency bands, or it may include separate descriptions that each extend over a corresponding one of the frequency bands. It is expressly noted that the range of implementations of method M100 also includes implementations in which task T110 is configured to produce the first speech packet based on an inactive frame of the speech signal.
Task T120 is configured to produce a second speech packet based on a second active frame that occurs in the speech signal after the first active frame (e.g., an active frame that immediately follows the first active frame, or an active frame that is separated from the first active frame by one or more other active frames). The second speech packet includes a description of a spectral envelope over the first frequency band. Task T120 may also be configured to produce the second speech packet to contain a description of temporal information for the first frequency band. Task T130 is configured to produce a first encoded frame that contains the first speech packet, and task T140 is configured to produce a second encoded frame that contains the second speech packet and a burst of an information signal that is separate from the speech signal. The first and second speech packets may also include descriptions of temporal information that are based on the respective frames. Fig. 9 illustrates an application of method M100.
Tasks T130 and T140 are configured to produce the first and second encoded frames to have the same size (e.g., 192 bits). Task T110 may be configured to produce the first speech packet to have a length that is greater than half the length of the first encoded frame. For example, task T110 may be configured to produce the first speech packet to have a length that is at least sixty, seventy, seventy-five, eighty, or eighty-five percent of the length of the first encoded frame. In one particular such example, task T110 is configured to produce the first speech packet to have a length of 171 bits. Alternatively, task T110 may be configured to produce the first speech packet to have a length that is not more than fifty, forty-five, or forty-two percent of the length of the first encoded frame. In one particular such example, task T110 is configured to produce the first speech packet to have a length of 80 bits.
Task T120 is configured to produce the second speech packet to have a length that is not greater than sixty percent of the length of the second encoded frame. For example, task T120 may be configured to produce the second speech packet to have a length that is not more than fifty, forty-five, or forty-two percent of the length of the second encoded frame. In one particular example, task T120 is configured to produce the second speech packet to have a length of 80 bits. Task T120 may also be configured such that the second speech packet does not include a description of a spectral envelope over the second frequency band and/or a description of temporal information for the second frequency band.
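Both of the particular packet lengths above satisfy the stated fractions of a 192-bit encoded frame, as a quick check shows (the variable names are assumptions; the bit counts come from the text):

```python
ENCODED_FRAME_BITS = 192

full_rate_packet = 171   # first example for the first speech packet
half_rate_packet = 80    # alternative first packet, and the second packet

# 171/192 ~ 0.891: at least 85% of the encoded frame.
assert full_rate_packet / ENCODED_FRAME_BITS >= 0.85

# 80/192 ~ 0.417: not more than 42% of the encoded frame.
assert half_rate_packet / ENCODED_FRAME_BITS <= 0.42

print(round(full_rate_packet / ENCODED_FRAME_BITS, 3),
      round(half_rate_packet / ENCODED_FRAME_BITS, 3))
```

The 80-bit packet leaves roughly 86 to 87 bits of the 192-bit frame free for the information burst, which is exactly the burst budget of the formats in Figs. 7B and 7C.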
Method M100 is typically implemented as part of a larger method of speech encoding, and speech encoders and methods of speech encoding that are configured to perform method M100 are expressly contemplated and hereby disclosed. Such an encoder or method may be configured to encode an active frame that follows the second frame in the speech signal (e.g., an active frame that immediately follows the second frame, or an active frame that is separated from the second frame by one or more other active frames) using the same format as the first encoded frame or the same format as the second encoded frame. Alternatively, such an encoder or method may be configured to encode an unvoiced or inactive frame that follows the second frame using a different coding scheme. A corresponding speech decoder may be configured to use information decoded from the first encoded frame to supplement the decoding of an active frame from another encoded frame that follows the first encoded frame in the encoded speech signal. Elsewhere in this description, speech decoders and methods of decoding frames of a speech signal are disclosed that use information decoded from the first encoded frame in decoding one or more subsequent active frames.
Either or both of tasks T110 and T120 may be configured to calculate the respective description of a spectral envelope. Fig. 10 shows an application of a subtask T112 of such an implementation of task T110, where subtask T112 is configured to calculate, based on the first frame, a description of a spectral envelope over the first and second frequency bands. Fig. 10 also shows an application of a subtask T122 of such an implementation of task T120, where subtask T122 is configured to calculate, based on the second frame, a description of a spectral envelope over the first frequency band. Tasks T110 and T120 may also be configured to calculate, based on the respective frames, descriptions of temporal information, which may be included in the corresponding speech packets.
Tasks T110 and T120 may be configured such that the second speech packet includes a description of a spectral envelope over the first frequency band whose length is not less than half the length of the description of a spectral envelope over the first and second frequency bands that is included in the first speech packet. For example, tasks T110 and T120 may be configured such that the length of the description of a spectral envelope over the first frequency band in the second speech packet is at least fifty-five or sixty percent of the length of the description of a spectral envelope over the first and second frequency bands that is included in the first speech packet. In one particular example, the length of the description of a spectral envelope over the first frequency band in the second speech packet is 22 bits, and the length of the description of a spectral envelope over the first and second frequency bands that is included in the first speech packet is 36 bits.
The second frequency band is different from the first frequency band, although method M110 may be configured such that the two frequency bands overlap. Examples of a lower bound of the first frequency band include 0, 50, 100, 300, and 500 Hz, and examples of an upper bound of the first frequency band include 3, 3.5, 4, 4.5, and 5 kHz. Examples of a lower bound of the second frequency band include 2.5, 3, 3.5, 4, and 4.5 kHz, and examples of an upper bound of the second frequency band include 7, 7.5, 8, and 8.5 kHz. All five hundred possible combinations of the above bounds are expressly contemplated and hereby disclosed, and application of any such combination to any implementation of method M110 is also expressly contemplated and hereby disclosed. In one particular example, the first frequency band includes the range of about 50 Hz to about 4 kHz and the second frequency band includes the range of about 4 kHz to about 7 kHz. In another particular example, the first frequency band includes the range of about 100 Hz to about 4 kHz and the second frequency band includes the range of about 3.5 kHz to about 7 kHz. In a further particular example, the first frequency band includes the range of about 300 Hz to about 4 kHz and the second frequency band includes the range of about 3.5 kHz to about 7 kHz. In these examples, the term "about" indicates plus or minus five percent, with the bounds of each frequency band being indicated by the corresponding 3-dB points.
As noted above, for wideband applications a split-band coding scheme may have advantages over a full-band coding scheme, such as improved coding efficiency and support for backward compatibility. It may be desirable to implement method M100 to produce the first encoded frame using a split-band coding scheme rather than a full-band coding scheme. Fig. 8B shows a flowchart of an implementation M110 of method M100 that includes an implementation T114 of task T110. As an implementation of task T110, task T114 is configured to produce a first speech packet that includes a description of a spectral envelope over the first and second frequency bands. In this case, task T114 is configured to produce the first speech packet to include a description of a spectral envelope over the first frequency band and a description of a spectral envelope over the second frequency band, such that the two descriptions are separate from one another (although they may be adjacent to one another within the speech packet).
Task T114 may be configured to calculate the descriptions of the spectral envelopes using a split-band coding scheme. Fig. 11 shows an application of a subtask T116 of such an implementation of task T114, where subtask T116 is a split-band implementation of subtask T112. Subtask T116 includes a subtask T118a, which is configured to calculate, based on the first frame, a description of a spectral envelope over the first frequency band. Subtask T116 also includes a subtask T118b, which is configured to calculate, based on the first frame, a description of a spectral envelope over the second frequency band. Tasks T118a and T118b may also be configured to calculate separate descriptions of temporal information over the two frequency bands.
Calculation of a description of spectral and/or temporal information for a frame may be based on information from one or more previous frames. In such a case, encoding the second frame using only a narrowband coding scheme may reduce the coding performance for one or more subsequent frames. Task T120 may include a subtask T124 (not shown) that is configured to calculate, based on the second frame, a description of a spectral envelope over the second frequency band and/or a description of temporal information for the second frequency band. For example, task T120 may be configured to encode the second frame using a wideband coding scheme. As noted above, task T120 may be configured such that the second speech packet does not include a description of a spectral envelope over the second frequency band or a description of temporal information for the second frequency band. Even in such a case, however, calculating this information for the second frequency band so that it is available at the encoder, as part of the history upon which one or more subsequent frames are encoded, may provide better perceptual quality over those frames than encoding them without this information. Alternatively, task T120 may be configured to use a narrowband coding scheme to encode the first frequency band of the second frame and to initialize the second-frequency-band history for the next frame (e.g., by resetting a memory that stores past spectral and/or temporal information). In another alternative, task T120 is configured to use a narrowband coding scheme to encode the first frequency band of the second frame and to use an erasure handling routine to estimate, for the second frame, a description of a spectral envelope over the second frequency band (and/or a description of temporal information for the second frequency band). For example, such an implementation of task T120 may be configured to estimate, for the second frame, a description of a spectral envelope over the second frequency band (and/or a description of temporal information for the second frequency band) based on information from the first frame and possibly from one or more previous frames.
Tasks T118a and T118b may be configured to calculate descriptions of the spectral envelopes over the two frequency bands that have equal lengths, or one of tasks T118a and T118b may be configured to calculate a description that is longer than the description calculated by the other task. For example, tasks T118a and T118b may be configured such that the length of the description of the spectral envelope over the second frequency band in the first speech packet, as calculated by task T118b, is not more than fifty, forty, or thirty percent of the length of the description of the spectral envelope over the first frequency band in the first speech packet, as calculated by task T118a. In one particular example, the description of the spectral envelope over the first frequency band in the first speech packet is 28 bits long, and the description of the spectral envelope over the second frequency band in the first speech packet is 8 bits long. Tasks T118a and T118b may also be configured to calculate separate descriptions of temporal information for the two frequency bands.
Tasks T118a and T122 may be configured to calculate descriptions of the spectral envelope over the first frequency band that have equal lengths, or one of tasks T118a and T122 may be configured to calculate a description that is longer than the description calculated by the other task. For example, tasks T118a and T122 may be configured such that the length of the description of the spectral envelope over the first frequency band in the second speech packet, as calculated by task T122, is at least fifty, sixty, seventy, or seventy-five percent of the length of the description of the spectral envelope over the first frequency band in the first speech packet, as calculated by task T118a. In one particular example, the description of the spectral envelope over the first frequency band in the first speech packet is 28 bits long, and the description of the spectral envelope over the first frequency band in the second speech packet is 22 bits long.
The table of Figure 12 shows one set of four different coding schemes that a speech encoder may use to perform a method of speech encoding that includes an implementation of method M100. In this example, a full-rate wideband CELP coding scheme ("coding scheme 1") is used to encode voiced frames. This coding scheme uses 153 bits to encode the narrowband portion of the frame and 16 bits to encode the highband portion. For the narrowband, coding scheme 1 uses 28 bits to encode a description of the spectral envelope (for example, encoded as one or more quantized LSP vectors) and 125 bits to encode a description of the excitation signal. For the highband, coding scheme 1 uses 8 bits to encode the spectral envelope (for example, as one or more quantized LSP vectors) and 8 bits to encode a description of the temporal envelope.
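The bit allocation just described for coding scheme 1 can be summarized in a short sketch. The field names below are hypothetical labels for illustration; only the bit counts come from the text.

```python
# Hypothetical summary of the bit allocation for coding scheme 1
# (full-rate wideband CELP). Field names are illustrative only.
SCHEME_1_BITS = {
    "narrowband_spectral_envelope": 28,   # e.g. quantized LSP vector(s)
    "narrowband_excitation":        125,
    "highband_spectral_envelope":   8,
    "highband_temporal_envelope":   8,
}

narrowband_total = (SCHEME_1_BITS["narrowband_spectral_envelope"]
                    + SCHEME_1_BITS["narrowband_excitation"])
highband_total = (SCHEME_1_BITS["highband_spectral_envelope"]
                  + SCHEME_1_BITS["highband_temporal_envelope"])

print(narrowband_total, highband_total)  # 153 16
```

This simply confirms that the per-field counts sum to the 153-bit narrowband and 16-bit highband budgets stated above.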
It may be desirable to configure coding scheme 1 to derive the highband excitation signal from the narrowband excitation signal, such that no bits of the encoded frame are needed to carry the highband excitation signal. It may also be desirable to configure coding scheme 1 to calculate a highband temporal envelope relative to the temporal envelope of a highband signal synthesized from other parameters of the encoded frame (for example, including the description of the spectral envelope over the second frequency band). Such features are described in more detail in, for example, U.S. Patent Application Publication No. 2006/0282262, cited above.
In the example according to the table of Figure 12, a half-rate narrowband CELP coding scheme ("coding scheme 2") is used to encode dimmed frames. This coding scheme uses 80 bits to encode the narrowband portion of the frame (and no bits to encode the highband portion). Coding scheme 2 uses 22 bits to encode a description of the spectral envelope (for example, as one or more quantized LSP vectors) and 58 bits to encode a description of the excitation signal.
Compared with voiced speech signals, unvoiced speech signals typically contain more information in the highband that is important to speech intelligibility. It may therefore be desirable to use more bits to encode the highband portion of an unvoiced frame than to encode the highband portion of a voiced frame, even when the voiced frame is encoded at a higher overall bit rate. In the example according to the table of Figure 12, a half-rate wideband NELP coding scheme ("coding scheme 3") is used to encode unvoiced frames. Instead of the 16 bits that coding scheme 1 uses to encode the highband portion of a voiced frame, this coding scheme uses 27 bits to encode the highband portion of the frame: 12 bits to encode a description of the spectral envelope (for example, as one or more quantized LSP vectors) and 15 bits to encode a description of the temporal envelope (for example, as a quantized gain frame and/or gain shape). To encode the narrowband portion, coding scheme 3 uses 47 bits: 28 bits for a description of the spectral envelope (for example, as one or more quantized LSP vectors) and 19 bits for a description of the temporal envelope (for example, as a quantized gain frame and/or gain shape).
In the example according to the table of Figure 12, an eighth-rate narrowband NELP coding scheme ("coding scheme 4") is used to encode inactive frames at a rate of 16 bits per frame, with 10 bits used to encode a description of the spectral envelope (for example, as one or more quantized LSP vectors) and 5 bits used to encode a description of the temporal envelope (for example, as a quantized gain frame and/or gain shape). Another example of coding scheme 4 uses 8 bits to encode the description of the spectral envelope and 6 bits to encode the description of the temporal envelope.
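Taken together, the four schemes discussed above trade narrowband and highband bits against each other. The sketch below tabulates them; the names and structure are assumptions for illustration, while the bit counts are taken from the text (the 171-bit full-rate packet of Figure 13 would also include a small amount of overhead beyond these payload bits).

```python
# Illustrative tabulation of the four coding schemes described above.
# Scheme names and dictionary layout are assumed; bit counts are from the text.
SCHEMES = {
    1: {"name": "full-rate wideband CELP",    "narrowband": 153, "highband": 16},
    2: {"name": "half-rate narrowband CELP",  "narrowband": 80,  "highband": 0},
    3: {"name": "half-rate wideband NELP",    "narrowband": 47,  "highband": 27},
    4: {"name": "eighth-rate narrowband NELP","narrowband": 16,  "highband": 0},
}

def frame_bits(scheme_id):
    """Total payload bits per frame for the given scheme."""
    s = SCHEMES[scheme_id]
    return s["narrowband"] + s["highband"]

for i in sorted(SCHEMES):
    print(i, SCHEMES[i]["name"], frame_bits(i))
```

Note how scheme 3 spends more highband bits (27) than the higher-rate scheme 1 (16), reflecting the point above about unvoiced frames.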
In the example according to Figure 12, coding scheme 2 and/or coding scheme 4 may be a legacy coding scheme carried over from an underlying narrowband installation. Such a speech encoder or method of speech encoding may also be configured to support other legacy coding schemes and/or new coding schemes. The table of Figure 13 shows one set of bit allocations for the full-rate packet (171 bits) produced by an example of wideband CELP coding scheme 1. The table of Figure 14 shows one set of bit allocations for the half-rate packet (80 bits) produced by an example of narrowband CELP coding scheme 2. One particular example of task T110 uses a full-rate CELP coding scheme (for example, according to coding scheme 1 in the table of Figure 12) to produce the first speech packet based on a voiced or transitional frame of the speech signal. Another particular example of task T110 uses a half-rate NELP coding scheme (for example, according to coding scheme 3 in the table of Figure 12) to produce the first speech packet based on an unvoiced frame of the speech signal. A further particular example of task T110 uses an eighth-rate NELP coding scheme (for example, according to coding scheme 4 in the table of Figure 12) to produce the first speech packet based on an inactive frame of the speech signal.
In a typical application of an implementation of method M100, an array of logic elements (for example, logic gates) is configured to perform one, more than one, or even all of the various tasks of the method. One or more (possibly all) of the tasks may also be implemented as code (for example, one or more sets of instructions) embodied in a computer program product (for example, one or more data storage media such as disks, flash or other nonvolatile memory cards, or semiconductor memory chips) that is readable and/or executable by a machine (for example, a computer) including an array of logic elements (for example, a processor, microprocessor, microcontroller, or other finite state machine). The tasks of an implementation of method M100 may also be performed by more than one such array or machine. In these or other implementations, the tasks may be performed within a device for wireless communications, such as a cellular telephone, or another device having such communications capability. Such a device may be configured to communicate with circuit-switched and/or packet-switched networks (for example, using one or more protocols such as VoIP). For example, such a device may include RF circuitry configured to transmit the encoded frames.
In the wideband case, another approach to applying dim-and-burst techniques is to use the highband portion of a dimmed packet to carry the information burst. In this case, a high-bit-rate (for example, full-rate) wideband coding scheme may be modified so that each speech packet it produces includes a bit reserved as a mixed-mode indicator, and the speech encoder may be configured to set the mixed-mode bit to indicate that the highband portion of the speech packet contains signaling information or secondary traffic rather than the usual highband speech information.
Figure 15A shows a block diagram of a speech encoder 100 according to a general configuration. Speech encoder 100 includes a packet encoder 120 arranged to receive frames of the speech signal and a rate control signal. Packet encoder 120 is configured to produce speech packets according to the rate indicated by the rate control signal. Speech encoder 100 also includes a frame formatter 130 arranged to receive the speech packets, an information burst, and a dimming control signal. Frame formatter 130 is configured to produce encoded frames according to the state of the dimming control signal. A communications device that includes speech encoder 100 (for example, a cellular telephone) may be configured to perform further processing operations on the encoded frames, such as error-correction and/or redundancy coding, before transmitting them into a wired, wireless, or optical transmission channel.
In this example, speech encoder 100 receives the rate control signal from another module. Speech encoder 100 may also be implemented to include a rate selection module configured to generate the rate control signal (for example, according to an open-loop or closed-loop rate selection algorithm as described above). In such a case, the rate selection module may be configured to control the dimming operation (for example, according to a binary mask as described above) and to generate the dimming control signal. Alternatively, the rate selection module may be configured to receive the dimming control signal, or a signal related to it, from another module internal or external to the speech encoder. Speech encoder 100 may also be configured to perform one or more preprocessing operations on the received frames, such as perceptual weighting or other filtering operations.
Packet encoder 120 is configured to produce, based on a first active frame of the speech signal and in response to a first state of the rate control signal, a first speech packet that includes a description of a spectral envelope over the first and second frequency bands, as described above. For example, the first state of the rate control signal may indicate wideband coding scheme 1 according to the example of Figure 12. Packet encoder 120 is also configured to produce, based on a second active frame of the speech signal and in response to a second state of the rate control signal that differs from the first state, a second speech packet that includes a description of a spectral envelope over the first frequency band, as described above. For example, the second state of the rate control signal may indicate narrowband coding scheme 2 according to the example of Figure 12.
Figure 15B shows a block diagram of an implementation 122 of packet encoder 120 that includes a spectral envelope description calculator 140, a temporal information description calculator 150, and a packet formatter 160. Spectral envelope description calculator 140 is configured to calculate a description of a spectral envelope for each frame to be encoded. Temporal information description calculator 150 is configured to calculate a description of temporal information for each frame to be encoded. Packet formatter 160 is configured to produce a speech packet that includes the calculated description of the spectral envelope and the calculated description of the temporal information. Packet formatter 160 may be configured to produce the speech packet according to a desired packet format (for example, as indicated by the state of the rate control signal), possibly using different formats for different coding schemes. Packet formatter 160 may be configured to produce the speech packet to include additional information about how the frame is encoded (also called a "coding index"), such as one or more sets of bits identifying the coding scheme, or the coding rate or mode.
Spectral envelope description calculator 140 is configured to calculate, according to the state of the rate control signal, a description of a spectral envelope for each frame to be encoded. The description is based on the current frame and may also be based on at least part of one or more other frames. For example, calculator 140 may be configured to apply a window that extends into one or more adjacent frames and/or to calculate an average of descriptions of two or more frames (for example, an average of LSP vectors).
Calculator 140 may be configured to calculate the description of a spectral envelope of a frame by performing a spectral analysis such as an LPC analysis. Figure 15C shows a block diagram of an implementation 142 of spectral envelope description calculator 140 that includes an LPC analysis module 170, a transform block 180, and a quantizer 190. Analysis module 170 is configured to perform an LPC analysis of the frame and to produce a corresponding set of model parameters. For example, analysis module 170 may be configured to produce a vector of LPC coefficients, such as filter coefficients or reflection coefficients. Analysis module 170 may be configured to perform the analysis over a window that includes portions of one or more neighboring frames. In some cases, analysis module 170 is configured such that the order of the analysis (for example, the number of elements in the coefficient vector) is selected according to the coding scheme indicated by coding scheme selector 120.
Transform block 180 is configured to convert the set of model parameters into a form that is more efficient for quantization. For example, transform block 180 may be configured to convert an LPC coefficient vector into a set of LSPs. In some cases, transform block 180 is configured to convert the set of LPC coefficients into a particular form according to the coding scheme indicated by coding scheme selector 120.
Quantizer 190 is configured to produce a description of the spectral envelope in quantized form by quantizing the transformed set of model parameters. Quantizer 190 may be configured to quantize the transformed set by truncating elements of the transformed set and/or by selecting one or more quantization table indices to represent the transformed set. It may be desirable to configure quantizer 190 to quantize the transformed set into a particular form and/or length according to the state of the rate control signal. For example, quantizer 190 may be implemented to produce a quantized description as described in Figure 13 in response to the first state of the rate control signal and to produce a quantized description as described in Figure 14 in response to the second state of the rate control signal.
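Selecting a quantization table index to represent a transformed parameter set, as described above for quantizer 190, can be sketched as a nearest-neighbor search over a codebook. This is a minimal illustration under stated assumptions: the codebook values and the squared-distance criterion are made up for the example, not taken from the text.

```python
# Minimal sketch of quantizing a transformed parameter set (e.g. an LSP
# vector) by selecting the index of the nearest codebook entry, as a
# quantizer such as the one described above might do.
# Codebook contents are hypothetical.
def quantize(vector, codebook):
    """Return the index of the codebook entry closest to `vector`
    under squared Euclidean distance."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: dist(vector, codebook[i]))

codebook = [(0.1, 0.3), (0.2, 0.5), (0.4, 0.8)]
idx = quantize((0.22, 0.52), codebook)
print(idx)  # 1
```

In a real coder the codebook would be trained offline and may be split or multi-stage; only the index is transmitted in the packet.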
Temporal information description calculator 150 is configured to calculate a description of the temporal information of a frame. This description may likewise be based on temporal information of at least part of one or more other frames. For example, calculator 150 may be configured to calculate the description over a window that extends into one or more adjacent frames and/or to calculate an average of descriptions of two or more frames.
Temporal information description calculator 150 may be configured to calculate a description of temporal information having a particular form and/or length according to the state of the rate control signal. For example, calculator 150 may be configured to calculate, according to the state of the rate control signal, a description of temporal information that includes one or both of (A) a temporal envelope of the frame and (B) an excitation signal of the frame, which may include a description of at least one pitch component (for example, pitch delay or lag, pitch gain, and/or a description of a prototype). In an LPC coder, the pitch lag is typically calculated as the lag value that maximizes the autocorrelation function of the LPC residual of the frame. The excitation signal may also be based on other information, such as values from an adaptive codebook (also called a pitch codebook) and/or from a fixed codebook (also called an innovation codebook, which may indicate pulse positions).
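The pitch-lag rule just mentioned (choose the lag that maximizes the autocorrelation of the LPC residual) can be sketched directly. The synthetic residual and the lag search range below are illustrative assumptions, not values from the text.

```python
# Hedged sketch of pitch-lag estimation: pick the lag that maximizes the
# autocorrelation of the (here synthetic) LPC residual.
import math

def pitch_lag(residual, min_lag, max_lag):
    def autocorr(lag):
        return sum(residual[n] * residual[n - lag]
                   for n in range(lag, len(residual)))
    return max(range(min_lag, max_lag + 1), key=autocorr)

# A synthetic residual with a period of 40 samples:
residual = [math.sin(2 * math.pi * n / 40) for n in range(160)]
print(pitch_lag(residual, 20, 80))  # 40
```

A production coder would typically refine this with normalization, fractional lags, and continuity constraints across frames.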
Calculator 150 may be configured to calculate a description of temporal information that includes a temporal envelope of the frame (for example, a gain frame value and/or gain shape values). For example, calculator 150 may be configured to output such a description in response to an indication of a NELP coding scheme. As described herein, calculating this description may include calculating the signal energy over a frame or subframe as a sum of squares of the signal samples, calculating the signal energy over a window that includes parts of other frames and/or subframes, and/or quantizing the calculated temporal envelope.
Calculator 150 may be configured to calculate a description of the temporal information of a frame that includes information relating to the pitch or periodicity of the frame. For example, calculator 150 may be configured to output a description that includes pitch information of the frame (for example, pitch lag or delay and/or pitch gain) in response to an indication of a CELP coding scheme. In some cases, information relating to a pitch component of the frame (for example, the excitation signal or a parameter such as pitch lag) may be obtained from the corresponding speech packet and also from a previous speech packet. Alternatively or additionally, calculator 150 may be configured to output a description of a periodic waveform (also called a "prototype") in response to an indication of a PPP coding scheme. Calculating pitch and/or prototype information typically includes extracting such information from the LPC residual and may also include combining pitch and/or prototype information from the current frame with such information from one or more past frames. Calculator 150 may also be configured to quantize such a description of temporal information (for example, as one or more table indices).
Calculator 150 may be configured to calculate a description of the temporal information of a frame that includes an excitation signal. For example, calculator 150 may be configured to output a description that includes an excitation signal in response to an indication of a CELP coding scheme. The excitation signal may also include a description of a pitch component (for example, pitch delay or lag, pitch gain, and/or a description of a prototype). Calculating the excitation signal typically includes deriving such a signal from the LPC residual and may also include combining excitation information from the current frame with such information from one or more past frames. Calculator 150 may also be configured to quantize such a description of temporal information (for example, as one or more table indices). For a case in which the speech encoder supports a relaxed CELP (RCELP) coding scheme, calculator 150 may be configured to regularize the excitation signal.
Figure 16A shows a block diagram of an implementation 124 of packet encoder 122 that includes an implementation 152 of temporal information description calculator 150. Calculator 152 is configured to calculate a description of temporal information of a frame (for example, an excitation signal, pitch information, and/or prototype information) that is based on a description of a spectral envelope of the frame as calculated by spectral envelope description calculator 140.
Figure 16B shows a block diagram of an implementation 154 of temporal information description calculator 152 that is configured to calculate a description of temporal information based on an LPC residual of the frame. In this example, calculator 154 is arranged to receive the description of the spectral envelope of the frame as calculated by spectral envelope description calculator 142. Dequantizer A10 is configured to dequantize the description, and inverse transform block A20 is configured to apply an inverse transform to the dequantized description to obtain a set of LPC coefficients. Whitening filter A30 is configured according to this set of LPC coefficients and is arranged to filter the speech signal to produce the LPC residual. Quantizer A40 is configured to quantize a description of the temporal information of the frame (for example, as one or more table indices) that is based on the LPC residual and possibly also on pitch information of the frame and/or temporal information from one or more past frames.
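The whitening (analysis) filtering step above, in which the speech signal is inverse-filtered through the LPC model to obtain the residual, can be sketched as follows. The first-order coefficient and test signal are illustrative assumptions; a real coder would use a tenth-order or higher model.

```python
# Simplified sketch of a whitening filter such as A30: given LPC
# coefficients a_k, the residual is r[n] = s[n] - sum_k a_k * s[n-1-k].
def lpc_residual(speech, lpc_coeffs):
    order = len(lpc_coeffs)
    residual = []
    for n in range(len(speech)):
        pred = sum(lpc_coeffs[k] * speech[n - 1 - k]
                   for k in range(order) if n - 1 - k >= 0)
        residual.append(speech[n] - pred)
    return residual

# A signal exactly predicted by a first-order model s[n] = 0.9 * s[n-1]:
speech = [1.0]
for _ in range(3):
    speech.append(0.9 * speech[-1])

print(lpc_residual(speech, [0.9]))  # [1.0, 0.0, 0.0, 0.0]
```

For a perfectly predictable signal the residual collapses to the initial impulse, which is why the residual is an efficient basis for the excitation description.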
It may be desirable to use an implementation of packet encoder 122 to encode frames of a wideband speech signal according to a split-band coding scheme. In such a case, spectral envelope description calculator 140 may be configured to calculate the various descriptions of spectral envelopes of the frame over the respective frequency bands serially and/or in parallel, and possibly according to different coding modes and/or rates. Temporal information description calculator 150 may likewise be configured to calculate descriptions of the temporal information of the frame over the various frequency bands serially and/or in parallel, and possibly according to different coding modes and/or rates.
Figure 17A shows a block diagram of an implementation 102 of speech encoder 100 that is configured to encode a wideband speech signal according to a split-band coding scheme. Speech encoder 102 includes a filter bank A50 that is configured to filter the speech signal to produce a subband signal containing the content of the speech signal over the first frequency band (for example, a narrowband signal) and a subband signal containing the content of the speech signal over the second frequency band (for example, a highband signal). Particular examples of such filter banks are described in, for example, U.S. Patent Application Publication No. 2007/088558 (Vos et al.), entitled "SYSTEMS, METHODS, AND APPARATUS FOR SPEECH SIGNAL FILTERING," published April 19, 2007. For example, filter bank A50 may include a lowpass filter configured to filter the speech signal to produce the narrowband signal and a highpass filter configured to filter the speech signal to produce the highband signal. Filter bank A50 may also include downsamplers configured to reduce the sampling rate of the narrowband signal and/or of the highband signal according to a desired respective decimation factor, as described in, for example, U.S. Patent Application Publication No. 2007/088558 (Vos et al.). Speech encoder 102 may also be configured to perform a noise suppression operation on at least the highband signal, such as a highband burst suppression operation as described in U.S. Patent Application Publication No. 2007/088541 (Vos et al.), entitled "SYSTEMS, METHODS, AND APPARATUS FOR HIGHBAND BURST SUPPRESSION," published April 19, 2007.
Speech coder 102 also comprises the embodiment 126 of packet encoder 120, and it is configured to according to the state of rate controlled signal independent subband signal be encoded.Figure 17 B shows the block diagram of the embodiment 128 of packet encoder 126.Packet encoder 128 (for example comprises spectrum envelope counter 140a, the example of counter 142) and temporal information counter 150a (for example, counter 152 or 154 example), described counter 140a and 150a be configured to based on the narrow band signal that produces by bank of filters A50 and according to as calculate description respectively to spectrum envelope and temporal information by the indicated encoding scheme of the state of rate controlled signal.Packet encoder 128 (for example also comprises spectrum envelope counter 140b, the example of counter 142) and temporal information counter 150b (for example, counter 152 or 154 example), described computing machine 140b and 150b be configured to based on the high band signal that produces by bank of filters A50 and according to as produce the description to spectrum envelope and temporal information of calculating gained respectively by the indicated encoding scheme of the state of rate controlled signal.Packet encoder 128 also comprises the embodiment 162 of packet formatter 160, it is configured to produce voice packet, described voice packet comprise calculate gained to as by the state of rate controlled signal indicated arrowband and the spectrum envelope of the one or both in the high band signal and the description of temporal information.
As noted above, a description of the temporal information of the highband portion of a wideband speech signal may be based on a description of the temporal information of the narrowband portion of the signal. Figure 18A shows a block diagram of a corresponding implementation 129 of packet encoder 126. Like packet encoder 128 described above, packet encoder 129 includes spectral envelope description calculators 140a and 140b that are arranged to calculate the respective descriptions of spectral envelopes. Packet encoder 129 also includes an instance 152a of temporal information description calculator 152 (for example, calculator 154) that is arranged to calculate a description of temporal information based on the calculated description of the spectral envelope of the narrowband signal. Packet encoder 129 further includes an implementation 156 of temporal information description calculator 150. Calculator 156 is configured to calculate a description of the temporal information of the highband signal that is based on a description of the temporal information of the narrowband signal.
Figure 18B shows a block diagram of an implementation 158 of temporal description calculator 156. Calculator 158 includes a highband excitation signal generator A60 that is configured to generate a highband excitation signal based on a narrowband excitation signal as produced by calculator 152a. For example, generator A60 may be configured to perform an operation such as spectral extension, harmonic extension, nonlinear extension, spectral folding, and/or spectral translation on the narrowband excitation signal (or on one or more components thereof) to generate the highband excitation signal. Additionally or alternatively, generator A60 may be configured to perform spectral and/or amplitude shaping of random noise (for example, a pseudorandom Gaussian noise signal) to generate the highband excitation signal. For a case in which generator A60 uses a pseudorandom noise signal, it may be desirable to synchronize the generation of this signal at the encoder and the decoder. Such methods and apparatus for highband excitation signal generation are described in more detail in, for example, U.S. Patent Application Publication No. 2007/0088542 (Vos et al.), entitled "SYSTEMS, METHODS, AND APPARATUS FOR WIDEBAND SPEECH CODING," published April 19, 2007. In the example of Figure 18B, generator A60 is arranged to receive the narrowband excitation signal in quantized form. In another example, generator A60 is arranged to receive the narrowband excitation signal in another form (for example, in a pre-quantization or dequantized form).
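Of the extension operations listed above, spectral folding has a particularly compact form. The sketch below models it, under the common textbook convention, as modulating the signal by (-1)**n, which shifts and mirrors the baseband spectrum toward the upper half of the band; this specific modeling choice is an assumption for illustration, not taken from the text.

```python
# Rough sketch of "spectral folding" of a narrowband excitation signal:
# multiplying sample n by (-1)**n mirrors the low-frequency content of a
# real signal into the high-frequency region. Purely illustrative.
def spectral_fold(nb_excitation):
    return [x if n % 2 == 0 else -x for n, x in enumerate(nb_excitation)]

print(spectral_fold([1.0, 2.0, 3.0, 4.0]))  # [1.0, -2.0, 3.0, -4.0]
```

In practice the folded signal would still be spectrally shaped (for example, by the highband synthesis filter) before use, since folding alone preserves the mirrored spectral tilt.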
Calculator 158 also includes a synthesis filter A70 that is configured to generate a synthesized highband signal based on the highband excitation signal and on a description of the spectral envelope of the highband signal (for example, as produced by calculator 140b). Filter A70 is typically configured according to a set of values within the description of the spectral envelope of the highband signal (for example, one or more LSP or LPC coefficient vectors) to produce the synthesized highband signal in response to the highband excitation signal. In the example of Figure 18B, synthesis filter A70 is arranged to receive a quantized description of the spectral envelope of the highband signal and accordingly may be configured to include a dequantizer and possibly an inverse transform block. In another example, filter A70 is arranged to receive the description of the spectral envelope of the highband signal in another form (for example, in a pre-quantization or dequantized form).
Calculator 158 also includes a highband gain factor calculator A80 that is configured to calculate a description of a temporal envelope of the highband signal based on a temporal envelope of the synthesized highband signal. Calculator A80 may be configured to calculate this description to include one or more distances between a temporal envelope of the highband signal and the temporal envelope of the synthesized highband signal. For example, calculator A80 may be configured to calculate such a distance as a gain frame value (for example, as a ratio between energy measures of corresponding frames of the two signals, or as the square root of such a ratio). Additionally or alternatively, calculator A80 may be configured to calculate a number of such distances as gain shape values (for example, as ratios between energy measures of corresponding subframes of the two signals, or as the square roots of such ratios). In the example of Figure 18B, calculator 158 also includes a quantizer A90 that is configured to quantize the calculated description of the temporal envelope (for example, as one or more codebook indices). Various features and implementations of the elements of calculator 158 are described in, for example, U.S. Patent Application Publication No. 2007/0088542 (Vos et al.), cited above.
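The gain frame and gain shape computations described for calculator A80 can be sketched as energy-ratio measurements. This is a minimal sketch under stated assumptions: the square-root-of-energy-ratio form is one of the options the text mentions, and the frame length and four-subframe split below are illustrative.

```python
# Hedged sketch of a highband gain factor computation such as A80's:
# gain frame = sqrt(energy of original highband frame /
#                   energy of synthesized highband frame),
# gain shapes = the same ratio per subframe.
import math

def frame_energy(signal):
    """Energy as the sum of squared samples."""
    return sum(x * x for x in signal)

def gain_frame(original_hb, synthesized_hb):
    return math.sqrt(frame_energy(original_hb) / frame_energy(synthesized_hb))

def gain_shapes(original_hb, synthesized_hb, subframes=4):
    n = len(original_hb) // subframes
    return [math.sqrt(frame_energy(original_hb[i*n:(i+1)*n]) /
                      frame_energy(synthesized_hb[i*n:(i+1)*n]))
            for i in range(subframes)]

orig = [2.0] * 8   # original highband frame (illustrative)
synth = [1.0] * 8  # synthesized highband frame (illustrative)
print(gain_frame(orig, synth))  # 2.0
```

Applying these gains to the synthesized highband signal at the decoder restores its frame-level and subframe-level energy contour.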
The various elements of an implementation of speech encoder 100 may be embodied in any combination of hardware, software, and/or firmware that is deemed suitable for the intended application. For example, such elements may be fabricated as electronic and/or optical devices residing, for example, on the same chip or among two or more chips in a chipset. One example of such a device is a fixed or programmable array of logic elements, such as transistors or logic gates, and any of these elements may be implemented as one or more such arrays. Any two or more, or even all, of these elements may be implemented within the same array or arrays. Such an array or arrays may be implemented within one or more chips (for example, within a chipset including two or more chips).
One or more elements of the various implementations of speech encoder 100 as described herein may also be implemented in whole or in part as one or more sets of instructions arranged to execute on one or more fixed or programmable arrays of logic elements, such as microprocessors, embedded processors, IP cores, digital signal processors, FPGAs (field-programmable gate arrays), ASSPs (application-specific standard products), and ASICs (application-specific integrated circuits). Any of the various elements of an implementation of speech encoder 100 may also be embodied as one or more computers (e.g., machines including one or more arrays programmed to execute one or more sets or sequences of instructions, also called "processors"), and any two or more, or even all, of these elements may be implemented within the same such computer or computers.
The various elements of an implementation of speech encoder 100 may be included within a device for wireless communications, such as a cellular telephone, or within another device having such communications capability. Such a device may be configured to communicate with circuit-switched and/or packet-switched networks (e.g., using one or more protocols such as VoIP). Such a device may be configured to perform operations on a signal carrying the encoded frames, such as interleaving, puncturing, convolutional coding, error-correction coding, coding of one or more layers of network protocol (e.g., Ethernet, TCP/IP, cdma2000), radio-frequency (RF) modulation, and/or RF transmission.
It is possible for one or more elements of an implementation of speech encoder 100 to be used to perform tasks or execute other sets of instructions that are not directly related to the operation of the apparatus, such as a task relating to another operation of a device or system in which the apparatus is embedded. It is also possible for one or more elements of an implementation of speech encoder 100 to have structure in common (e.g., a processor used to execute portions of code corresponding to different elements at different times, a set of instructions executed to perform tasks corresponding to different elements at different times, or an arrangement of electronic and/or optical devices performing operations for different elements at different times). In one such example, packet encoder 120 and frame formatter 130 are implemented as sets of instructions arranged to execute on the same processor. In another such example, spectral envelope description calculators 140a and 140b are implemented as the same set of instructions executing at different times.
FIG. 19A shows a flowchart of a method M200, according to a general configuration, of processing speech packets from an encoded speech signal. Method M200 is configured to receive information from two speech packets (e.g., from consecutive encoded frames of the encoded speech signal) and to produce descriptions of the spectral envelopes of two corresponding frames of the speech signal. Based on information from the first speech packet (also called the "reference" speech packet), task T210 obtains a description of the spectral envelope of a first frame of the speech signal over the first and second frequency bands. This description may be a single description that extends over both frequency bands, or it may include separate descriptions that each extend over a respective one of the frequency bands. Based on information from the second speech packet, task T220 obtains a description of the spectral envelope of a second frame of the speech signal (also called the "target" frame) over the first frequency band. Based on information from the reference speech packet, task T230 obtains a description of the spectral envelope of the target frame over the second frequency band. Based on information from the second speech packet, task T240 obtains a description of pitch information of the target frame for the first frequency band.
FIG. 20 shows an application of method M200. In this example, the descriptions of the spectral envelopes have LPC orders, and the LPC order of the description of the spectral envelope of the target frame over the second frequency band is less than the LPC order of the description of the spectral envelope of the target frame over the first frequency band. In a particular example, the LPC orders of the descriptions of the spectral envelope of the target frame over the first and second frequency bands are ten and six, respectively. Other examples include cases in which the LPC order of the description of the spectral envelope of the target frame over the second frequency band is at least fifty percent of, at least sixty percent of, not more than seventy-five percent of, not more than eighty percent of, equal to, or greater than the LPC order of the description of the spectral envelope of the target frame over the first frequency band.
FIG. 20 also shows an example in which the LPC order of the description of the spectral envelope of the first frame over the first and second frequency bands is equal to the sum of the LPC orders of the descriptions of the spectral envelope of the target frame over the first and second frequency bands. In another example, the LPC order of the description of the spectral envelope of the first frame over the first and second frequency bands may be greater than or less than the sum of the LPC orders of the descriptions of the spectral envelope of the target frame over the first and second frequency bands.
The reference speech packet may include a quantized description of a spectral envelope over the first and second frequency bands, and the second speech packet may include a quantized description of a spectral envelope over the first frequency band. In one particular example, the quantized description of a spectral envelope over the first and second frequency bands included in the reference speech packet has a length of thirty-six bits, and the quantized description of a spectral envelope over the first frequency band included in the second speech packet has a length of twenty-two bits. In other examples, the length of the quantized description of a spectral envelope over the first frequency band included in the second speech packet is not greater than sixty-five, seventy, seventy-five, or eighty percent of the length of the quantized description of a spectral envelope over the first and second frequency bands included in the reference speech packet.
Each of tasks T210 and T220 may be configured to include one or both of the following two operations: parsing the speech packet to extract a quantized description of a spectral envelope, and dequantizing the quantized description of a spectral envelope to obtain a set of parameters of a coding model for the frame. Typical implementations of tasks T210 and T220 include both of these operations, such that each task processes the respective speech packet to produce a description of a spectral envelope in the form of a set of model parameters (e.g., one or more LSF, LSP, ISF, ISP, and/or LPC coefficient vectors). In one particular example, the reference speech packet has a length of 171 bits, and the second speech packet has a length of eighty bits. In other examples, the length of the second speech packet is not more than fifty, sixty, seventy, or seventy-five percent of the length of the reference speech packet.
The reference speech packet may include a quantized description of temporal information for the first and second frequency bands, and the second speech packet may include a quantized description of temporal information for the first frequency band. In one particular example, the quantized description of temporal information for the first and second frequency bands included in the reference speech packet has a length of 133 bits, and the quantized description of temporal information for the first frequency band included in the second speech packet has a length of fifty-eight bits. In other examples, the length of the quantized description of temporal information for the first frequency band included in the second speech packet is not greater than forty-five, fifty, or sixty percent of, or not less than forty percent of, the length of the quantized description of temporal information for the first and second frequency bands included in the reference speech packet.
Tasks T210 and T220 may also be implemented to produce descriptions of temporal information from the respective speech packets. For example, one or both of these tasks may be configured to obtain, based on information from the respective speech packet, a description of a temporal envelope, a description of an excitation signal, a description of pitch information, or a description of a prototype. As in obtaining the description of a spectral envelope, such a task may include parsing a quantized description of temporal information from the speech packet and/or dequantizing a quantized description of temporal information. Implementations of method M200 may also be configured such that task T210 and/or task T220 obtains the description of a spectral envelope and/or the description of temporal information based on information from one or more other speech packets as well, such as information from speech packets of one or more previous encoded frames. For example, descriptions of excitation signals, descriptions of pitch information, and descriptions of prototypes are typically based on information from previous frames.
Task T240 is configured to obtain, based on information from the second speech packet, a description of pitch information of the target frame for the first frequency band. The description of pitch information may include a description of one or more of the following: a pitch lag, a pitch gain, a prototype, and an excitation signal. Task T240 may include parsing a quantized description of pitch information from the second speech packet and/or dequantizing a quantized description of pitch information. For example, the second speech packet may include a quantized description of pitch information for the first frequency band whose length is at least five percent and/or at most ten percent of the length of the second speech packet. In one particular example, the second speech packet has a length of eighty bits, and the quantized description of pitch information for the first frequency band (e.g., a pitch lag index) included in the second speech packet has a length of seven bits. Task T240 may also be configured to calculate an excitation signal of the target frame for the first frequency band based on pitch information from the second speech packet. It may also be desirable to configure task T240 to calculate an excitation signal of the target frame for the second frequency band based on the excitation signal of the target frame for the first frequency band (e.g., as described herein with reference to highband excitation generators A60 and 330).
Implementations of method M200 may also be configured such that task T240 obtains the description of pitch information based on information from one or more other speech packets as well (e.g., information from speech packets of one or more previous encoded frames). FIG. 22 shows an application of such an implementation M210 of method M200. Method M210 includes an implementation T242 of task T240 that is configured to obtain the description of pitch information of the target frame for the first frequency band based on information from each of the reference and second speech packets. For example, task T242 may be configured to interpolate a delay contour of the target frame for the first frequency band based on a first pitch lag value that is based on information from the second speech packet and a second pitch lag value that is based on information from the reference speech packet. Task T242 may also be configured to calculate an excitation signal of the target frame for the first frequency band based on pitch information from each of the reference and second speech packets.
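One simple way to interpolate a delay contour from two pitch lag values is a linear ramp across the target frame. The sketch below assumes that reading; the function name, the per-sample granularity, and the endpoint convention are illustrative assumptions, not the patent's implementation.

```python
def interpolate_delay_contour(lag_ref, lag_target, frame_len):
    """Linear ramp from the reference packet's pitch lag toward the
    second packet's pitch lag over the samples of the target frame."""
    return [lag_ref + (lag_target - lag_ref) * (n + 1) / frame_len
            for n in range(frame_len)]
```

Such a contour can then drive the computation of the excitation signal for the first frequency band of the target frame.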
Method M200 is typically performed as part of a larger speech decoding method, and speech decoders and methods of speech decoding that are configured to perform method M200 are expressly contemplated and hereby disclosed. A speech codec may be configured to perform an implementation of method M100 at the encoder and an implementation of method M200 at the decoder. In such case, the "first speech packet" as encoded by task T110 corresponds to the reference speech packet that supplies information to tasks T210 and T230, and the "second speech packet" as encoded by task T120 corresponds to the speech packet that supplies information to tasks T220 and T240. FIG. 21 illustrates this relation between methods M100 and M200, using the example of a pair of consecutive frames encoded using method M100 and decoded using method M200. Method M200 may also be implemented to include operations that parse the reference speech packet and the second speech packet from the respective encoded frames (e.g., as produced by tasks T130 and T140) or otherwise obtain them.
Notwithstanding the particular example of FIG. 21, it is expressly noted that, in general, applications of method M100 and applications of method M200 are not limited to processing pairs of consecutive frames. For example, in one such other application of method M200, the encoded frame supplying the speech packet processed by tasks T210 and T230 may be separated in transmission from the encoded frame supplying the speech packet processed by tasks T220 and T240 by one or more intervening frames that have been lost (i.e., erased frames).
Task T220 is configured to obtain the description of the spectral envelope of the target frame over the first frequency band based at least primarily on information from the second speech packet. For example, task T220 may be configured to obtain this description based entirely on information from the second speech packet. Alternatively, task T220 may be configured to obtain this description based on other information as well, such as information from speech packets of one or more previous encoded frames. In such case, task T220 is configured to weight the information from the second speech packet more heavily than the other information. For example, such an implementation of task T220 may be configured to calculate the description of the spectral envelope of the target frame over the first frequency band as an average of information from the second speech packet and information from a speech packet of a previous encoded frame (e.g., the reference encoded frame), with the information from the second speech packet being weighted more heavily than the information from the other speech packet. Similarly, task T220 may be configured to obtain a description of temporal information of the target frame for the first frequency band based at least primarily on information from the second speech packet.
Based on information from the reference speech packet (also called "reference spectral information" herein), task T230 obtains a description of the spectral envelope of the target frame over the second frequency band. FIG. 19B shows a flowchart of an implementation M220 of method M200 that includes an implementation T232 of task T230. As an implementation of task T230, task T232 obtains the description of the spectral envelope of the target frame over the second frequency band based on the reference spectral information. In this case, the reference spectral information is included within a description of the spectral envelope of the first frame of the speech signal. FIG. 23 shows an example of an application of method M220.
Task T230 is configured to obtain the description of the spectral envelope of the target frame over the second frequency band based at least primarily on the reference spectral information. For example, task T230 may be configured to obtain this description based entirely on the reference spectral information. Alternatively, task T230 may be configured to obtain the description of the spectral envelope of the target frame over the second frequency band based on both (A) a description of a spectral envelope over the second frequency band that is based on the reference spectral information and (B) a description of a spectral envelope over the second frequency band that is based on information from the second speech packet.
In such case, task T230 may be configured to weight the description based on the reference spectral information more heavily than the description based on information from the second speech packet. For example, such an implementation of task T230 may be configured to calculate the description of the spectral envelope of the target frame over the second frequency band as an average of the description based on the reference spectral information and the description based on information from the second speech packet, with the description based on the reference spectral information being weighted more heavily. In another case, the LPC order of the description based on the reference spectral information may be greater than the LPC order of the description based on information from the second speech packet. For example, the LPC order of the description based on information from the second speech packet may be one (e.g., the description may be a spectral tilt value, such as the value of the first reflection coefficient). Similarly, task T230 may be configured to obtain a description of temporal information of the target frame for the second frequency band based at least primarily on reference temporal information (e.g., based entirely on the reference temporal information, or based to a lesser extent on information from the second speech packet as well).
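The weighted-average behavior described above, with the reference-based description weighted more heavily, might be sketched as follows. The function name, the parameter name, and the default weight are illustrative assumptions; the descriptions are taken to be equal-length parameter vectors.

```python
def blend_descriptions(ref_desc, pkt_desc, w_ref=0.75):
    """Weighted average of a reference-based description and a
    description based on the second packet, with the heavier weight
    (w_ref > 0.5) on the reference-based description."""
    assert 0.5 < w_ref < 1.0
    return [w_ref * r + (1.0 - w_ref) * p
            for r, p in zip(ref_desc, pkt_desc)]
```

The same shape of calculation applies to task T220, with the weights reversed so that the second packet's information dominates.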
Task T210 may be implemented to obtain from the reference speech packet a description of a spectral envelope that is a single full-band representation over both the first and second frequency bands. It is more typical, however, to implement task T210 to obtain this description as separate descriptions of the spectral envelope over the first frequency band and over the second frequency band. For example, task T210 may be configured to obtain the separate descriptions from a reference speech packet that has been encoded using a split-band coding scheme as described herein (e.g., coding scheme 1 in the example of FIG. 12).
FIG. 19C shows a flowchart of an implementation M230 of method M220 in which task T210 is implemented as two subtasks T212a and T212b. Based on information from the reference speech packet, task T212a obtains a description of the spectral envelope of the first frame over the first frequency band. Based on information from the reference speech packet, task T212b obtains a description of the spectral envelope of the first frame over the second frequency band. Task T212a and/or T212b may include parsing a quantized description of a spectral envelope from the respective speech packet and/or dequantizing a quantized description of a spectral envelope.
Task T212a and/or T212b may also be implemented to produce a description of temporal information based on information from the respective speech packet. For example, one or both of these tasks may be configured to obtain, based on information from the respective speech packet, a description of a temporal envelope, a description of an excitation signal, and/or a description of pitch information. As in obtaining the description of a spectral envelope, such a task may include parsing a quantized description of temporal information from the speech packet and/or dequantizing a quantized description of temporal information.
Method M230 also includes an implementation T234 of task T232. As an implementation of task T230, task T234 obtains a description of the spectral envelope of the target frame over the second frequency band, where the description is based on the reference spectral information. As in task T232, the reference spectral information is included within a description of the spectral envelope of the first frame of the speech signal. In the particular case of task T234, the reference spectral information is included within (and may be identical to) the description of the spectral envelope of the first frame over the second frequency band. Task T234 may also be configured to obtain a description of temporal information of the target frame for the second frequency band, where that description is based on (and may be identical to) information included within a description of temporal information of the first frame for the second frequency band.
FIG. 24 shows an application of method M230 that receives information from two speech packets and produces descriptions of the spectral envelopes of two corresponding frames of the speech signal. In this example, the descriptions of the spectral envelopes have LPC orders, and the LPC orders of the descriptions of the spectral envelope of the first frame over the first and second frequency bands are equal to the LPC orders of the descriptions of the spectral envelope of the target frame over the respective frequency bands. Other examples include cases in which the LPC order of one or both of the descriptions of the spectral envelope of the first frame over the first and second frequency bands is greater than that of the corresponding description of the spectral envelope of the target frame.
The reference speech packet may include a quantized description of a spectral envelope over the first frequency band and a quantized description of a spectral envelope over the second frequency band. In one particular example, the quantized description of a spectral envelope over the first frequency band included in the reference speech packet has a length of twenty-eight bits, and the quantized description of a spectral envelope over the second frequency band included in the reference speech packet has a length of eight bits. In other examples, the length of the quantized description of a spectral envelope over the second frequency band included in the reference speech packet is not greater than thirty, forty, fifty, or sixty percent of the length of the quantized description of a spectral envelope over the first frequency band included in the reference speech packet.
The reference speech packet may include a quantized description of temporal information for the first frequency band and a quantized description of temporal information for the second frequency band. In one particular example, the quantized description of temporal information for the first frequency band included in the reference speech packet has a length of 125 bits, and the quantized description of temporal information for the second frequency band included in the reference speech packet has a length of eight bits. In other examples, the length of the quantized description of temporal information for the second frequency band included in the reference speech packet is not greater than ten, twenty, twenty-five, or thirty percent of the length of the quantized description of temporal information for the first frequency band included in the reference speech packet.
The second speech packet may include a quantized description of a spectral envelope over the first frequency band and/or a quantized description of temporal information for the first frequency band. In one particular example, the quantized description of a spectral envelope over the first frequency band included in the second encoded frame has a length of twenty-two bits. In other examples, the length of the quantized description of a spectral envelope over the first frequency band included in the second speech packet is not less than forty, fifty, sixty, seventy, or seventy-five percent of the length of the quantized description of a spectral envelope over the first frequency band included in the reference speech packet. In one particular example, the quantized description of temporal information for the first frequency band included in the second speech packet has a length of fifty-eight bits. In other examples, the length of the quantized description of temporal information for the first frequency band included in the second speech packet is at least twenty-five, thirty, forty, or forty-five percent and/or at most fifty, sixty, or seventy percent of the length of the quantized description of a spectral envelope over the first frequency band included in the reference speech packet.
In typical implementations of method M200, the reference spectral information is a description of a spectral envelope over the second frequency band. This description may include a set of model parameters, such as one or more LSP, LSF, ISP, ISF, or LPC coefficient vectors. Generally this description is the description of the spectral envelope of the first frame over the second frequency band as obtained from the reference speech packet by task T210. It is also possible for the reference spectral information to include a description of a spectral envelope (e.g., of the first frame) over the first frequency band and/or over another frequency band.
FIG. 25 shows an application of an implementation M240 of method M200 that includes a task T260. Task T260 is configured to produce, based on information from the encoded frame that includes the second speech packet, a burst of an information signal that is separate from the speech signal. For example, task T260 may be configured to output a particular portion of the encoded frame as a burst of a signaling or secondary traffic signal as described above. Such a burst may have a length in bits that is at least forty, forty-five, or fifty percent of the length of the encoded frame. Alternatively or additionally, such a burst may have a length in bits that is at least ninety percent of the length of the second speech packet, or such a burst may have a length equal to or greater than the length of the second speech packet. In one particular example, the burst has a length of eighty-six bits (in another example, eighty-seven bits), the second speech packet has a length of eighty bits, and the encoded frame has a length of 171 bits. Methods M210, M220, and M230 may also be implemented to include task T260.
Task T230 typically includes an operation of retrieving the reference spectral information from an array of storage elements such as a semiconductor memory (also called a "buffer" herein). For a case in which the reference spectral information includes a description of a spectral envelope over the second frequency band, the act of retrieving the reference spectral information may be sufficient to complete task T230. Alternatively, it may be desirable to configure task T230 to calculate the description of the spectral envelope of the target frame over the second frequency band (also called the "target spectral description" herein) rather than simply to retrieve it. For example, task T230 may be configured to calculate the target spectral description by adding random noise to the reference spectral information, and/or to calculate the target spectral description based on spectral information from at least one additional speech packet (e.g., based on information from more than one reference speech packet). For example, task T230 may be configured to calculate the target spectral description as an average of descriptions of spectral envelopes over the second frequency band from two or more reference speech packets, and this calculation may include adding random noise to the calculated average.
Task T230 may be configured to calculate the target spectral description by extrapolating over time from the reference spectral information or by interpolating over time between descriptions of spectral envelopes over the second frequency band from two or more reference speech packets. Alternatively or additionally, task T230 may be configured to calculate the target spectral description by extrapolating over frequency from a description of the spectral envelope of the target frame over another frequency band (e.g., over the first frequency band) and/or by interpolating over frequency between descriptions of spectral envelopes over other frequency bands.
Typically the reference spectral information and the target spectral description are vectors of spectral parameter values (or "spectral vectors"). In one such example, both the target and reference spectral vectors are LSP vectors. In another example, both the target and reference spectral vectors are LPC coefficient vectors. In a further example, both the target and reference spectral vectors are reflection coefficient vectors. Task T230 may be configured to copy the target spectral description from the reference spectral information, for example according to an expression such as s_ti = s_ri for all i ∈ {1, 2, …, n}, where s_t is the target spectral vector, s_r is the reference spectral vector (whose values are typically in the range of −1 to +1), i is a vector element index, and n is the length of vector s_t. In a variation of this operation, task T230 is configured to apply a weighting factor (or a vector of weighting factors) to the reference spectral vector. In another variation of this operation, task T230 is configured to calculate the target spectral vector by adding random noise to the reference spectral vector, for example according to an expression such as s_ti = s_ri + z_i for all i ∈ {1, 2, …, n}, where z is a vector of random values. In this case, each element of z may be a random variable whose values are distributed (e.g., uniformly) over a desired range.
It may be desirable to ensure that the values of the target spectral description are bounded (e.g., within the range of −1 to +1). In this case, task T230 may be configured to calculate the target spectral description according to an expression such as s_ti = w·s_ri + z_i for all i ∈ {1, 2, …, n}, where w has a value between zero and one (e.g., in the range of 0.3 to 0.9) and the values of each element of z are distributed (e.g., uniformly) over the range from −(1−w) to +(1−w).
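The bounded expression s_ti = w·s_ri + z_i can be sketched as below. The function name, default weight, and the use of a seeded instance of Python's `random.Random` are illustrative choices, not part of the described method.

```python
import random

def target_spectrum(ref, w=0.8, rng=None):
    """Compute s_t[i] = w * s_r[i] + z[i], with z[i] uniform on
    [-(1-w), +(1-w)], so that the result stays within [-1, +1]
    whenever the reference values do."""
    rng = rng or random.Random(0)  # seeded here only for repeatability
    span = 1.0 - w
    return [w * r + rng.uniform(-span, span) for r in ref]
```

With w = 0.8, each output element lies within 0.2 of the scaled reference element, so a reference vector in [−1, +1] always yields a bounded target vector.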
In another example, task T230 is configured to calculate the target spectral description based on a description of a spectral envelope over the second frequency band from each of more than one reference speech packet (e.g., as an average of the descriptions of spectral envelopes over the second frequency band from each of the two most recent reference speech packets). In such case it may be desirable to weight the reference vectors differently from one another (e.g., a vector from a more recent reference speech packet may be weighted more heavily).
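The recency-weighted averaging of the two most recent reference vectors might look like the following sketch; the function name and the particular weight value are illustrative assumptions.

```python
def recency_weighted_mean(recent, older, w_recent=0.6):
    """Average the second-band descriptions from the two most recent
    reference packets, weighting the more recent one more heavily."""
    return [w_recent * a + (1.0 - w_recent) * b
            for a, b in zip(recent, older)]
```

Random noise could then be added to the resulting mean, as noted above for the averaging variant of task T230.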
It may be desirable to implement task T230 as an instance of a more general operation for handling erasures of the highband portions of split-band-encoded speech packets. For example, a speech decoder or method of speech decoding may be configured to perform such an operation upon receiving a speech packet of which at least the highband portion is erased (i.e., is missing or is found to have too many errors to be recovered reliably).
In a typical example, task T230 is configured to calculate the target spectral description based on a weighted version of the reference spectral information. The weight w may be a scalar, as in the expression s_ti = w·s_ri for all i ∈ {1, 2, …, n}. Alternatively, the weight w may be a vector whose elements may have different values, as in the expression s_ti = w_i·s_ri for all i ∈ {1, 2, …, n}.
For a case in which task T230 is an instance of a more general erasure-processing operation, it may be desirable to implement the weight as an attenuation factor α. It may also be desirable to implement the operation such that the value of α decreases with each successive packet in a consecutive series of highband erasures. For example, α may have the value 0.9 for the first packet in the series, 0.7 for the second packet, and 0.5 for subsequent packets. (In such a case, it may be desirable to use the same reference spectral vector for each packet in the series of erasures.) In another such example, task T230 is configured to calculate the target spectral description based on an additive constant v, which may be a scalar, as in the expression s_ti = α·s_ri + v for all i ∈ {1, 2, …, n}, or a vector, as in the expression s_ti = α·s_ri + v_i for all i ∈ {1, 2, …, n}. Such a constant v may be implemented as an initial spectral vector s_0, as in the expression s_ti = α·s_ri + s_0i for all i ∈ {1, 2, …, n}. In such a case, the values of the elements of s_0 may be a function of i (e.g., s_0i = b·i, where b is a constant). In one particular example, s_0i = 0.048·i for all i ∈ {1, 2, …, n}.
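The attenuation schedule and initial spectral vector above can be combined into a small sketch. The schedule (0.9, 0.7, then 0.5) and b = 0.048 come from the text; everything else (names, list-based vectors, 1-based index mapping) is assumed for illustration.

```python
def erased_highband_spectra(s_r, num_erasures, b=0.048):
    """Sketch: for packet k of a consecutive series of highband erasures,
    s_t[i] = alpha_k * s_r[i] + s0[i], where alpha decreases over the
    series and s0_i = b*i (here i is 1-based, as in the text)."""
    n = len(s_r)
    s0 = [b * (i + 1) for i in range(n)]  # s0_i = 0.048*i, i = 1..n
    out = []
    for k in range(num_erasures):
        alpha = 0.9 if k == 0 else (0.7 if k == 1 else 0.5)
        out.append([alpha * s_r[i] + s0[i] for i in range(n)])
    return out
```

Note that the same reference vector s_r is reused for every packet in the series, as the text suggests.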
Task T230 may also be implemented to calculate the target spectral description based on a spectral envelope over another frequency band of one or more frames, in addition to the reference spectral information. For example, such an implementation of task T230 may be configured to calculate the target spectral description by extrapolating in frequency from a spectral envelope over another frequency band (e.g., the first frequency band) of the current frame and/or of one or more previous frames.
Task T230 may be configured to obtain a description of temporal information over the second frequency band of the target frame based on information from the reference speech packet (also called "reference temporal information" herein). The reference temporal information is typically a description of temporal information over the second frequency band. This description may include one or more gain frame values, gain profile values, pitch parameter values, and/or codebook indices. In general, this description is the description of temporal information over the second frequency band for the first frame, as obtained from the reference speech packet by task T210. It is also possible for the reference temporal information to include descriptions of temporal information over the first frequency band and/or over another frequency band (e.g., for the first frame).
Task T230 may be configured to obtain a description of temporal information over the second frequency band of the target frame (also called herein a "target temporal description") by copying the reference temporal information. Alternatively, it may be desirable to configure task T230 to obtain the target temporal description by calculating it based on the reference temporal information. For example, task T230 may be configured to calculate the target temporal description by adding random noise to the reference temporal information. Task T230 may also be configured to calculate the target temporal description based on information from more than one reference speech packet. For example, task T230 may be configured to calculate the target temporal description as an average of descriptions of temporal information over the second frequency band from two or more reference speech packets, and such a calculation may include adding random noise to the calculated average. As noted above, it may be desirable to implement task T230's obtaining of the description of temporal information over the second frequency band of the target frame as an instance of a more general erasure-processing operation for handling the highband portions of split-band-encoded speech packets.
The target temporal description and the reference temporal information may each include a description of a temporal envelope. As noted above, a description of a temporal envelope may include a gain frame value and/or a set of gain shape values. Alternatively or additionally, the target temporal description and the reference temporal information may each include a description of an excitation signal. A description of an excitation signal may include a description of a pitch component (e.g., a pitch lag or delay, a pitch gain, and/or a description of a prototype).
Task T230 is typically configured to set the gain shape of the target temporal description to be flat. For example, task T230 may be configured to set the gain shape values of the target temporal description to be equal to one another. One such implementation of task T230 is configured to set all of the gain shape values to a factor of one (e.g., 0 dB). Another such implementation of task T230 is configured to set all of the gain shape values to a factor of 1/n, where n is the number of gain shape values in the target temporal description.
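The two flat-gain-shape variants just described can be sketched in a few lines; the function name and the boolean flag are illustrative assumptions.

```python
def flat_gain_shape(n, normalized=False):
    """Sketch: set all n gain shape values equal -- either to 1
    (i.e., 0 dB) or to 1/n, the two variants described for task T230."""
    v = (1.0 / n) if normalized else 1.0
    return [v] * n
```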
Task T230 may be configured to calculate a gain frame value g_t of the target temporal description according to an expression such as g_t = z·g_r or g_t = w·g_r + (1−w)·z, where g_r is a gain frame value from the reference temporal information, z is a random value, and w is a weighting factor. Typical ranges for the value of z include 0 to 1 and −1 to +1. Typical ranges for the value of w include 0.5 (or 0.6) to 0.9 (or 1.0).
In a typical example, task T230 is configured to calculate the gain frame value of the target temporal description based on a weighted version of the gain frame value from the reference temporal information, as in the expression g_t = w·g_r. For a case in which task T230 is an instance of a more general erasure-processing operation, it may be desirable to implement the weight as an attenuation factor β. It may also be desirable to implement the operation such that the value of β decreases with each successive packet in a consecutive series of highband erasures. For example, β may have the value 0.9 for the first packet in the series, 0.7 for the second packet, and 0.5 for subsequent packets. (In such a case, it may be desirable to use the same reference gain frame value for each packet in the series of erasures.) In another such example, task T230 is configured to calculate the gain frame value of the target temporal description based on one or more gain shape values h_ri from the reference temporal information, as in the expression g_t = β·g_r · Σ_{i=1..n} h_ri, where n is the number of gain shape values in the reference speech packet.
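The two gain-frame expressions above (g_t = β·g_r, and the variant that folds in the sum of the reference gain shape values) can be sketched as one function. Names and defaults are assumptions for illustration.

```python
def target_gain_frame(g_r, h_r=None, beta=0.9):
    """Sketch: g_t = beta * g_r, optionally scaled by the sum of the
    reference gain shape values h_r, as in g_t = beta*g_r*sum(h_r)."""
    g_t = beta * g_r
    if h_r is not None:
        g_t *= sum(h_r)
    return g_t
```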
Task T230 may be configured to calculate the gain frame value of the target frame based on gain frame values from the two or three most recent reference speech packets. In one such example, task T230 is configured to calculate the gain frame value of the target temporal description as an average, according to an expression such as g_t = (g_r1 + g_r2)/2, where g_r1 is the gain frame value from the most recent reference speech packet and g_r2 is the gain frame value from the next most recent reference speech packet. In a related example, the reference gain frame values are weighted differently from one another (e.g., the more recent value may be weighted more heavily). In a further example, task T230 is configured to apply an attenuation factor β to the calculated average and/or to include a factor based on one or more gain shape values from the reference temporal information.
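The recency-weighted average with optional attenuation can be sketched as below; w1 = 0.5 and β = 1.0 reduce it to the plain average g_t = (g_r1 + g_r2)/2 from the text. The parameter names are assumptions.

```python
def averaged_gain_frame(g_r1, g_r2, w1=0.5, beta=1.0):
    """Sketch: g_t = beta * (w1*g_r1 + (1-w1)*g_r2), where g_r1 is the
    gain frame value from the most recent reference speech packet;
    w1 > 0.5 weights the more recent value more heavily."""
    return beta * (w1 * g_r1 + (1.0 - w1) * g_r2)
```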
Implementations of method M200 (including methods M210, M220, and M230) are typically configured to include an operation that stores the reference spectral information to a buffer. Such an implementation of method M200 may also include an operation that stores the reference temporal information to a buffer. Alternatively, such an implementation of method M200 may include an operation that stores both the reference spectral information and the reference temporal information to a buffer.
An implementation of method M200 may be configured to store information based on a speech packet as reference spectral information if the speech packet includes a description of a spectral envelope over the second frequency band. For the case of a set of coding schemes as shown in FIG. 12, for example, such an implementation of method M200 may be configured to store reference spectral information if the coding index of the speech packet indicates either of coding schemes 1 and 3 (i.e., rather than coding scheme 2 or 4). More generally, such an implementation of method M200 may be configured to store reference spectral information if the coding index of the speech packet indicates a wideband coding scheme rather than a narrowband coding scheme. Such implementations of method M200 may be configured to store reference temporal information according to the same criteria.
It may be desirable to implement method M200 such that stored reference spectral information is available from more than one reference speech packet at the same time. For example, task T230 may be configured to calculate a target spectral description that is based on information from more than one reference speech packet. In such cases, method M200 may be configured to maintain in storage, at any one time, reference spectral information from the most recent reference speech packet, information from the second most recent reference speech packet, and possibly information from one or more earlier reference speech packets. The method may also be configured to maintain the same history, or a different history, for the reference temporal information. For example, method M200 may be configured to retain a description of a spectral envelope from each of the two most recent reference speech packets but a description of temporal information from only the most recent reference speech packet.
In typical applications of an implementation of method M200, an array of logic elements (e.g., logic gates) is configured to perform one, more than one, or even all of the various tasks of the method. One or more (possibly all) of the tasks may also be implemented as code (e.g., one or more sets of instructions) embodied in a computer program product (e.g., one or more data storage media such as disks, flash or other nonvolatile memory cards, or semiconductor memory chips) that is readable and/or executable by a machine (e.g., a computer) including an array of logic elements (e.g., a processor, microprocessor, microcontroller, or other finite state machine). The tasks of an implementation of method M200 may also be performed by more than one such array or machine. In these or other implementations, the tasks may be performed within a device for wireless communications, such as a cellular telephone, or within another device having such communications capability. Such a device may be configured to communicate with circuit-switched and/or packet-switched networks (e.g., using one or more protocols such as VoIP). For example, such a device may include RF circuitry configured to receive encoded frames.
FIG. 26A shows a block diagram of a speech decoder 200, according to a general configuration, that is arranged to process an encoded speech signal. For example, speech decoder 200 may be configured to perform a method of speech decoding that includes an implementation of method M200 as described herein. Speech decoder 200 includes control logic 210 that is configured to generate a control signal having a sequence of values. Speech decoder 200 also includes a packet decoder 220 that is configured to calculate decoded frames of a speech signal based on the values of the control signal and on corresponding speech packets of the encoded speech signal.
A communications device that includes speech decoder 200, such as a cellular telephone, may be configured to receive the encoded speech signal from a wired, wireless, or optical transmission channel. Such a device may be configured to perform preprocessing operations on the encoded speech signal, such as decoding of error-correction and/or redundancy codes. Such a device may also include implementations of both speech encoder 100 and speech decoder 200 (e.g., in a transceiver).
Control logic 210 is configured to generate a control signal including a sequence of values that is based on the coding indices of speech packets of the encoded speech signal. Each value of the sequence corresponds to a speech packet of the encoded speech signal (except in the case of an erased frame, as discussed below) and has one of a plurality of states. In some implementations of speech decoder 200 as described below, the sequence is binary-valued (i.e., a sequence of high and low values). In other implementations of speech decoder 200 as described below, the values of the sequence may have more than two states.
Control logic 210 may be configured to determine the coding index of each speech packet. For example, control logic 210 may be configured to read at least part of the coding index from the speech packet, to determine a bit rate of the speech packet from one or more parameters such as frame energy, and/or to determine an appropriate coding mode from the format of the speech packet. Alternatively, speech decoder 200 may be implemented to include another element that is configured to determine the coding index of each speech packet and to provide it to control logic 210, or speech decoder 200 may be configured to receive the coding index from another module of an apparatus that includes speech decoder 200.
A speech packet that is not received as expected, or is received with too many errors to be recovered, is called a frame erasure. Speech decoder 200 may be configured such that one or more states of the coding index are used to indicate a frame erasure or a partial frame erasure, such as an absence of the portion of the speech packet that carries spectral and temporal information for the second frequency band. For example, speech decoder 200 may be configured such that the coding index of a speech packet that was encoded using coding scheme 2 (as in FIG. 12) indicates an erasure of the highband portion of the frame. In such a case, speech decoder 200 may be configured to perform an implementation of method M200 as an instance of a general method of erasure processing. Speech decoder 200 may also be configured such that the coding index of a speech packet that was encoded using either of coding schemes 2 and 4 (as in FIG. 12) indicates an erasure of the highband portion of the frame.
Packet decoder 220 is configured to calculate decoded frames based on the values of the control signal and on corresponding speech packets of the encoded speech signal. When the value of the control signal has a first state, packet decoder 220 calculates the decoded frame based on a description of a spectral envelope over the first and second frequency bands, where the description is based on information from the corresponding speech packet. When the value of the control signal has a second state, packet decoder 220 retrieves a description of a spectral envelope over the second frequency band and calculates the decoded frame based on the retrieved description and on a description of a spectral envelope over the first frequency band, where the description over the first frequency band is based on information from the corresponding speech packet.
FIG. 26B shows a block diagram of an implementation 202 of speech decoder 200. Speech decoder 202 includes an implementation 222 of packet decoder 220 that includes a first module 230 and a second module 240. Modules 230 and 240 are configured to calculate corresponding subband portions of the decoded frames. Specifically, first module 230 is configured to calculate a decoded portion of the frame over the first frequency band (e.g., a narrowband signal), and second module 240 is configured to calculate, based on a value of the control signal, a decoded portion of the frame over the second frequency band (e.g., a highband signal).
FIG. 26C shows a block diagram of an implementation 204 of speech decoder 200. A parser 250 is configured to parse the bits of a speech packet, to provide the coding index to control logic 210 and to provide at least one description of a spectral envelope to packet decoder 220. In this example, speech decoder 204 is also an implementation of speech decoder 202, such that parser 250 is configured to provide descriptions of spectral envelopes over the respective frequency bands (when available) to modules 230 and 240. Parser 250 may also be configured to provide at least one description of temporal information to packet decoder 220. For example, parser 250 may be implemented to provide descriptions of temporal information for the respective frequency bands (when available) to modules 230 and 240.
Parser 250 may also be configured to parse the bits of the encoded frame that contains the speech packet, to produce a burst of an information signal that is separate from the speech signal (e.g., a burst of signaling or secondary traffic, as discussed above). Alternatively, speech decoder 204, or an apparatus containing speech decoder 204, may be otherwise configured to parse the encoded frame to produce the speech packet (e.g., as an input to parser 250) and the burst.
Packet decoder 222 also includes a filter bank 260 that is configured to combine the decoded portions of the frame over the first and second frequency bands to produce a wideband speech signal. Particular examples of such filter banks are described in, for example, U.S. Patent Application Publication No. 2007/088558 (Vos et al.), entitled "SYSTEMS, METHODS, AND APPARATUS FOR SPEECH SIGNAL FILTERING," published Apr. 19, 2007. For example, filter bank 260 may include a lowpass filter configured to filter the narrowband signal to produce a first passband signal, and a highpass filter configured to filter the highband signal to produce a second passband signal. Filter bank 260 may also include upsamplers configured to increase the sampling rate of the narrowband signal and/or of the highband signal according to a desired corresponding interpolation factor, as described in, for example, U.S. Patent Application Publication No. 2007/088558 (Vos et al.).
FIG. 27A shows a block diagram of an implementation 232 of first module 230, which includes an instance 270a of a spectral envelope description decoder 270 and an instance 280a of a temporal information description decoder 280. Spectral envelope description decoder 270a is configured to decode a description of a spectral envelope over the first frequency band (e.g., as received from parser 250). Temporal information description decoder 280a is configured to decode a description of temporal information for the first frequency band (e.g., as received from parser 250). For example, temporal information description decoder 280a may be configured to decode pitch information for the first frequency band. Temporal information description decoder 280a may also be configured to calculate an excitation signal for the first frequency band based on the decoded description (and possibly based on temporal information from one or more previous frames). An instance 290a of a synthesis filter 290 is configured to generate a decoded portion of the frame over the first frequency band (e.g., a narrowband signal) that is based on the decoded descriptions of the spectral envelope and the temporal information. For example, synthesis filter 290a may be configured according to a set of values within the description of the spectral envelope over the first frequency band (e.g., one or more LSP or LPC coefficient vectors) to produce the decoded portion in response to an excitation signal for the first frequency band.
FIG. 27B shows a block diagram of an implementation 272 of spectral envelope description decoder 270. A dequantizer 310 is configured to dequantize the description, and an inverse transform block 320 is configured to apply an inverse transform to the dequantized description to obtain a set of LPC coefficients. Temporal information description decoder 280 is also typically configured to include a dequantizer.
FIG. 28A shows a block diagram of an implementation 242 of second module 240. Second module 242 includes an instance 270b of spectral envelope description decoder 270, a buffer 300, and a selector 340. Spectral envelope description decoder 270b is configured to decode a description of a spectral envelope over the second frequency band (e.g., as received from parser 250). Buffer 300 is configured to store one or more descriptions of a spectral envelope over the second frequency band as reference spectral information, and selector 340 is configured to select the decoded description of a spectral envelope from either (A) buffer 300 or (B) decoder 270b, according to the state of the corresponding value of the control signal generated by control logic 210.
Second module 242 also includes a highband excitation signal generator 330 and an instance 290b of synthesis filter 290 that is configured to generate a decoded portion of the frame over the second frequency band (e.g., a highband signal) based on the decoded description of a spectral envelope received via selector 340. Highband excitation signal generator 330 is configured to generate an excitation signal for the second frequency band based on an excitation signal for the first frequency band (e.g., as produced by temporal information description decoder 280a). Additionally or alternatively, generator 330 may be configured to perform spectral and/or amplitude shaping of random noise to generate the highband excitation signal. Generator 330 may be implemented as an instance of highband excitation signal generator A60 as described above. Synthesis filter 290b is configured according to a set of values within the description of the spectral envelope over the second frequency band (e.g., one or more LSP or LPC coefficient vectors) to produce the decoded portion of the frame over the second frequency band in response to the highband excitation signal.
In one example of the implementation of speech decoder 202 that includes implementation 242 of second module 240, control logic 210 is configured to output a binary signal to selector 340, such that each value in the sequence has either a state A or a state B. In this case, if the coding index of the current frame indicates that it is invalid, control logic 210 generates a value having state A, which causes selector 340 to select the output of buffer 300 (i.e., selection A). Otherwise, control logic 210 generates a value having state B, which causes selector 340 to select the output of decoder 270b (i.e., selection B).
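The selector behavior, together with the buffer write-enable behavior described next (state B refreshes buffer 300 with the fresh decoder output), can be sketched as a small function. This is an interpretive sketch under the stated assumptions, not the patent's circuit; the function name, the state encoding, and the triple return value are all illustrative.

```python
STATE_A, STATE_B = 0, 1

def select_highband_envelope(index_valid, decoder_out, buffered_ref):
    """Sketch of control logic 210 driving selector 340 / buffer 300:
    state A (highband info invalid/erased) selects the buffered
    reference description; state B selects the freshly decoded
    description and stores it as the new buffer contents.
    Returns (state, selected_description, new_buffer_contents)."""
    if not index_valid:
        return STATE_A, buffered_ref, buffered_ref  # reuse reference
    return STATE_B, decoder_out, decoder_out        # refresh buffer
```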
Speech decoder 202 may be arranged such that control logic 210 controls the operation of buffer 300. For example, buffer 300 may be arranged such that a value of the control signal having state B causes buffer 300 to store the corresponding output of decoder 270b. Such control may be implemented by applying the control signal to a write-enable input of buffer 300, where the input is configured such that state B corresponds to its active state. Alternatively, control logic 210 may be implemented to generate a second control signal, also including a sequence of values based on the coding indices of speech packets of the encoded speech signal, to control the operation of buffer 300.
FIG. 28B shows a block diagram of an implementation 244 of second module 240. Second module 244 includes spectral envelope description decoder 270b and an instance 280b of temporal information description decoder 280 that is configured to decode a description of temporal information for the second frequency band (e.g., as received from parser 250). Second module 244 also includes an implementation 302 of buffer 300 that is additionally configured to store one or more descriptions of temporal information over the second frequency band as reference temporal information.
Second module 244 includes an implementation 342 of selector 340 that is configured to select the decoded description of a spectral envelope and the decoded description of temporal information from either (A) buffer 302 or (B) decoders 270b and 280b, according to the state of the corresponding value of the control signal generated by control logic 210. An instance 290b of synthesis filter 290 is configured to generate a decoded portion of the frame over the second frequency band (e.g., a highband signal) based on the decoded descriptions of a spectral envelope and temporal information received via selector 342. In a typical implementation of speech decoder 202 that includes second module 244, temporal information description decoder 280b is configured to produce a decoded description of temporal information that includes an excitation signal for the second frequency band, and synthesis filter 290b is configured according to a set of values within the description of the spectral envelope over the second frequency band (e.g., one or more LSP or LPC coefficient vectors) to produce the decoded portion of the frame over the second frequency band in response to the excitation signal.
FIG. 28C shows a block diagram of an implementation 246 of second module 242 that includes buffer 302 and selector 342. Second module 246 also includes an instance 280c of temporal information description decoder 280 that is configured to decode a description of a temporal envelope for the second frequency band, and a gain control element 350 (e.g., a multiplier or amplifier) that is configured to apply a description of a temporal envelope, received via selector 342, to the decoded portion of the frame over the second frequency band. For a case in which the decoded description of a temporal envelope includes gain shape values, gain control element 350 may include logic configured to apply the gain shape values to corresponding subframes of the decoded portion.
FIGS. 28A to 28C show implementations of second module 240 in which buffer 300 receives fully decoded descriptions of spectral envelopes (and, in some cases, of temporal information). Similar implementations may be arranged such that buffer 300 receives descriptions that are not fully decoded. For example, it may be desirable to reduce storage requirements by storing the description in a quantized form (e.g., as received from parser 250). In such cases, the signal path from buffer 300 to selector 340 may be configured to include decoding logic, such as a dequantizer and/or an inverse transform block.
Control logic 210 may be implemented to produce a single control signal that controls the operation of both selector 340 and buffer 300. Alternatively, control logic 210 may be implemented to produce (1) a control signal, whose values have at least two possible states, to control the operation of selector 340, and (2) a second control signal, including a sequence of values that is based on the coding indices of encoded frames of the encoded speech signal and whose values have at least two possible states, to control the operation of buffer 300.
It may be desirable to implement speech decoder 200 to support decoding of both narrowband and wideband speech signals. As noted above, it may be desirable for the coder to use a narrowband coding scheme (e.g., coding scheme 2 in the example of FIG. 12) for dimmed frames. In such a case, the coding index of such a speech packet alone may be insufficient to indicate whether the speech packet is to be decoded as narrowband or wideband speech. If the coder is configured to apply dim-and-burst techniques to narrowband encoded frames as well, then even the presence of a burst in the same encoded frame may not help to indicate whether the speech packet is to be decoded as narrowband or wideband speech.
It may therefore be desirable to configure an element of speech decoder 200 (e.g., control logic 210 or an additional control element) to maintain an operation value having at least two states, corresponding respectively to narrowband operation and wideband operation. Such an element may be configured to enable or disable second module 240, or to enable or disable the output of the highband portion of the decoded signal from second module 240, based on the current state of the operation value. The element may be configured to calculate the state of the operation value based on information such as the presence of an information burst in the speech packet, the coding indices of one or more recent speech packets of the encoded speech signal, and/or the coding indices of one or more subsequent speech packets of the encoded speech signal.
For example, such an element may be configured to set the current state of the operation value to indicate wideband operation if the coding scheme used for the most recent speech packet indicates a wideband coding scheme. In another example, such an element may be configured to set the current state of the operation value to indicate wideband operation if the coding index of the current speech packet indicates a coding scheme that is used for wideband dimming. In another example, such an element may be configured to set the current state of the operation value to indicate wideband operation if (A) the coding index of the current speech packet indicates a wideband coding scheme, or (B) the coding index of the current speech packet indicates a coding scheme that may be used for wideband dimming, the current encoded frame includes an information burst, and the coding scheme used for the most recent speech packet (alternatively, for at least one of the two most recent speech packets) indicates a wideband coding scheme. In a further example, such an element may also be configured to set the current state of the operation value to indicate wideband operation if (C) the coding index of the current speech packet indicates a coding scheme that may be used for wideband dimming, the current encoded frame includes an information burst, the coding scheme for the most recent speech packet indicates a frame erasure, and the coding scheme for the second most recent speech packet indicates a wideband coding scheme.
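The decision rules (A), (B), and (C) above can be collected into one predicate. This is a sketch under stated assumptions: the boolean/string parameter encoding of the coding indices is hypothetical, and a real decoder would derive these flags from the packet formats of FIG. 12.

```python
def wideband_operation(curr_is_wb, curr_is_dim_scheme, has_burst,
                       prev_index, prev2_is_wb):
    """Sketch of the operation-value rules: wideband if
    (A) the current coding index indicates a wideband scheme;
    (B) it indicates a scheme usable for wideband dimming, the frame
        carries an information burst, and the most recent packet used
        a wideband scheme; or
    (C) as (B), but the most recent packet was a frame erasure and the
        second most recent packet used a wideband scheme.
    prev_index is one of 'wb', 'erasure', 'nb' (hypothetical labels)."""
    if curr_is_wb:                              # rule (A)
        return True
    if curr_is_dim_scheme and has_burst:
        if prev_index == "wb":                  # rule (B)
            return True
        if prev_index == "erasure" and prev2_is_wb:  # rule (C)
            return True
    return False
```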
The various elements of implementations of speech decoder 200 may be embodied in any combination of hardware, software, and/or firmware that is deemed suitable for the intended application. For example, such elements may be fabricated as electronic and/or optical devices residing, for example, on the same chip or among two or more chips in a chipset. One example of such a device is a fixed or programmable array of logic elements, such as transistors or logic gates, and any of these elements may be implemented as one or more such arrays. Any two or more, or even all, of these elements may be implemented within the same array or arrays. Such an array or arrays may be implemented within one or more chips (for example, within a chipset including two or more chips).
One or more elements of the various implementations of speech decoder 200 as described herein may also be implemented in whole or in part as one or more sets of instructions arranged to execute on one or more fixed or programmable arrays of logic elements, such as microprocessors, embedded processors, IP cores, digital signal processors, FPGAs (field-programmable gate arrays), ASSPs (application-specific standard products), and ASICs (application-specific integrated circuits). Any of the various elements of an implementation of speech decoder 200 may also be embodied as one or more computers (e.g., machines including one or more arrays programmed to execute one or more sets or sequences of instructions, also called "processors"), and any two or more, or even all, of these elements may be implemented within the same such computer or computers.
The various elements of an implementation of speech decoder 200 may be included within a device for wireless communication, such as a cellular telephone, or other device having such communication capability. Such a device may be configured to communicate with circuit-switched and/or packet-switched networks (e.g., using one or more protocols such as VoIP). Such a device may be configured to perform operations on a signal carrying the encoded frames, such as de-interleaving, de-puncturing, decoding of one or more convolutional codes, decoding of one or more error-correction codes, decoding of one or more layers of network protocols (e.g., Ethernet, TCP/IP, cdma2000), radio-frequency (RF) demodulation, and/or RF reception.
It is possible for one or more elements of an implementation of speech decoder 200 to be used to perform tasks or execute other sets of instructions that are not directly related to an operation of the speech decoder, such as a task relating to another operation of a device or system in which the speech decoder is embedded. It is also possible for one or more elements of an implementation of speech decoder 200 to have structure in common (e.g., a processor used to execute portions of code corresponding to different elements at different times, a set of instructions executed to perform tasks corresponding to different elements at different times, or an arrangement of electronic and/or optical devices performing operations for different elements at different times). In one such example, control logic 210, first module 230, and second module 240 are implemented as sets of instructions arranged to execute on the same processor. In another such example, spectral envelope description decoders 270a and 270b are implemented as the same set of instructions executing at different times.
A device for wireless communication, such as a cellular telephone, or other device having such communication capability may be configured to include implementations of both speech encoder 100 and speech decoder 200. In such case, it is possible for speech encoder 100 and speech decoder 200 to have structure in common. In one such example, speech encoder 100 and speech decoder 200 are implemented to include sets of instructions arranged to execute on the same processor.
The foregoing presentation of the described configurations is provided to enable any person skilled in the art to make or use the methods and other structures disclosed herein. The flowcharts, block diagrams, state diagrams, and other structures shown and described herein are examples only, and other variants of these structures are also within the scope of the disclosure. Various modifications to these configurations are possible, and the generic principles presented herein may be applied to other configurations as well. For example, the various elements and tasks described herein for processing a highband portion of a speech signal, which includes frequencies above the range of a narrowband portion of the speech signal, may be applied alternatively or additionally, and in an analogous manner, to processing a lowband portion of the speech signal, which includes frequencies below the range of the narrowband portion. In such case, the disclosed techniques and structures for deriving a highband excitation signal from a narrowband excitation signal may be used to derive a lowband excitation signal from the narrowband excitation signal. Thus, the present disclosure is not intended to be limited to the configurations shown above but rather is to be accorded the widest scope consistent with the principles and novel features disclosed herein in any fashion, including in the appended claims as filed, which form a part of the original disclosure.
Examples of codecs that may be used with, or adapted for use with, the speech encoders, speech encoding methods, speech decoders, and/or speech decoding methods described herein include: the Enhanced Variable Rate Codec (EVRC), as described in the document 3GPP2 C.S0014-C version 1.0, "Enhanced Variable Rate Codec, Speech Service Options 3, 68, and 70 for Wideband Spread Spectrum Digital Systems" (Third Generation Partnership Project 2, Arlington, VA, January 2007); the Adaptive Multi Rate (AMR) speech codec, as described in the document ETSI TS 126 092 V6.0.0 (European Telecommunications Standards Institute (ETSI), Sophia Antipolis Cedex, FR, December 2004); and the AMR Wideband speech codec, as described in the document ETSI TS 126 192 V6.0.0 (ETSI, December 2004).
Those of skill in the art will understand that information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, and symbols that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof. Although the signal from which the voice packets are derived is called a "speech signal," and although these packets are called "voice packets," it is also expressly contemplated and hereby disclosed that this signal may carry music or other non-speech information content during active frames.
Those of skill will further appreciate that the various illustrative logical blocks, modules, circuits, and operations described in connection with the configurations disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. Such logical blocks, modules, circuits, and operations may be implemented or performed with a general purpose processor, a digital signal processor (DSP), an ASIC, an FPGA or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
The tasks of the methods and algorithms described herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An illustrative storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.
Each of the configurations described herein may be implemented at least in part as a hard-wired circuit, as a circuit configuration fabricated into an application-specific integrated circuit, or as a firmware program loaded into non-volatile storage or a software program loaded from or into a data storage medium as machine-readable code, such code being instructions executable by an array of logic elements such as a microprocessor or other digital signal processing unit. The data storage medium may be an array of storage elements such as semiconductor memory (which may include without limitation dynamic or static RAM (random-access memory), ROM (read-only memory), and/or flash RAM) or ferroelectric, magnetoresistive, ovonic, polymer, or phase-change memory; or a disk medium such as a magnetic or optical disk. The term "software" should be understood to include source code, assembly language code, machine code, binary code, firmware, macrocode, microcode, any one or more sets or sequences of instructions executable by an array of logic elements, and any combination of such examples.

Claims (35)

1. A method of processing a speech signal, said method comprising:
based on a first active frame of the speech signal, producing a first voice packet, the first voice packet including a description of a spectral envelope, over (A) a first frequency band and (B) a second frequency band that extends above the first frequency band, of a portion of the speech signal that includes the first active frame; and
based on a second active frame of the speech signal, producing a second voice packet, the second voice packet including a description of a spectral envelope, over the first frequency band, of a portion of the speech signal that includes the second active frame, wherein the second voice packet does not include a description of a spectral envelope over the second frequency band.
2. The method of processing a speech signal according to claim 1, wherein the second active frame occurs immediately after the first active frame in the speech signal.
3. The method of processing a speech signal according to claim 1, wherein said description of a spectral envelope of a portion of the speech signal that includes the first active frame comprises separate first and second descriptions, wherein the first description is a description of a spectral envelope, over the first frequency band, of a portion of the speech signal that includes the first active frame, and wherein the second description is a description of a spectral envelope, over the second frequency band, of a portion of the speech signal that includes the first active frame.
4. The method of processing a speech signal according to claim 1, wherein the first and second frequency bands overlap by at least two hundred hertz.
5. The method of processing a speech signal according to claim 1, wherein said method comprises producing an encoded frame that contains (A) the second voice packet and (B) a burst of an information signal that is separate from the speech signal.
6. The method of processing a speech signal according to claim 1, wherein a length of the burst is less than a length of the second voice packet.
7. The method of processing a speech signal according to claim 1, wherein the length of the burst is equal to the length of the second voice packet.
8. The method of processing a speech signal according to claim 1, wherein the length of the burst is greater than the length of the second voice packet.
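Outside the claim language itself, the packet-generation behavior recited in claims 1-8 can be illustrated with a minimal sketch: the first packet carries a spectral envelope over both bands, while the "dimmed" second packet carries only the first-band envelope, leaving room in its encoded frame for a burst of a separate information signal. All names below are hypothetical and not part of the claims:

```python
# Illustrative sketch only; field names are assumptions, not claim language.

def encode_first_packet(env_band1, env_band2):
    """Full packet: spectral envelope over both frequency bands."""
    return {"envelope_band1": env_band1, "envelope_band2": env_band2}

def encode_second_packet(env_band1):
    """Dimmed packet: no second-band envelope (the decoder may reuse
    second-band information from an earlier packet)."""
    return {"envelope_band1": env_band1}

def make_encoded_frame(packet, burst=None):
    """An encoded frame carries a voice packet and, optionally, a burst
    of an information signal that is separate from the speech signal."""
    frame = {"packet": packet}
    if burst is not None:
        frame["burst"] = burst
    return frame

pkt1 = encode_first_packet([0.1, 0.2], [0.3])
pkt2 = encode_second_packet([0.1, 0.2])
frame2 = make_encoded_frame(pkt2, burst=b"\x01\x02")  # dim-and-burst frame
```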
9. An apparatus for processing a speech signal, said apparatus comprising:
means for producing a first voice packet based on a first active frame of the speech signal, the first voice packet including a description of a spectral envelope, over (A) a first frequency band and (B) a second frequency band that extends above the first frequency band, of a portion of the speech signal that includes the first active frame; and
means for producing a second voice packet based on a second active frame of the speech signal, the second voice packet including a description of a spectral envelope, over the first frequency band, of a portion of the speech signal that includes the second active frame,
wherein the second voice packet does not include a description of a spectral envelope over the second frequency band.
10. A computer program product comprising a computer-readable medium, said medium comprising:
code for causing at least one computer to produce a first voice packet based on a first active frame of a speech signal, the first voice packet including a description of a spectral envelope, over (A) a first frequency band and (B) a second frequency band that extends above the first frequency band, of a portion of the speech signal that includes the first active frame; and
code for causing at least one computer to produce a second voice packet based on a second active frame of the speech signal, the second voice packet including a description of a spectral envelope, over the first frequency band, of a portion of the speech signal that includes the second active frame,
wherein the second voice packet does not include a description of a spectral envelope over the second frequency band.
11. A speech encoder, said speech encoder comprising:
a packet encoder configured (A) to produce a first voice packet based on a first active frame of a speech signal and in response to a first state of a rate control signal, the first voice packet including a description of a spectral envelope over (1) a first frequency band and (2) a second frequency band that extends above the first frequency band, and (B) to produce a second voice packet based on a second active frame of the speech signal and in response to a second state of the rate control signal different than the first state, the second voice packet including a description of a spectral envelope over the first frequency band; and
a frame formatter arranged to receive the first and second voice packets and configured (A) to produce, in response to a first state of a dimming control signal, a first encoded frame that contains the first voice packet and (B) to produce, in response to a second state of the dimming control signal different than the first state, a second encoded frame that contains the second voice packet and a burst of an information signal that is separate from the speech signal,
wherein the first and second encoded frames have equal lengths, the first voice packet occupies at least eighty percent of the first encoded frame, and the second voice packet occupies not more than half of the second encoded frame, and
wherein the second active frame occurs immediately after the first active frame in the speech signal.
12. A method of processing voice packets, said method comprising:
based on information from a first voice packet from an encoded speech signal, obtaining a description of a spectral envelope of a first frame of a speech signal over (A) a first frequency band and (B) a second frequency band different than the first frequency band;
based on information from a second voice packet from the encoded speech signal, obtaining a description of a spectral envelope of a second frame of the speech signal over the first frequency band;
based on information from the first voice packet, obtaining a description of a spectral envelope of the second frame over the second frequency band; and
based on information from the second voice packet, obtaining information relating to a pitch component of the second frame for the first frequency band.
13. the method for processed voice bag according to claim 12, the description of the spectrum envelope of wherein said first frame to voice signal comprise to described first frame in the description of the spectrum envelope on described first frequency band with to the description of the spectrum envelope on described second frequency band of described first frame.
14. the method for processed voice bag according to claim 12, the relevant information of the tonal components at described first frequency band of wherein said and described second frame comprises tone laging value.
15. the method for processed voice bag according to claim 12, wherein said method comprise the pumping signal at described first frequency band based on relevant described second frame of information calculations of the tonal components at described first frequency band of described and described second frame.
16. the method for processed voice bag according to claim 15, wherein said calculating pumping signal based on at the relevant information of second tonal components of described first frequency band, and
The wherein said information relevant with second tonal components is based on the information from described first voice packet.
17. the method for processed voice bag according to claim 15, wherein said method comprise the pumping signal at described second frequency band of calculating described second frame based on the described pumping signal at described first frequency band of described second frame.
18. the method for processed voice bag according to claim 12, wherein said method comprises the burst that obtains the information signal that separates with described voice signal from the encoded frame of described encoded voice signal, and wherein said encoded frame comprises described second voice packet.
19. An apparatus for processing voice packets, said apparatus comprising:
means for obtaining, based on information from a first voice packet from an encoded speech signal, a description of a spectral envelope of a first frame of a speech signal over (A) a first frequency band and (B) a second frequency band different than the first frequency band;
means for obtaining, based on information from a second voice packet from the encoded speech signal, a description of a spectral envelope of a second frame of the speech signal over the first frequency band;
means for obtaining, based on information from the first voice packet, a description of a spectral envelope of the second frame over the second frequency band; and
means for obtaining, based on information from the second voice packet, information relating to a pitch component of the second frame for the first frequency band.
20. The apparatus for processing voice packets according to claim 19, wherein the description of a spectral envelope of the first frame of the speech signal comprises separate first and second descriptions, wherein the first description is a description of a spectral envelope of the first frame over the first frequency band, and wherein the second description is a description of a spectral envelope of the first frame over the second frequency band.
21. The apparatus for processing voice packets according to claim 19, wherein said information relating to a pitch component of the second frame for the first frequency band includes a pitch lag value.
22. The apparatus for processing voice packets according to claim 19, wherein said apparatus comprises means for calculating an excitation signal of the second frame for the first frequency band based on said information relating to a pitch component of the second frame for the first frequency band, and
wherein said apparatus comprises means for calculating an excitation signal of the second frame for the second frequency band based on the excitation signal of the second frame for the first frequency band.
23. The apparatus for processing voice packets according to claim 19, wherein said apparatus comprises means for obtaining, based on information from an encoded frame of the encoded speech signal, a burst of an information signal that is separate from the speech signal, and wherein the encoded frame includes the second voice packet.
24. A computer program product comprising a computer-readable medium, said medium comprising:
code for causing at least one computer to obtain, based on information from a first voice packet from an encoded speech signal, a description of a spectral envelope of a first frame of a speech signal over (A) a first frequency band and (B) a second frequency band different than the first frequency band;
code for causing at least one computer to obtain, based on information from a second voice packet from the encoded speech signal, a description of a spectral envelope of a second frame of the speech signal over the first frequency band;
code for causing at least one computer to obtain, based on information from the first voice packet, a description of a spectral envelope of the second frame over the second frequency band; and
code for causing at least one computer to obtain, based on information from the second voice packet, information relating to a pitch component of the second frame for the first frequency band.
25. The computer program product according to claim 24, wherein the description of a spectral envelope of the first frame of the speech signal comprises separate first and second descriptions, wherein the first description is a description of a spectral envelope of the first frame over the first frequency band, and wherein the second description is a description of a spectral envelope of the first frame over the second frequency band.
26. The computer program product according to claim 24, wherein said information relating to a pitch component of the second frame for the first frequency band includes a pitch lag value.
27. The computer program product according to claim 24, wherein said medium comprises code for causing at least one computer to calculate an excitation signal of the second frame for the first frequency band based on said information relating to a pitch component of the second frame for the first frequency band, and
wherein said medium comprises code for causing at least one computer to calculate an excitation signal of the second frame for the second frequency band based on the excitation signal of the second frame for the first frequency band.
28. The computer program product according to claim 24, wherein said medium comprises code for causing at least one computer to obtain, based on information from an encoded frame of the encoded speech signal, a burst of an information signal that is separate from the speech signal, and wherein the encoded frame includes the second voice packet.
29. A speech decoder configured to calculate a decoded speech signal based on an encoded speech signal, said speech decoder comprising:
control logic configured to generate a control signal comprising a sequence of values that is based on coding indices of voice packets from the encoded speech signal, each value in the sequence corresponding to a frame period of the decoded speech signal; and
a packet decoder configured
(A) to calculate, in response to a value of the control signal having a first state, a corresponding decoded frame based on a description of a spectral envelope of the decoded frame over (1) a first frequency band and (2) a second frequency band that extends above the first frequency band, the description being based on information from a voice packet from the encoded speech signal, and
(B) to calculate, in response to a value of the control signal having a second state different than the first state, a corresponding decoded frame based on (1) a description of a spectral envelope of the decoded frame over the first frequency band, the description being based on information from a voice packet from the encoded speech signal, and (2) a description of a spectral envelope of the decoded frame over the second frequency band, the description being based on information from at least one voice packet that occurs in the encoded speech signal before said voice packet.
30. The speech decoder according to claim 29, wherein said description of a spectral envelope of the decoded frame over (1) the first frequency band and (2) the second frequency band that extends above the first frequency band comprises separate first and second descriptions, wherein the first description is a description of a spectral envelope of the decoded frame over the first frequency band, and wherein the second description is a description of a spectral envelope of the decoded frame over the second frequency band.
31. The speech decoder according to claim 29, wherein information relating to a pitch component of the second frame for the first frequency band includes a pitch lag value.
32. The speech decoder according to claim 29, wherein said packet decoder is configured to calculate, in response to a value of the control signal having the second state, an excitation signal of the second frame for the first frequency band based on information relating to a pitch component of the second frame for the first frequency band, and
wherein said speech decoder comprises means for calculating an excitation signal of the second frame for the second frequency band based on the excitation signal of the second frame for the first frequency band.
33. The speech decoder according to claim 16, wherein said speech decoder comprises means for obtaining, based on information from an encoded frame of the encoded speech signal, a burst of an information signal that is separate from the speech signal, and wherein the encoded frame includes a second voice packet.
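Outside the claim language itself, the control logic of claim 29 can be illustrated with a minimal sketch: each voice packet's coding index maps to one control value per decoded frame period, selecting either full wideband decoding or decoding with second-band reuse. The index-to-state mapping below is an illustrative assumption:

```python
# Illustrative sketch only; the two states and the mapping from coding
# indices to states are assumptions, not claim language.

FULL_WIDEBAND = 1  # decode both bands from the current packet
REUSE_BAND2 = 2    # decode band 1 from the packet; reuse earlier band-2 info

def control_values(coding_indices):
    """Map a sequence of coding indices to control-signal values,
    one value per frame period of the decoded speech signal."""
    values = []
    for idx in coding_indices:
        if idx == "wideband":
            values.append(FULL_WIDEBAND)
        else:
            # e.g., a dimmed (narrowband-format) packet received while
            # the decoder is operating in a wideband context
            values.append(REUSE_BAND2)
    return values
```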
34. A method of processing a speech signal, said method comprising:
based on a first frame of the speech signal, generating a rate selection signal that indicates a wideband coding scheme;
based on information from a mask file, generating a dimming control signal;
based on a state of the dimming control signal that corresponds to the first frame, selecting a narrowband coding scheme over the wideband coding scheme; and
encoding the first frame according to the narrowband coding scheme.
35. The method of processing a speech signal according to claim 34, wherein said encoding the first frame according to the narrowband coding scheme comprises encoding the first frame into a first voice packet, and
wherein said method comprises producing an encoded frame that includes the first voice packet and a burst of an information signal that is separate from the speech signal.
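Outside the claim language itself, the override recited in claims 34-35 can be illustrated with a minimal sketch: a wideband rate selection is overridden by the dimming control signal, the frame is encoded with a narrowband scheme, and the freed capacity of the encoded frame carries the burst. All names are illustrative assumptions:

```python
# Illustrative sketch only; names and data shapes are assumptions.

def select_scheme(rate_selection, dim_active):
    """Override a wideband rate selection when dimming is active."""
    if rate_selection == "wideband" and dim_active:
        return "narrowband"
    return rate_selection

def encode_frame(frame_samples, dim_active, burst=None):
    scheme = select_scheme("wideband", dim_active)  # rate selection says wideband
    packet = {"scheme": scheme, "payload": frame_samples}
    encoded = {"packet": packet}
    if dim_active and burst is not None:
        encoded["burst"] = burst  # burst of a separate information signal
    return encoded
```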
CN2007800280941A 2006-07-31 2007-07-31 Systems, methods, and apparatus for wideband encoding and decoding of active frames Active CN101496099B (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US83468306P 2006-07-31 2006-07-31
US60/834,683 2006-07-31
US11/830,842 US8532984B2 (en) 2006-07-31 2007-07-30 Systems, methods, and apparatus for wideband encoding and decoding of active frames
US11/830,842 2007-07-30
PCT/US2007/074868 WO2008016925A2 (en) 2006-07-31 2007-07-31 Systems, methods, and apparatus for wideband encoding and decoding of active frames

Related Child Applications (2)

Application Number Title Priority Date Filing Date
CN201110243169.6A Division CN102324236B (en) 2006-07-31 2007-07-31 Systems, methods, and apparatus for wideband encoding and decoding of active frames
CN201110243186XA Division CN102385865B (en) 2006-07-31 2007-07-31 Systems, methods, and apparatus for wideband encoding and decoding of active frames

Publications (2)

Publication Number Publication Date
CN101496099A true CN101496099A (en) 2009-07-29
CN101496099B CN101496099B (en) 2012-07-18

Family

ID=40925464

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2007800280941A Active CN101496099B (en) 2006-07-31 2007-07-31 Systems, methods, and apparatus for wideband encoding and decoding of active frames

Country Status (2)

Country Link
CN (1) CN101496099B (en)
TW (1) TWI343560B (en)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3671741A1 (en) 2018-12-21 2020-06-24 FRAUNHOFER-GESELLSCHAFT zur Förderung der angewandten Forschung e.V. Audio processor and method for generating a frequency-enhanced audio signal using pulse processing

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102870156A (en) * 2010-04-12 2013-01-09 Freescale Semiconductor Inc. Audio communication device, method for outputting an audio signal, and communication system
CN102870156B (en) * 2010-04-12 2015-07-22 Freescale Semiconductor Inc. Audio communication device, method for outputting an audio signal, and communication system
CN103229544A (en) * 2010-12-03 2013-07-31 Telefonaktiebolaget LM Ericsson (publ) Source signal adaptive frame aggregation
CN103229544B (en) * 2010-12-03 2016-08-17 Telefonaktiebolaget LM Ericsson (publ) Source signal adaptive frame aggregation
US10614817B2 (en) 2013-07-16 2020-04-07 Huawei Technologies Co., Ltd. Recovering high frequency band signal of a lost frame in media bitstream according to gain gradient
CN106683681A (en) * 2014-06-25 2017-05-17 Huawei Technologies Co., Ltd. Method and device for processing lost frames
CN106448688A (en) * 2014-07-28 2017-02-22 Huawei Technologies Co., Ltd. Audio coding method and related device
US10269366B2 (en) 2014-07-28 2019-04-23 Huawei Technologies Co., Ltd. Audio coding method and related apparatus
US10504534B2 (en) 2014-07-28 2019-12-10 Huawei Technologies Co., Ltd. Audio coding method and related apparatus
US10706866B2 (en) 2014-07-28 2020-07-07 Huawei Technologies Co., Ltd. Audio signal encoding method and mobile phone
CN107408392A (en) * 2015-04-05 2017-11-28 Qualcomm Incorporated Audio bandwidth selection
CN107408392B (en) * 2015-04-05 2021-07-30 Qualcomm Incorporated Decoding method and apparatus

Also Published As

Publication number Publication date
CN101496099B (en) 2012-07-18
TWI343560B (en) 2011-06-11
TW200830278A (en) 2008-07-16

Similar Documents

Publication Publication Date Title
CN102324236B (en) Systems, methods, and apparatus for wideband encoding and decoding of active frames
CN101496100B (en) Systems, methods, and apparatus for wideband encoding and decoding of inactive frames
CN101681627B (en) Signal encoding using pitch-regularizing and non-pitch-regularizing coding
CN101523484B (en) Systems, methods and apparatus for frame erasure recovery
ES2318820T3 (en) Method and apparatus for predictive quantization of voiced speech
CN101496098B (en) Systems and methods for modifying a window with a frame associated with an audio signal
JP4971351B2 (en) System, method and apparatus for detection of tone components
CN101496099B (en) Systems, methods, and apparatus for wideband encoding and decoding of active frames
RU2421828C2 (en) Systems and methods for including identifier into packet associated with speech signal
CN1703737B (en) Method for interoperation between adaptive multi-rate wideband (AMR-WB) and multi-mode variable bit-rate wideband (VMR-WB) codecs
CN1820306B (en) Method and device for gain quantization in variable bit rate wideband speech coding
CN101622666A (en) Non-causal postfilter
CN106133832A (en) Apparatus and method for switching decoding technologies at a device
Gibson Speech coding for wireless communications
KR20080091305A (en) Audio encoding with different coding models

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant