CN100527225C - A transcoding scheme between CELP-based speech codes - Google Patents

A transcoding scheme between CELP-based speech codes

Info

Publication number
CN100527225C
Authority
CN
China
Prior art keywords
celp
parameter
codec
module
source
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB038055198A
Other languages
Chinese (zh)
Other versions
CN1701353A (en)
Inventor
M. A. Jabri
J. Wang
S. Gould
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dilithium (for the benefit of creditors) Ltd.
Dilithium Network Inc.
Dilithium Networks Inc
Original Assignee
Dilithium Networks Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dilithium Networks Inc
Publication of CN1701353A
Application granted
Publication of CN100527225C
Anticipated expiration
Expired - Fee Related

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/173 Transcoding, i.e. converting between two coded representations avoiding cascaded coding-decoding
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/18 Vocoders using multiple modes
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/12 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders

Abstract

Embodiments of a system and method relate to transcoding a CELP-based compressed voice bitstream from a source codec to a destination codec. The method includes processing a source codec input bitstream to unpack CELP parameters from the input CELP bitstream, and interpolating the unpacked CELP parameters if a difference exists between the destination codec parameters and the source codec parameters. The method maps the CELP parameters from the source codec format to the destination codec format; the parameter mapping strategy may be a single preset strategy or may be selected. The method includes encoding the CELP parameters for the destination codec and processing a destination CELP bitstream by packing the CELP parameters for the destination codec.

Description

A transcoding scheme between CELP-based speech codes
Cross-reference to related applications
This application claims priority to the following commonly owned U.S. provisional applications: 60/347,270, filed January 8, 2002; 60/364,403, filed March 12, 2002; 60/421,446, filed October 25, 2002; 60/421,449, filed October 25, 2002; and 60/421,270, filed October 25, 2002, each of which is incorporated herein by reference for all purposes.
Statement as to rights to inventions made under federally sponsored research or development
Not applicable
Reference to a "sequence listing", a table, or a computer program listing appendix submitted on a compact disc
Not applicable
Background of the invention
The present invention relates generally to techniques for processing information. More particularly, the invention provides a method and apparatus for converting CELP frames from one CELP-based standard to another CELP-based standard, and/or between different modes within a single standard. Further details of the invention are provided throughout this specification, and more particularly below.
Coding is the process of converting a raw signal (speech, image, video, etc.) into a format amenable to transmission or storage. Coding usually results in a large amount of compression, but generally involves significant signal processing to achieve it. The result of coding is a bitstream (a sequence of frames) of encoded parameters according to a given compression format. Compression is achieved by removing statistically and perceptually redundant information using various techniques for modeling the signal. The encoded format is therefore referred to as a "compression format" or "parameter space". A decoder takes the compressed bitstream and regenerates the original signal. In the case of speech coding, compression generally results in a loss of information.
The process of converting between different compression formats and/or reducing the bit rate of a previously encoded signal is known as transcoding. This may be done to conserve bandwidth, or to connect incompatible clients and/or server devices. Transcoding differs from direct compression in that the transcoder has access only to the compressed signal and not to the original signal.
Transcoding can be done using brute-force techniques such as "cascade" (tandem) transcoding, in which a full decompression is followed by a full recompression. Because decompressing the signal and then recompressing it usually requires a large amount of processing and may introduce delay, transcoding in the compressed (parameter) space can be considered instead. Such transcoding aims to map between compression formats while remaining in the parameter space wherever possible. This is where sophisticated "smart" transcoding algorithms come into play. Although there has been progress in transcoding, further improvements in transcoding techniques are desirable. Limitations of the conventional techniques are described more fully throughout this specification, and more particularly below.
Brief summary of the invention
According to the present invention, techniques for processing information are provided. More particularly, the invention provides a method and apparatus for converting CELP frames from one CELP-based standard to another CELP-based standard, and/or between different modes within a single standard. Further description of the invention is provided throughout this specification, and more particularly below.
In a specific embodiment, the invention provides an apparatus for converting CELP frames from one CELP-based standard to another CELP-based standard, and/or between different modes within a single standard. The apparatus has a bitstream unpacking module for extracting one or more CELP parameters from a source codec. The apparatus also has an interpolator module coupled to the bitstream unpacking module. The interpolator module is adapted to interpolate between different frame sizes, subframe sizes, and/or sampling rates of the source codec and the destination codec. A mapping module is coupled to the interpolator module. The mapping module is adapted to map the one or more CELP parameters of the source codec to one or more CELP parameters of the destination codec. The apparatus has a destination bitstream packing module coupled to the mapping module. The destination bitstream packing module is adapted to construct at least one destination output CELP frame from at least the one or more CELP parameters of the destination codec. A controller is coupled to at least the destination bitstream packing module, the mapping module, the interpolator module, and the bitstream unpacking module. Preferably, the controller is adapted to manage the operation of one or more of the modules and to receive instructions from one or more external applications. The controller is also adapted to provide status information to the one or more external applications.
In another specific embodiment, the invention provides a method for transcoding a CELP-based compressed voice bitstream from a source codec to a destination codec. The method includes processing a source codec input CELP bitstream to unpack at least one or more CELP parameters from the input CELP bitstream. If a difference exists between one or more of a plurality of destination codec parameters (including the frame size, subframe size, and/or sampling rate of the destination codec format) and one or more of a plurality of source codec parameters (including the frame size, subframe size, and/or sampling rate of the source codec format), the method interpolates the one or more unpacked CELP parameters from the source codec format to the destination codec format. The method includes encoding the one or more CELP parameters for the destination codec, and processing a destination CELP bitstream by at least packing the one or more CELP parameters for the destination codec.
In a further specific embodiment, the invention provides a method for processing a CELP-based compressed voice bitstream from a source codec format to a destination codec format. The method includes transferring a control signal, from among a plurality of control signals, from an application process, and selecting one CELP mapping strategy from a plurality of different CELP mapping strategies based at least on the control signal from the application. The method also includes performing a mapping process using the selected CELP mapping strategy to map one or more CELP parameters from the source codec format to one or more CELP parameters of the destination codec format.
Still further, the invention provides a system for processing a CELP-based compressed voice bitstream from a source codec format to a destination codec format. The system includes one or more memories. The memories may include one or more codes for receiving a control signal, from among a plurality of control signals, from an application process. Also included are one or more codes for selecting one CELP mapping strategy from a plurality of different CELP mapping strategies based at least on the control signal from the application. The one or more memories also include one or more codes for performing a mapping process using the selected CELP mapping strategy to map one or more CELP parameters from the source codec format to one or more CELP parameters of the destination codec format. Depending on the embodiment, there may also be other computer code for carrying out the functions described herein, as well as functions outside this description that may be combined with the invention.
Numerous benefits are achieved using the present invention. Depending on the embodiment, one or more of the following benefits may be achieved:
Reduced computational complexity of the transcoding process.
Reduced delay through the transcoding process.
Reduced amount of memory required for transcoding.
Introduction of dynamic rate control.
Support for silence frames through an embedded voice activity detector.
A framework in which various parameter mapping strategies can be applied.
A generic transcoding infrastructure that accommodates a variety of current and future CELP-based codecs.
The transcoding invention can achieve one or more of these benefits. In a specific embodiment, the transcoding apparatus includes:
a source CELP parameter unpacking module, which extracts CELP parameters from the input encoded CELP bitstream;
a CELP parameter interpolator, which converts the input source CELP parameters into destination CELP parameters corresponding to the subframe-size difference between the source and destination codecs; parameter interpolation is used if the subframe sizes of the source and destination codecs differ;
a destination CELP parameter mapping and tuning engine, which converts the CELP parameters from the interpolator module into destination CELP codec parameters;
a destination CELP code packer, which packs the mapped CELP parameters into destination CELP code frames;
an advanced feature manager, which manages optional features and functions in CELP-to-CELP transcoding;
a controller, which manages the overall transcoding process;
a status reporting function, which provides the status of the transcoding process.
The source CELP parameter unpacking module is a simplified CELP decoder without a formant filter and a post-filter.
The CELP parameter interpolator comprises a set of interpolators, each associated with one or more CELP parameters.
The destination CELP parameter mapping and tuning module includes a parameter mapping strategy switching module and one or more of the following parameter mapping strategies: a module for CELP parameter direct space mapping, a module for analysis in excitation space mapping, and a module for analysis in filtered excitation space mapping.
The invention performs transcoding on a subframe-by-subframe basis. That is, as a frame (of source compressed information) is received by the transcoding system, the transcoder can begin operating on it and producing output subframes. Once a sufficient number of subframes has been produced, a frame (of compressed information in the destination format) can be generated and, if the purpose is communication, sent to the communication channel. If the purpose is storage, the generated frames can be stored as needed. If the frame durations defined by the source and destination format standards are the same, a single input frame produces a single output frame; otherwise additional input frames must be buffered, or multiple output frames produced. If the subframe durations differ, interpolation between subframe parameters is required. The transcoding operation therefore consists of four operations: (1) bitstream unpacking, (2) subframe buffering and interpolation of source CELP parameters, (3) mapping and tuning to destination CELP parameters, and (4) code packing to produce output frames.
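The frame-duration rule in the preceding paragraph can be illustrated with a short sketch. It is not taken from the patent: the function name `frames_per_cycle` is hypothetical, and the 20 ms (GSM-AMR) and 30 ms (G.723.1) frame durations are assumed from the respective standards rather than from this description.

```python
from math import gcd
from typing import Tuple


def frames_per_cycle(src_frame_ms: int, dst_frame_ms: int) -> Tuple[int, int]:
    """Return (source frames consumed, destination frames produced) per alignment cycle.

    A cycle is the shortest time span that holds a whole number of frames
    of both formats, i.e. the least common multiple of the two durations.
    """
    cycle_ms = src_frame_ms * dst_frame_ms // gcd(src_frame_ms, dst_frame_ms)
    return cycle_ms // src_frame_ms, cycle_ms // dst_frame_ms


# Assumed durations: GSM-AMR frames last 20 ms, G.723.1 frames last 30 ms.
print(frames_per_cycle(20, 30))  # (3, 2): three input frames yield two output frames
print(frames_per_cycle(20, 20))  # (1, 1): equal durations, one frame in, one frame out
```

Under these assumptions, a GSM-AMR to G.723.1 transcoder would buffer three input frames for every two output frames, while the reverse direction produces three output frames from two input frames.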
Thus, on receipt of a frame, the transcoder unpacks the bitstream to yield the CELP parameters for each subframe contained in the frame (Fig. 10, block (1)). The parameters of interest are the LPC coefficients, the excitation (generated from the adaptive and fixed codewords), and the pitch lag. Note that for a low-complexity solution that yields good quality, only the excitation needs to be decoded, rather than a full synthesis of the speech waveform. If subframe interpolation is required, it is performed at this point by the smart interpolation engine.
The subframes are now in a form amenable to processing by the destination parameter mapping and tuning module (Fig. 10, block (5)). The short-term LPC filter coefficients are mapped independently of the excitation CELP parameters. A simple linear mapping in the LSP pseudo-frequency space can be used to produce the LSP coefficients for the destination codec. The excitation CELP parameters can be mapped in a number of ways that give correspondingly better output quality at the cost of computational complexity. Three such mapping strategies are described in this document and form part of the mapping and tuning strategy module (Fig. 10, block (4)):
CELP parameter direct space mapping (DSM);
analysis in the excitation space domain;
analysis in the filtered excitation space domain.
The mapping and tuning strategy is selected by the mapping and tuning strategy switching module (Fig. 10, block (3)).
Because the three methods trade quality for reduced computational load, they can be used to provide graceful degradation in quality when the apparatus is overloaded by a large number of simultaneous channels. The performance of the transcoder can thus adapt to the available resources. Alternatively, a transcoding system can be built that uses only one strategy, yielding the desired quality and performance. In that case, the mapping and tuning strategy switching module (Fig. 10, block (3)) would not be incorporated.
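A load-driven strategy switch of the kind described above might look like the following sketch. Everything in it is an assumption made for illustration: the names `MappingStrategy` and `select_strategy`, the use of a normalized load figure between 0 and 1, the two thresholds, and the premise that the three strategies rise in cost in the order in which they are listed.

```python
from enum import Enum


class MappingStrategy(Enum):
    # Assumed ordering: cheapest first, most expensive (best quality) last.
    DIRECT_SPACE_MAPPING = 1
    EXCITATION_ANALYSIS = 2
    FILTERED_EXCITATION_ANALYSIS = 3


def select_strategy(load: float, medium: float = 0.6, high: float = 0.85) -> MappingStrategy:
    """Pick a mapping strategy from a normalized load figure in [0, 1].

    The thresholds are illustrative placeholders, not values from the patent.
    """
    if load >= high:
        return MappingStrategy.DIRECT_SPACE_MAPPING          # overloaded: cheapest mapping
    if load >= medium:
        return MappingStrategy.EXCITATION_ANALYSIS           # busy: intermediate cost
    return MappingStrategy.FILTERED_EXCITATION_ANALYSIS      # idle: highest quality


for load in (0.2, 0.7, 0.95):
    print(load, select_strategy(load).name)
```

A system built around a single strategy would simply omit this selector, matching the alternative described in the paragraph above.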
If applicable to the destination standard, a voice activity detector (operating in the parameter space) can also be employed at this point to reduce the outgoing bandwidth. The mapped parameters can then be packed into frames of the destination bitstream format (Fig. 10, block (7)) and generated for transmission or storage.
The invention covers algorithms and methods for performing smart transcoding between CELP-based speech coding standards. The invention also covers transcoding within a single standard in order to perform rate control (by transcoding to lower modes or by introducing silence frames through an embedded voice activity detector).
The overall transcoding process is managed by a control module (Fig. 10, block (8)), which issues commands according to the transcoding status and external instructions.
To accommodate different transcoding requirements, the apparatus provides the ability to add optional features and functions (Fig. 10, block (6)).
Other features and advantages of the invention will become apparent from the following description taken in conjunction with the accompanying drawings, in which like reference characters identify corresponding elements throughout.
Brief description of the drawings
The objects, features, and advantages of the invention that are believed to be novel are set forth with particularity in the appended claims. The invention, both as to its organization and manner of operation, together with further objects and advantages, may best be understood by reference to the following description taken in conjunction with the accompanying drawings.
Fig. 1 is a simplified block diagram of the decoder stage of a generic CELP coder;
Fig. 2 is a simplified block diagram of the encoder stage of a generic CELP coder;
Fig. 3 is a simplified block diagram showing the mathematical model of a codec;
Fig. 4 is a simplified block diagram showing the mathematical model of a cascade (tandem) transcoder;
Fig. 5 is a simplified block diagram showing the mathematical model of a smart transcoder;
Fig. 6 is an illustration of one conventional apparatus for CELP-based transcoding;
Fig. 7 is an illustration of another conventional apparatus for CELP-based transcoding;
Fig. 8 is a simplified block diagram showing generic transcoding between CELP codecs;
Fig. 9 is a simplified block diagram showing subframe interpolation for GSM-AMR and G.723.1;
Fig. 10 is a simplified block diagram of a system constructed according to an embodiment of the invention for transcoding an input CELP bitstream of a source CELP codec into an output CELP bitstream of a destination codec;
Fig. 11 is a more detailed simplified block diagram of the source codec CELP parameter unpacking module;
Fig. 12 is a simplified block diagram showing interpolation of subframe and sample-by-sample parameters for G.723.1 to GSM-AMR;
Fig. 13 is a simplified block diagram showing excitation correction using the source codec LPC coefficients and the encoded LPC coefficients of the destination codec;
Fig. 14 is a simplified block diagram showing the parameter mapping and tuning module for CELP parameter mapping in more detail;
Fig. 15 is a more detailed simplified block diagram of the destination CELP parameter tuning module;
Fig. 16 is a simplified block diagram showing an embodiment of packing destination CELP codes into GSM-AMR frames;
Fig. 17 depicts an embodiment of a G.723.1 to GSM-AMR transcoder; and
Fig. 18 depicts an embodiment of a GSM-AMR to G.723.1 transcoder.
Detailed description of the invention
According to the present invention, techniques for processing information are provided. More particularly, the invention provides a method and apparatus for converting CELP frames from one CELP-based standard to another CELP-based standard, and/or between different modes within a single standard. Further details of the invention are provided throughout this specification, and more particularly below.
The invention covers algorithms and methods for performing transcoding between CELP (Code Excited Linear Prediction) based coding methods and standards. Of most interest are CELP coding methods standardized by bodies such as the International Telecommunication Union (ITU) or the European Telecommunications Standards Institute (ETSI). The invention also covers transcoding within a single standard in order to perform rate control (by transcoding to lower modes or by introducing silence frames through an embedded voice activity detector).
Speech coding techniques can generally be classified into waveform coders (for example, standards G.711, G.726, and G.722 from the ITU) and analysis-by-synthesis (AbS) coders (for example, the G.723.1 and G.729 standards from the ITU, the GSM-AMR standard from ETSI, and the Enhanced Variable Rate Codec (EVRC) and Selectable Mode Vocoder (SMV) standards from the Telecommunications Industry Association (TIA)). Waveform coders operate in the time domain and are based on a sample-by-sample approach that exploits the correlation between speech samples. Analysis-by-synthesis coders attempt to imitate the human speech production system with a simplified model of a source (the glottis) and a filter (the vocal tract) that shapes the output speech spectrum on a frame basis (typically with frame sizes of 10-30 milliseconds).
Analysis-by-synthesis coders were introduced to provide high-quality speech at low bit rates, at the expense of increased computational requirements. Compression techniques are an effective means of conserving resources on the communication interface.
Mathematically, all speech codecs start with a one-dimensional analog speech signal $x_0(t)$, which is uniformly sampled and quantized to obtain a digital-domain representation, $x(n) = Q(x_0(nT))$. The sampling rate $f = 1/T$ of the speech signal is typically 8 kHz or 16 kHz, and the sampled signal is typically quantized to a maximum of 16 bits.
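As a rough illustration of the relation x(n) = Q(x0(nT)), the sketch below samples a continuous-time function at a fixed rate and applies a uniform quantizer. The function name `sample_and_quantize`, the convention that the analog signal lies in [-1, 1), and the round-and-clip quantizer are assumptions made for this example; the description does not specify Q.

```python
import math
from typing import Callable, List


def sample_and_quantize(x0: Callable[[float], float], fs_hz: float,
                        n_samples: int, bits: int = 16) -> List[int]:
    """Return x(n) = Q(x0(n * T)) for n = 0 .. n_samples - 1, with T = 1 / fs_hz.

    Q is assumed to be a uniform quantizer: scale to the signed integer
    range of the given bit depth, round, and clip.
    """
    period = 1.0 / fs_hz
    full_scale = 2 ** (bits - 1)
    samples = []
    for n in range(n_samples):
        value = round(x0(n * period) * full_scale)
        value = max(-full_scale, min(full_scale - 1, value))  # clip to the signed range
        samples.append(value)
    return samples


def tone(t: float) -> float:
    # Illustrative input: a 1 kHz sine at half of full scale.
    return 0.5 * math.sin(2.0 * math.pi * 1000.0 * t)


# Eight samples at an 8 kHz sampling rate cover exactly one period of the tone.
print(sample_and_quantize(tone, 8000.0, 8))
```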
A CELP-based codec can then be viewed as an algorithm that uses a speech production model to map between the sampled speech $x(n)$ and some parameter space $\theta$; that is, it encodes and decodes digital speech. All CELP-based algorithms operate on frames of speech (which may be further divided into several subframes). In some codecs the speech frames overlap. A speech frame can be defined as a vector of speech samples beginning at some time $n$, that is,
$$\tilde{x}_i = \begin{bmatrix} x(n) & x(n+1) & \cdots & x(n+L-1) \end{bmatrix}^T$$
where $L$ is the length (number of samples) of the speech frame. Note that the frame index $i$ is linearly related to the first frame sample $n$:
$$n = \begin{cases} iL & \text{for non-overlapping frames} \\ i(L-K) & \text{for overlapping frames} \end{cases}$$
where $K$ is the number of overlapping samples between frames.
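The frame indexing just defined can be written out directly. The helper names `frame_start` and `get_frame` are hypothetical, and the sketch covers both cases with one formula, since K = 0 gives the non-overlapping case.

```python
from typing import List, Sequence


def frame_start(i: int, frame_len: int, overlap: int = 0) -> int:
    """First sample index n of frame i: i * L without overlap, i * (L - K) with overlap K."""
    return i * (frame_len - overlap)


def get_frame(x: Sequence[float], i: int, frame_len: int, overlap: int = 0) -> List[float]:
    """Return frame i as the list [x(n), x(n + 1), ..., x(n + L - 1)]."""
    n = frame_start(i, frame_len, overlap)
    return list(x[n:n + frame_len])


signal = list(range(20))                  # stand-in for sampled speech x(n)
print(get_frame(signal, 1, 8))            # [8, 9, 10, 11, 12, 13, 14, 15]
print(get_frame(signal, 1, 8, overlap=2)) # [6, 7, 8, 9, 10, 11, 12, 13]
```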
The compression (lossy encoding) process is then a function that maps a speech frame $\tilde{x}_i$ to parameters $\theta_i$, and the decoding process maps from the parameters $\theta_i$ back to an approximation of the original speech frame. The speech frame produced by the decoder is not equal to the originally encoded speech frame. Codecs are designed to produce output speech that is perceptually as similar as possible to the input speech; that is, the encoder must produce parameters that maximize some perceptual measure between the input speech frame and the frame produced by the decoder when processing those parameters.
In general, the mappings from input to parameters and from parameters to output require knowledge of all previous inputs or parameters. This can be achieved by maintaining a state $S$ in the codec, for example in the construction of the adaptive codebook used by CELP-based methods. The encoder state and decoder state must be kept synchronized. This is achieved by updating the state only from data that both sides (encoder and decoder) have, namely the parameters. Fig. 3 shows a general model of the encoder, channel, and decoder.
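One common way such state is kept is as a buffer of past excitation from which the adaptive codebook vector is read at the decoded pitch lag. The sketch below assumes integer lags and hypothetical names (`adaptive_codebook_vector`, `update_state`); actual codecs typically add fractional lags with interpolation filters, which are omitted here.

```python
from typing import List, Sequence


def adaptive_codebook_vector(past_excitation: Sequence[float], lag: int, length: int) -> List[float]:
    """Build v(n) = u(n - lag) for n = 0 .. length - 1 from the past excitation u.

    When lag is shorter than the subframe, the part of v already built is
    reused, which repeats the most recent pitch cycle.
    """
    if not 1 <= lag <= len(past_excitation):
        raise ValueError("lag must lie within the stored excitation history")
    buffer = list(past_excitation)
    start = len(buffer)
    for n in range(length):
        buffer.append(buffer[start + n - lag])
    return buffer[start:]


def update_state(past_excitation: Sequence[float], new_excitation: Sequence[float],
                 max_history: int) -> List[float]:
    """Append the excitation just decoded and keep only the newest max_history samples."""
    return (list(past_excitation) + list(new_excitation))[-max_history:]


history = [1.0, 2.0, 3.0, 4.0]
print(adaptive_codebook_vector(history, lag=2, length=5))  # [3.0, 4.0, 3.0, 4.0, 3.0]
print(update_state(history, [5.0, 6.0], max_history=4))    # [3.0, 4.0, 5.0, 6.0]
```

Because the update uses only decoded excitation, an encoder and decoder that run the same update on the same parameters end up with identical buffers, which is the synchronization property described above.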
The frame parameters $\theta_i$ used in CELP-based models consist of linear prediction coefficients (LPC) for short-term prediction of the speech signal (physically related to the vocal tract, the oral and nasal cavities, and the lips) and an excitation signal composed of adaptive and fixed codes. The adaptive code is used to model the long-term pitch information in the speech. The codes (adaptive and fixed) have associated codebooks that are predefined for a specific CELP codec. Fig. 1 shows a typical CELP decoder, in which the adaptive and fixed codebook vectors are scaled independently by gain factors, then combined and filtered to produce synthesized speech. This speech is usually passed through a post-filter to remove artifacts introduced by the model.
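The decoder structure described above (gain-scaled codebook vectors summed into an excitation, then passed through the short-term synthesis filter) can be sketched as follows. The function name `celp_synthesize` and the coefficient convention A(z) = 1 + a1 z^-1 + ... + ap z^-p are assumptions for this example, and the post-filter is left out.

```python
from typing import List, Optional, Sequence, Tuple


def celp_synthesize(adaptive_vec: Sequence[float], fixed_vec: Sequence[float],
                    gain_adaptive: float, gain_fixed: float,
                    lpc: Sequence[float],
                    memory: Optional[Sequence[float]] = None
                    ) -> Tuple[List[float], List[float], List[float]]:
    """Decode one subframe: excitation = ga * adaptive + gf * fixed, speech = excitation / A(z).

    lpc holds [a1, ..., ap] for A(z) = 1 + sum_k a_k z^-k (assumed convention).
    memory holds the last p output samples, newest first. No post-filter is applied.
    """
    order = len(lpc)
    mem = list(memory) if memory is not None else [0.0] * order
    excitation = [gain_adaptive * a + gain_fixed * c
                  for a, c in zip(adaptive_vec, fixed_vec)]
    speech = []
    for e in excitation:
        s = e - sum(a * m for a, m in zip(lpc, mem))  # all-pole synthesis filter 1 / A(z)
        speech.append(s)
        mem = ([s] + mem)[:order]                     # shift the filter memory
    return speech, excitation, mem


speech, excitation, mem = celp_synthesize(
    adaptive_vec=[1.0, 0.0, 0.0], fixed_vec=[0.0, 0.0, 0.0],
    gain_adaptive=1.0, gain_fixed=1.0, lpc=[-0.5])
print(speech)  # [1.0, 0.5, 0.25]: impulse response of 1 / (1 - 0.5 z^-1)
```

The returned excitation is what the summary above identifies as sufficient for low-complexity transcoding, since the full speech synthesis step can be skipped when only the excitation is needed.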
The CELP encoding (analysis) process, shown in Fig. 2, includes preprocessing the speech signal to remove unwanted frequency components and applying a window function, after which the short-term LPC parameters are obtained. This is usually done with the Levinson-Durbin algorithm. The LPC parameters are converted to Line Spectral Pairs (LSP) to facilitate quantization and subframe interpolation. The speech is then inverse-filtered by the short-term LPC filter to produce a residual excitation signal. This residual is perceptually weighted to improve quality and analyzed to find an estimate of the speech pitch. A closed-loop analysis-by-synthesis method is used to determine the optimal pitch. Once the pitch is found, the adaptive codebook contribution is subtracted from the residual and the optimal fixed codeword is found. The internal memory of the encoder is then updated to reflect the change in codec state (such as the adaptive codebook).
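For reference, a textbook form of the Levinson-Durbin recursion mentioned above is sketched below. It is a generic version, not code from any particular codec: the function name and the sign convention A(z) = 1 + a1 z^-1 + ... + ap z^-p are assumptions, and practical encoders add windowing, lag windowing, and fixed-point safeguards that are omitted here.

```python
from typing import List, Sequence, Tuple


def levinson_durbin(r: Sequence[float], order: int) -> Tuple[List[float], float]:
    """Solve the LPC normal equations from autocorrelation values r[0..order].

    Returns (a, err), where a[0] = 1 and a[1..order] are the coefficients of
    A(z) = 1 + sum_k a[k] z^-k, and err is the final prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            raise ValueError("autocorrelation sequence is not positive definite")
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                          # reflection coefficient for stage i
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]    # order-update of earlier coefficients
        updated[i] = k
        a = updated
        err *= 1.0 - k * k
    return a, err


# Autocorrelation of a first-order process s(n) = 0.9 s(n-1) + w(n), normalized so r[0] = 1.
coeffs, energy = levinson_durbin([1.0, 0.9, 0.81], order=2)
print(coeffs, energy)
```

For this input the recursion recovers a first coefficient close to -0.9 and a second coefficient close to zero, as expected for a first-order process.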
The simplest method of transcoding is the brute-force approach known as cascade (tandem) transcoding; see Fig. 4. This method performs a full decode of the incoming compressed bits to produce synthesized speech. The synthesized speech is then encoded with the target standard. This method suffers from the large amount of computation needed to re-encode the signal, from quality degradation introduced by pre- and post-filtering of the speech waveform, and from potential delay introduced by the look-ahead requirements of the encoder.
"Smart" transcoding methods similar to that illustrated in Fig. 5 have appeared in the literature. However, these methods still essentially reconstruct the speech signal and then perform significant work to extract the various CELP parameters, such as LPC and pitch. That is, these methods still operate in the speech signal space. In particular, the excitation signal, which was optimally matched to the original speech by the far-end encoder (the encoder that produced the compressed speech according to a compression format), is used only for generating the synthesized speech. The synthesized speech is then used to compute a new optimal excitation. Because of the impulse response filtering operations required in the closed-loop search, this becomes a very computationally intensive operation. Fig. 6 illustrates the method used in US 6,260,009 B1. The reconstructed signal, generated from the input excitation parameters and the output quantized formant filter coefficients, is used as the target signal by the searcher. Because of the difference between the quantized formant filter coefficients in the source and destination codecs, this results in degradation of the searcher's target signal and, ultimately, a significant reduction in the output speech quality of the transcoding. Other limitations can be found throughout this specification, and more particularly below.
Fig. 7 illustrates another "smart" transcoding method, published as US 2002/0077812 A1. This method performs transcoding by directly mapping each CELP parameter, ignoring the interactions between the CELP parameters. It applies only to special cases that place very restrictive conditions on the source and destination CELP codecs. For example, it requires algebraic CELP (ACELP) and the same subframe size in both the source and destination codecs. For most CELP-based transcoding it does not produce good-quality speech. The method is suited to only one of the GSM-AMR modes and does not cover all modes of GSM-AMR.
A method and apparatus of the invention are discussed in detail below. In the following description, numerous specific details are set forth for purposes of explanation, in order to provide a thorough understanding of the invention. The case of GSM-AMR and G.723.1 is used for purposes of illustration and by way of example. The methods described herein are generic and apply to transcoding between any pair of CELP codecs. A person skilled in the relevant art will recognize that other steps, configurations, and arrangements can be used without departing from the spirit and scope of the invention.
The invention covers algorithms and methods for performing smart transcoding between CELP-based speech coding standards. The invention also covers transcoding within a single standard in order to perform rate control (by transcoding to lower modes or by introducing silence frames through an embedded voice activity detector). Details of the invention are discussed in the following sections.
The invention performs transcoding on a subframe-by-subframe basis. That is, when the transcoding system receives a frame, the transcoder can begin operating on its subframes and producing output subframes. Once a sufficient number of subframes has been produced, a frame can be generated. If the frame durations defined by the source and destination standards are the same, one input frame produces one output frame; otherwise, input frames must be buffered or multiple output frames produced. If the subframes have different durations, interpolation between subframe parameters is required. The transcoding operation therefore consists of four operations: (1) bitstream unpacking, (2) subframe buffering and interpolation of the source CELP parameters, (3) mapping and tuning to the destination CELP parameters, and (4) code packing to produce the output frames (see Fig. 8).
Figure 10 is a block diagram illustrating the principle of the CELP-based codec transcoding apparatus according to the invention. The apparatus comprises a source bitstream unpacking module, a smart interpolation engine, a parameter mapping and tuning module, an optional advanced features module, a control module, and a destination bitstream packing module.
The parameter mapping and tuning module comprises a mapping and tuning strategy switching module and the parameter mapping and tuning strategy modules.
The control module manages the transcoding operation.
When a frame is received, the transcoder unpacks the bitstream to produce the CELP parameters of each subframe contained in the frame. The parameters of interest are the LPC coefficients, the excitation (produced from the adaptive and fixed codewords), and the pitch lag.
Note that only the excitation needs to be decoded; the complete speech waveform does not need to be synthesized. This greatly reduces the complexity of unpacking the source codec bitstream. For the CELP parameter direct space mapping (DSM) transcoding strategy, the codebook gains and the fixed codewords are also of interest. If subframe interpolation is needed, it is performed at this point.
The subframes are now in a form suitable for processing by the destination parameter mapping and tuning module shown in Figure 14. The short-term LPC filter coefficients are mapped independently of the excitation CELP parameters. A simple linear mapping in the LSP pseudo-frequency space can be used to produce the LSP coefficients for the destination codec. More complex nonlinear interpolation can also be used. The excitation CELP parameters can be mapped in a number of ways that give correspondingly better output quality at the cost of computational complexity. Three such mapping strategies are described in this document and form part of the parameter mapping and tuning strategy module (Figure 10, block (4)):
CELP parameter direct space mapping (DSM);
Analysis in the excitation space domain;
Analysis in the filtered excitation space domain.
The mapping and tuning strategy is selected by the mapping and tuning strategy switching module (Figure 10, block (3)).
These three methods are discussed in detail in the following sections. Because the three methods trade quality against computational load, they can be used to provide graceful degradation in quality when the apparatus is overloaded by a large number of simultaneous channels. The performance of the transcoder can therefore adapt to the available resources. Alternatively, a transcoding system can be built with only the one strategy that yields the required quality and performance. In that case, the mapping and tuning strategy switching module (Figure 10, block (3)) is not incorporated.
If applicable to the destination standard, a voice activity detector (operating in the parameter space) can also be used at this point to reduce the output bandwidth.
The outputs of the parameter mapping and tuning module are the destination CELP codec codes. They are packed into destination bitstream frames according to the codec's CELP frame format. The packing process is needed to put the output bits into a format that the destination CELP decoder can understand. If the application is storage, the destination CELP parameters can be packed or stored in an application-specific format. If the frames are to be transmitted according to a multimedia protocol, the packing process can also be varied, for example by implementing bit scrambling in the packing process.
In addition, the apparatus of the invention provides the capability of adding optional signal processing functions or modules in the future.
Subframe interpolation
Subframe interpolation may be needed when the subframes of different standards represent different time durations in the signal domain, or when different sampling rates are used. For example, G.723.1 uses frames of 30 ms duration (7.5 ms per subframe), whereas GSM-AMR uses frames of 20 ms duration (5 ms per subframe). This is shown graphically in Fig. 9. Subframe interpolation is performed on two different kinds of parameters: (1) sample-by-sample parameters (such as the excitation and codeword vectors), and (2) subframe parameters (such as the LSP coefficients and pitch lag estimates). Sample-by-sample parameters are mapped by considering their discrete time index and copying them to the correct position in the target subframe. If the CELP standards use different sampling rates, up-sampling or down-sampling may be needed. Subframe parameters are interpolated by some interpolation function to produce smooth estimates of the parameters in the target subframes. A smart interpolation algorithm can improve speech transcoding not only in computational performance but, more importantly, in speech quality. A simple interpolation function is linear interpolation.
As an example, Fig. 9 shows that three GSM-AMR frames are needed to describe the same speech signal duration that two G.723.1 frames describe. Likewise, three GSM-AMR subframes are needed for every two G.723.1 subframes. As described above, there are two classes of parameters: full-subframe parameters (for example, the LSP coefficients) and sample-by-sample parameters (for example, the adaptive and fixed codewords). Subframe parameters, denoted θ, are converted linearly by computing a weighted sum of the overlapping subframes, and sample-by-sample parameters, denoted v[], are formed by copying the appropriate samples. For interpolation from G.723.1 subframes to GSM-AMR subframes, the analytical formulas are as follows:
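$$\theta_i^{\mathrm{gsm}} = \theta_{\lfloor 2i/3 \rfloor}^{\mathrm{g.723.1}}, \qquad i \bmod 3 = 0, 2$$
$$\theta_i^{\mathrm{gsm}} = \tfrac{1}{2}\left(\theta_{\lfloor 2i/3 \rfloor}^{\mathrm{g.723.1}} + \theta_{\lceil 2i/3 \rceil}^{\mathrm{g.723.1}}\right), \qquad i \bmod 3 = 1$$
$$v_i^{\mathrm{gsm}}[n] = v_{\lfloor (40i+n)/60 \rfloor}^{\mathrm{g.723.1}}\big[(40i+n) \bmod 60\big], \qquad \forall\, i, n$$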
where i = 0 is the first subframe of the first GSM-AMR frame, i = 4 is the first subframe of the second GSM-AMR frame, and so on. Figure 12 depicts this process.
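The mapping above can be sketched in Python for the G.723.1 to GSM-AMR direction. This is a minimal illustration, assuming that the bracketed index in the subframe formula is a floor, and that the two terms averaged for i mod 3 = 1 are the floor and the ceiling of 2i/3 (the two G.723.1 subframes that the GSM-AMR subframe overlaps). The function names and the list-based data layout are hypothetical and do not come from either standard.

```python
def map_subframe_params(theta_g7231):
    """Map one value per 7.5 ms G.723.1 subframe onto 5 ms GSM-AMR subframes."""
    n_out = len(theta_g7231) * 3 // 2
    out = []
    for i in range(n_out):
        lo = (2 * i) // 3                     # floor(2i/3)
        if i % 3 == 1:
            hi = lo + 1                       # ceil(2i/3), since 2i/3 is not an integer here
            out.append(0.5 * (theta_g7231[lo] + theta_g7231[hi]))
        else:
            out.append(theta_g7231[lo])
    return out


def map_samples(v_g7231):
    """Copy 60-sample G.723.1 subframes into 40-sample GSM-AMR subframes."""
    n_out = (60 * len(v_g7231)) // 40
    return [
        [v_g7231[(40 * i + n) // 60][(40 * i + n) % 60] for n in range(40)]
        for i in range(n_out)
    ]
```

For one G.723.1 frame (four subframes) this yields six GSM-AMR subframes, that is, one and a half GSM-AMR frames, which matches the buffering behaviour described earlier.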
The LSP parameters (which are full-subframe parameters) should be interpolated in the pseudo-frequency domain, that is, $f = \cos^{-1}(q)$. This results in better output quality. Other subframe parameters do not need to be transformed before interpolation.
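As a small illustration of interpolating in the pseudo-frequency domain, the sketch below converts two LSP vectors with the arccosine, blends them linearly, and converts back. It assumes the LSPs q are held in the cosine domain with values in [-1, 1]; the function name and the weight parameter w are hypothetical.

```python
import math


def interpolate_lsp(q_a, q_b, w=0.5):
    """Blend two LSP vectors in the pseudo-frequency domain f = acos(q)."""
    f_a = [math.acos(q) for q in q_a]
    f_b = [math.acos(q) for q in q_b]
    f = [(1.0 - w) * a + w * b for a, b in zip(f_a, f_b)]
    return [math.cos(x) for x in f]
```

With w = 0.5 this corresponds to the averaging case of the subframe formula; other weights would cover different subframe overlaps.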
Note that the analytical formulas above are derived from simple linear interpolation. Any suitable interpolation scheme (such as spline or sinusoidal interpolation) can be substituted for these formulas. In addition, each CELP parameter (LSP coefficients, lag, pitch gain, codeword gain, and so on) can use a different interpolation scheme to obtain the best perceptual quality.
LSP parameter mapping and excitation vector correction by the LSP coefficients
Although almost all CELP-based audio codecs use the same method to obtain the LPC coefficients, there are some minor differences. These differences arise from different window sizes and shapes, different LPC interpolation for each subframe, different subframe sizes, different LPC quantization schemes, and different lookup tables.
To further improve the quality of the transcoded audio produced by the subframe interpolation method above, the excitation vector used as the target signal in transcoding is corrected using the LPC data from the source and destination codecs.
The following two methods can be used to improve the perceptual quality.
Method 1: linear transformation of the LSP coefficients
The general way of converting between LSP coefficients is through a linear transformation,
$$q' = Aq + b$$
where q′ is the destination LSP vector (in the pseudo-frequency domain), q is the source (original) LSP vector, A is the linear transformation matrix, and b is a bias term. In the simplest case, A reduces to the identity matrix and b reduces to zero. For the embodiment of the G.723.1 to GSM-AMR transcoder, the DC bias term used in the GSM-AMR codec differs from the one used by the G.723.1 codec, and the b term in the formula above is used to compensate for this difference.
Method 2: excitation vector correction by the LSP coefficients
In each subframe, the decoded source excitation vector is synthesized with the source LSP coefficients to transform it into the speech domain, and is then filtered using the quantized LP parameters of the destination codec to form the target signal for transcoding. This correction is optional; when the LSP parameters differ significantly, it can greatly improve perceptual speech quality. Figure 13 depicts the excitation correction method.
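A rough Python sketch of this correction follows. It rests on several assumptions that the text does not spell out: the LSPs have already been converted to direct-form LP coefficients a[1..p] with A(z) = 1 + sum of a_k z^-k, the destination filtering step is the inverse (analysis) filter A'(z), and filter memories start at zero (a real transcoder would carry them across subframes). All names are hypothetical.

```python
def lp_synthesis(excitation, a):
    """Filter through 1/A(z), A(z) = 1 + sum_k a[k-1] z^-k, zero initial state."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * out[n - k]
        out.append(acc)
    return out


def lp_analysis(signal, a):
    """Filter through A(z), zero initial state."""
    out = []
    for n, s in enumerate(signal):
        acc = s
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += ak * signal[n - k]
        out.append(acc)
    return out


def correct_excitation(src_excitation, a_src, a_dst):
    """Source excitation -> speech domain (source LP) -> target excitation (destination LP)."""
    speech = lp_synthesis(src_excitation, a_src)
    return lp_analysis(speech, a_dst)
```

When the source and destination LP coefficients are identical, the two filters cancel and the excitation passes through unchanged, which is consistent with the correction being optional when the LSP parameters are close.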
Parameter mapping and tuning module
This section discusses three strategies for mapping the CELP excitation parameters. They are presented in order of increasing computational complexity and output quality. The core of the invention is the fact that the excitation can be mapped directly, without reconstructing the speech signal. This means that a large amount of computation is saved during the closed-loop codebook search, because the signals do not need to be filtered by the short-term impulse response as conventional techniques require. This mapping works because the incoming bitstream already contains the excitation that the source CELP codec found optimal for producing the speech. The invention uses this fact to perform a fast search in the excitation domain instead of the speech domain.
As mentioned above, having three excitation mapping methods, each with successively better performance, allows the transcoder to adapt to the available computational resources.
CELP parameter direct space mapping
This strategy is the simplest transcoding scheme. The mapping is based on the similarity of physical meaning between the source and destination parameters, and transcoding is performed directly using analytical formulas, without any iteration or search. The advantage of this scheme is that it does not require a large amount of memory and consumes almost zero MIPS, yet it can still produce intelligible speech, albeit at reduced quality. Note that the CELP parameter direct space mapping method of the invention is different from the prior-art apparatus shown in Fig. 7. This method is general and applies to all kinds of CELP-based transcoding, regardless of differences in frame or subframe size.
Analysis in the excitation space domain
This strategy is more advanced than the previous one in that both the adaptive and the fixed codebooks are searched and the gains are estimated in the usual way defined by the destination CELP standard, except that this is done in the excitation domain rather than in the speech domain. The pitch contribution is determined first, by a local search that uses the pitch from the input CELP subframe as the initial estimate. Once found, the pitch contribution is subtracted from the excitation, and the fixed codebook is determined by optimally matching the residual. The advantage of this cascaded approach is that the open-loop pitch estimate does not have to be computed with the autocorrelation method used in the CELP standard; instead, it can be determined from the pitch lag of the decoded CELP subframe. The search is also performed in the excitation domain rather than in the speech domain, so impulse response filtering is not needed during the pitch and codebook searches. This saves a large amount of computation without compromising output quality.
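The cascaded search can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of any particular standard: only integer lags are searched, the candidate is a single-tap copy of the past excitation (repeated periodically when the lag is shorter than the subframe), the selection criterion is normalized correlation, and the default lag range of 18 to 143 is taken from the G.723.1 figures quoted in this document. All function names are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def adaptive_candidate(past, lag, n_sub):
    """Past excitation delayed by `lag`, extended periodically if lag is shorter than n_sub."""
    v = []
    for k in range(n_sub):
        v.append(past[len(past) - lag + k] if k < lag else v[k - lag])
    return v


def local_pitch_search(past, target, lag0, delta=2, lag_min=18, lag_max=143):
    """Search lags near the decoded lag `lag0`, directly in the excitation domain."""
    lo = max(lag_min, lag0 - delta)
    hi = min(lag_max, lag0 + delta, len(past))
    best = None
    for lag in range(lo, hi + 1):
        v = adaptive_candidate(past, lag, len(target))
        energy = dot(v, v)
        corr = dot(target, v)
        if energy == 0.0 or corr <= 0.0:
            continue
        score = corr * corr / energy
        if best is None or score > best[0]:
            best = (score, lag, corr / energy, v)
    if best is None:
        # No usable pitch contribution: the whole target goes to the fixed codebook.
        return lag0, 0.0, list(target)
    _, lag, gain, v = best
    residual = [t - gain * s for t, s in zip(target, v)]
    return lag, gain, residual
```

The returned residual is what the fixed codebook search would then try to match. No impulse response filtering appears anywhere in the loop, which is the source of the computational saving described above.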
Analysis in the filtered excitation space domain
In this case, the LP parameters are still mapped directly from the source codec to the destination codec, and the decoded pitch lag is used as the open-loop pitch estimate for the destination codec. The closed-loop pitch search is still performed in the excitation domain. The fixed codebook search, however, is performed in a filtered excitation space domain. The choice of filter type, and whether the target vectors of one or both searches are transformed into this domain, depend on the desired quality and complexity requirements.
Various filters can be used, including a low-pass filter that smooths irregularities, a filter that compensates for differences between the excitation characteristics of the source and destination codecs, and a filter that enhances perceptually important signal features. The advantage is that, whereas the target signal in standard coding is computed using the weighted LP synthesis filter, the parameters of this filter (order, frequency emphasis and de-emphasis, phase) are all tunable. This strategy therefore allows the transcoding quality between a specific pair of codecs to be tuned and improved, and allows quality to be traded for reduced complexity.
Silence frame transcoding and generation
Some CELP-based standards implement voice activity detection (VAD), which allows discontinuous transmission (DTX) and comfort noise generation (CNG) during periods without speech. Using VAD gives a significant bit-rate advantage. These frames need to be transcoded, and where the source codec does not produce silence frames, silence frames need to be generated for the destination codec. Such a frame generally contains a few parameters that are used to generate suitable comfort noise at the decoder. These parameters can be transcoded using simple algebraic methods.
Example embodiments of the invention
The following sections show embodiments of the invention for the G.723.1 and GSM-AMR speech coding standards. The invention is not limited to these standards; it covers all CELP-based audio coding standards. Persons skilled in the art will recognize how to use these methods to transcode between other CELP-based coding standards. Before the preferred embodiments are described, a brief description of the GSM-AMR and G.723.1 codecs is given.
The GSM-AMR codec
The GSM-AMR codec has eight source codec modes, with bit rates of 12.2, 10.2, 7.95, 7.40, 6.70, 5.90, 5.15 and 4.75 kbit/s.
The codec is based on the code-excited linear prediction (CELP) coding model. A 10th-order linear prediction (LP), or short-term, synthesis filter is used. The long-term, or pitch, synthesis filter is implemented using the so-called adaptive codebook approach.
In the CELP speech synthesis model, the excitation signal at the input of the short-term LP synthesis filter is constructed by adding two excitation vectors, one from the adaptive codebook and one from the fixed (innovative) codebook. Speech is synthesized by feeding the two properly chosen vectors from these codebooks through the short-term synthesis filter. The optimum excitation sequence in a codebook is chosen using an analysis-by-synthesis search procedure in which the error between the original and the synthesized speech is minimized according to a perceptually weighted distortion measure. The perceptual weighting filter used in the analysis-by-synthesis search technique uses the unquantized LP parameters.
The codec operates on speech frames of 20 ms (corresponding to 160 samples at the sampling frequency of 8000 samples per second). For every 160 speech samples, the speech signal is analyzed to extract the parameters of the CELP model (LP filter coefficients, adaptive and fixed codebook indices, and gains). These parameters are encoded and transmitted. At the decoder, the parameters are decoded and speech is synthesized by filtering the reconstructed excitation signal through the LP synthesis filter.
In the 12.2 kbit/s mode, LP analysis is performed twice per frame; in the other modes, it is performed once. In the 12.2 kbit/s mode, the two sets of LP parameters are converted to line spectral pairs (LSP) and jointly quantized using split matrix quantization (SMQ) with 38 bits. In the other modes, the single set of LP parameters is converted to line spectral pairs (LSP) and quantized using split vector quantization (SVQ).
The speech frame is divided into four subframes of 5 ms each (40 samples). The adaptive and fixed codebook parameters are transmitted every subframe. The quantized and unquantized LP parameters, or their interpolated versions, are used depending on the subframe. An open-loop pitch lag is estimated in every other subframe (except in the 5.15 and 4.75 kbit/s modes, where it is done once per frame), based on the perceptually weighted speech signal.
The following operations are then repeated for each subframe:
The target signal is computed by filtering the LP residual through the weighted synthesis filter, with the initial states of the filters having been updated by filtering the error between the LP residual and the excitation (this is equivalent to the common approach of subtracting the zero-input response of the weighted synthesis filter from the weighted speech signal).
The impulse response of the weighted synthesis filter is computed.
Closed-loop pitch analysis is then performed (to find the pitch lag and gain), using the target and the impulse response, by searching around the open-loop pitch lag. Fractional pitch with a resolution of 1/6 or 1/3 of a sample (depending on the mode) is used.
The target signal is updated by removing the adaptive codebook contribution (the filtered adaptive code vector), and this new target is used in the fixed algebraic codebook search (to find the optimum innovation codeword).
The gains of the adaptive and fixed codebooks are either scalar quantized with 4 and 5 bits respectively, or vector quantized with 6 to 7 bits (with moving average (MA) prediction applied to the fixed codebook gain).
Finally, the filter memories are updated (using the determined excitation signal) for finding the target signal in the next subframe.
The bit allocation of the codec modes produces 95, 103, 118, 134, 148, 159, 204 or 244 bits in each 20 ms speech frame, corresponding to bit rates of 4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2 and 12.2 kbit/s.
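The correspondence between frame size and bit rate is simple arithmetic: bits per frame divided by the frame duration in milliseconds gives kbit/s. The short Python check below works through the eight modes; the constant and function names are illustrative only.

```python
AMR_FRAME_MS = 20
AMR_BITS_PER_FRAME = [95, 103, 118, 134, 148, 159, 204, 244]


def rate_kbps(bits_per_frame, frame_ms=AMR_FRAME_MS):
    """Bits per millisecond is numerically equal to kbit/s."""
    return bits_per_frame / frame_ms


def frame_bytes(bits_per_frame):
    """Whole bytes needed to hold one frame (rounded up)."""
    return (bits_per_frame + 7) // 8


rates = [rate_kbps(b) for b in AMR_BITS_PER_FRAME]
```

The same rounding gives the byte counts quoted for the GSM-AMR bitstream in the first embodiment: 12 bytes for the 95-bit frame and 31 bytes for the 244-bit frame.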
G.723.1 codec
The G.723.1 codec has two bit rates associated with it, 5.3 and 6.3 kbps. Both rates are a mandatory part of the encoder and decoder. It is possible to switch between the two rates at any 30 ms frame boundary.
The codec is based on the principles of linear prediction analysis-by-synthesis coding and attempts to minimize a perceptually weighted error signal. The encoder operates on blocks (frames) of 240 samples each. At the 8 kHz sampling rate, this equals 30 ms. Each block is first high-pass filtered to remove the DC component and is then divided into four subframes of 60 samples each. For each subframe, a 10th-order linear prediction coder (LPC) filter is computed using the unprocessed input signal. The LPC filter of the last subframe is quantized using a predictive split vector quantizer (PSVQ). The unquantized LPC coefficients are used to construct the short-term perceptual weighting filter, which is used to filter the entire frame and obtain the perceptually weighted speech signal.
For every two subframes (120 samples), the open-loop pitch period $L_{OL}$ is computed using the weighted speech signal. This pitch estimation is performed on blocks of 120 samples. The pitch period is searched in the range from 18 to 142 samples.
From this point on, the speech is processed on a 60-samples-per-subframe basis.
Using the previously computed pitch period estimate, a harmonic noise shaping filter is constructed. The combination of the LPC synthesis filter, the formant perceptual weighting filter, and the harmonic noise shaping filter is used to create an impulse response. The impulse response is then used for further computations.
Using the pitch period estimate $L_{OL}$ and the impulse response, a closed-loop pitch predictor is computed. A fifth-order pitch predictor is used. The pitch period is computed as a small differential value around the open-loop pitch estimate. The contribution of the pitch predictor is then subtracted from the initial target vector. Both the pitch period and the differential value are transmitted to the decoder.
Finally, the non-periodic component of the excitation is approximated. For the high bit rate, multi-pulse maximum likelihood quantization (MP-MLQ) excitation is used, and for the low bit rate, an algebraic codebook excitation (ACELP) is used.
First embodiment: GSM-AMR to G.723.1
Figure 17 is a block diagram according to the first embodiment of the invention, illustrating a transcoder from GSM-AMR to G.723.1. The GSM-AMR bitstream consists of frames whose length ranges from 244 bits (31 bytes) for the highest-rate mode, 12.2 kbps, down to 95 bits (12 bytes) for the lowest-rate mode, 4.75 kbps. There are eight modes in total, and each of the eight GSM-AMR operating modes produces a different bitstream. Because a G.723.1 frame of 30 ms duration spans one and a half GSM-AMR frames, two GSM-AMR frames are needed to produce a single G.723.1 frame. The next G.723.1 frame can then be produced when the third GSM-AMR frame arrives. Thus, for every three GSM-AMR frames processed, two G.723.1 frames are produced.
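The three-in, two-out frame cadence follows from the frame durations alone. A minimal Python sketch, with a hypothetical function name, counts how many complete G.723.1 frames can have been produced after a given number of GSM-AMR frames has arrived:

```python
def g7231_frames_ready(amr_frames_received, amr_ms=20, g7231_ms=30):
    """Complete 30 ms output frames covered by the 20 ms input frames received so far."""
    return (amr_frames_received * amr_ms) // g7231_ms


cadence = [g7231_frames_ready(k) for k in range(1, 7)]
```

The first input frame yields no output, the second completes the first G.723.1 frame, and the third completes the second, after which the pattern repeats.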
The ten LSP parameters used for the short-term filter in the GSM-AMR speech production model are encoded using the same technique in every operating mode, but in a different bitstream format for each mode. The algorithm for reconstructing the LSP parameters is given in the GSM-AMR standard documentation.
Once the short-term filter parameters of each subframe have been produced, the excitation vector needs to be formed by combining the adaptive codeword and the fixed (algebraic) codeword. The adaptive codeword is constructed using a 60-tap interpolation filter, according to the pitch lag parameter at 1/6 or 1/3 resolution. The fixed codeword is then constructed as defined by the standard, and the excitation is formed as
$$x[n] = \tilde{g}_p\, v[n] + \tilde{g}_c\, c[n]$$
where x is the excitation, v is the interpolated adaptive codeword, c is the fixed code vector, and $\tilde{g}_p$ and $\tilde{g}_c$ are the adaptive and fixed codebook gains respectively. This excitation is then used to update the memory state of the GSM-AMR unpacker, and is mapped by the G.723.1 bitstream packer.
The adaptive codeword of each subframe is found by forming linear combinations of past excitation vectors and finding the best match to the target excitation signal x[] constructed by the GSM-AMR unpacker. The combination is a weighted sum of the past excitation at five successive lags. This is best expressed by the formula:
$$v[n] = \sum_{j=-2}^{2} \beta_j\, u[n - L + j], \qquad 0 \le n \le 59$$
where v[] is the reconstructed adaptive codeword, u[] is the past excitation buffer, L is the (integer) pitch lag between 18 and 143 inclusive (determined from the GSM-AMR unpacking module), and $\beta_j$ are the lag weights, which determine the gain and the lag phase. The table of $\beta_j$ vectors is searched to optimize the match between the adaptive codeword v[] and the excitation vector x[].
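The five-tap weighted sum can be sketched in Python as below. The formula does not say how to treat indices that would fall inside the current subframe when the lag is shorter than the subframe; this sketch assumes the last L samples of the past excitation are repeated periodically, which is an assumption of the example rather than something stated in the text. The function name and the list layout (the most recent past sample is the last list element) are hypothetical.

```python
def adaptive_codeword(past, lag, beta, n_sub=60):
    """v[n] = sum_{j=-2..2} beta[j+2] * u[n - lag + j] for n in 0..n_sub-1."""
    if len(beta) != 5:
        raise ValueError("expected five lag weights")
    if len(past) < lag + 2:
        raise ValueError("past excitation buffer is too short for this lag")

    def u(m):
        # m is an index relative to the start of the current subframe.
        if m >= -lag:
            # Assumed periodic extension of the last `lag` past samples.
            m = -lag + ((m + lag) % lag)
        return past[len(past) + m]

    return [
        sum(beta[j + 2] * u(n - lag + j) for j in range(-2, 3))
        for n in range(n_sub)
    ]
```

In a search, this function would be evaluated for each candidate row of the weight table and the row giving the best match to x[] would be kept.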
Once the adaptive codeword component of the excitation has been found, it is subtracted from the excitation, leaving a residual ready to be encoded by the fixed codebook. The residual signal of each subframe is computed as
$$x_2[n] = x[n] - v[n], \qquad n = 0, \ldots, 59$$
where $x_2[]$ is the target of the fixed codebook search, x[] is the excitation derived from GSM-AMR unpacking, and v[] is the (interpolated and scaled) adaptive codeword.
The fixed codebook is different for the high-rate and low-rate modes of the G.723.1 codec. The high-rate mode uses an MP-MLQ codebook, which allows six pulses per subframe in even subframes and five pulses per subframe in odd subframes, at any position. The low-rate mode uses an algebraic codebook (ACELP), which allows four pulses per subframe at restricted positions. Both codebooks use a grid flag to indicate whether the codeword should be shifted by one position. These codebooks are searched by the methods defined in the standard, except that no impulse response filter is used, because the search is performed in the excitation domain rather than in the speech domain.
When the processing of each subframe is finished, the (persistent) memory of the codec needs to be updated. This is done by first shifting the past excitation buffer u[] by 60 samples (that is, one subframe), so that the oldest samples are discarded, and then copying the excitation of the current subframe into the top 60 samples of the buffer,
$$u[n] = \begin{cases} u[n+60], & -85 \le n \le -1 \\ \tilde{g}_p\, v[n] + \tilde{g}_c\, c[n], & 0 \le n \le 59 \end{cases}$$
where the index n is set relative to the first sample of the current subframe, and the other parameters are as defined previously.
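A minimal Python sketch of this shift-and-append update, using a plain list as the persistent excitation buffer; the function name is hypothetical. With the index ranges given here the buffer holds 85 + 60 = 145 samples, and the same helper covers the GSM-AMR case with 40-sample subframes.

```python
def update_excitation_buffer(buffer, new_excitation):
    """Drop the oldest len(new_excitation) samples and append the new subframe."""
    n = len(new_excitation)
    if n > len(buffer):
        raise ValueError("subframe is longer than the excitation buffer")
    return buffer[n:] + list(new_excitation)
```

The buffer length is unchanged by the update, so the next subframe's adaptive codeword search sees the most recent excitation at the end of the list.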
All of the mapped parameters are encoded into the output G.723.1 bitstream, and the system is then ready to process the next frame.
Second embodiment: G.723.1 to GSM-AMR
Figure 18 is a block diagram according to the second embodiment of the invention, illustrating a transcoder from G.723.1 to GSM-AMR. The G.723.1 bitstream consists of frames of 192 bits (24 bytes) for the high-rate (6.3 kbps) codec, or frames of 160 bits (20 bytes) for the low-rate (5.3 kbps) codec. These frames have a very similar structure and differ only in the representation of the fixed codebook parameters.
The ten LSP parameters used to model the short-term vocal tract filter are encoded in the same way for the high and low rates, and can be extracted from bits 2 to 25 of the G.723.1 frame. Only the LSPs of the fourth subframe are encoded, and interpolation between frames is used to regenerate the LSPs of the other three subframes. The encoding uses three lookup tables, and the LSP vector is reconstructed by combining the three subvectors obtained from these tables. Each table has 256 vector entries; the first two tables hold 3-element subvectors and the last table holds 4-element subvectors. Combined, these give a 10-element LSP vector.
The adaptive codeword of each subframe is constructed by combining past excitation vectors. The combination is a weighted sum of the past excitation at five successive lags. This is best described by the formula
$$v[n] = \sum_{j=-2}^{2} \beta_j\, u[n - L + j], \qquad 0 \le n \le 59$$
where v[] is the reconstructed adaptive codeword, u[] is the past excitation buffer, L is the (integer) pitch lag between 18 and 143 inclusive, and $\beta_j$ are the lag weights determined by the pitch gain parameter.
The lag parameter L is obtained directly from the bitstream. The first and third subframes use the full dynamic range of the lag, whereas the second and fourth subframes encode the lag as an offset from the previous subframe. The lag weighting parameters $\beta_j$ are determined by table lookup. As a result of unpacking the adaptive codeword, approximations of the fractional pitch lag and the associated gain can be determined by calculation; for subframe i the fractional lag is approximated as
$$L_i - \frac{\sum_{j=-2}^{2} j\, \beta_{i,j}^{2}}{\sum_{j=-2}^{2} \beta_{i,j}^{2}}$$
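Reading the expression above as the integer lag minus the energy-weighted centroid of the tap offsets (a fraction with the sum of j times the squared weights in the numerator and the sum of squared weights in the denominator, which is an interpretation of the flattened formula), the approximation can be sketched in Python as follows. The function name is hypothetical.

```python
def fractional_lag(lag, beta):
    """Approximate fractional lag: lag minus the centroid of tap offsets weighted by beta^2."""
    energy = sum(b * b for b in beta)
    if energy == 0.0:
        return float(lag)
    centroid = sum(j * b * b for j, b in zip(range(-2, 3), beta)) / energy
    return lag - centroid
```

Because tap j reads the past excitation at delay L - j, weight concentrated on positive j pulls the effective lag below L, which is why the centroid is subtracted.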
The fixed codebook is different for the high-rate and low-rate modes of the G.723.1 codec. The high-rate mode uses the MP-MLQ codebook, which allows six pulses per subframe in even subframes and five pulses per subframe in odd subframes, at any position. The low-rate mode uses an algebraic codebook (ACELP), which allows four pulses per subframe at restricted positions. Both codebooks use a grid flag to indicate whether the codeword should be shifted by one position. The algorithm for generating the codeword from the encoded bitstream is given in the G.723.1 standard documentation.
When the processing of each subframe is finished, the (persistent) memory of the codec needs to be updated. This is done by first shifting the past excitation buffer u[] by 60 samples (that is, one subframe), so that the oldest samples are discarded, and then copying the excitation of the current subframe into the top 60 samples of the buffer,
$$u[n] = \begin{cases} u[n+60], & -85 \le n \le -1 \\ \tilde{g}_p\, v[n] + \tilde{g}_c\, c[n], & 0 \le n \le 59 \end{cases}$$
where the index n is set relative to the first sample of the current subframe, and the other parameters are as defined previously.
The GSM-AMR parameter mapping part of the transcoder takes the interpolated CELP parameters described above and uses them as the basis for searching the GSM-AMR parameter space. The LSP parameters are simply encoded as received, and the other parameters, namely the excitation and the pitch lag, are used as estimates for a local search in the GSM-AMR space. The description below (figure) shows the main operations that must take place on each subframe to complete the transcoding.
The adaptive codeword is formed by searching the past excitation vectors, up to a maximum lag of 143, for the best match to the target excitation. The target excitation is determined from the interpolated subframes. The past excitation can be interpolated at intervals of 1/6 or 1/3 of a sample, depending on the mode. The optimal lag is found by searching a small region around the pitch lag (determined from the G.723.1 unpacking module). This region is searched to find the optimal integer lag, which is then refined to determine the fractional part of the lag. This process uses a 24-tap interpolation filter to perform the fractional search. The first and third subframes are processed differently from the second and fourth subframes. The interpolated adaptive codeword v[] is then formed as
$$v[n] = \sum_{i=0}^{9} \Big( u[n - L - i]\, b_{60}[t + 6i] + u[n - L + 1 + i]\, b_{60}[6 - t + 6i] \Big)$$
where v[] is the interpolated adaptive codeword, u[] is the past excitation buffer, L is the (integer) pitch lag, t is the fractional pitch lag at 1/6 resolution, and $b_{60}$ is the 60-tap interpolation filter.
The pitch gain is computed and quantized so that it can be encoded and sent to the decoder; it is also used to compute the fixed codebook target vector. All modes compute the pitch gain for each subframe in the same way,
$$g_p = \frac{x^{T} v}{v^{T} v}$$
where $g_p$ is the unquantized pitch gain, x is the target of the adaptive codebook search, and v is the (interpolated) adaptive codeword vector. The 12.2 kbps and 7.95 kbps modes quantize the adaptive and fixed codebook gains independently, whereas the other modes use joint quantization of the fixed and adaptive gains.
Once the adaptive codebook component of the excitation has been found, it is subtracted from the excitation, leaving a residual that is ready to be encoded by the fixed codebook. The residual signal for each subframe is calculated as
$$x_2[n] = x[n] - \tilde{g}_p\, v[n], \quad n = 0, \ldots, 39$$
where x_2[·] is the target of the fixed codebook search, x[·] is the target of the adaptive codebook search, g̃_p is the quantized pitch gain, and v[·] is the (interpolated) adaptive codeword.
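The residual computation is a per-sample subtraction, sketched below in Python. The function name and the length check are assumptions for illustration only.

```python
def fixed_codebook_target(x, v, g_p_q):
    """Residual x2[n] = x[n] - g_p_q * v[n] for one subframe.

    x     : target of the adaptive codebook search
    v     : (interpolated) adaptive codeword
    g_p_q : quantized pitch gain
    """
    if len(x) != len(v):
        raise ValueError("x and v must have the same length")
    return [xi - g_p_q * vi for xi, vi in zip(x, v)]
```

The returned list is the signal that the fixed codebook search then tries to match.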
The fixed codebook search is designed to find the best match to the residual signal that remains after the adaptive codebook component has been removed. This is important for unvoiced speech and for priming the adaptive codebook. Because a great deal of analysis of the original speech has already taken place, the codebook search used in transcoding can be simpler than the one used in the codecs. In addition, the signal on which the codebook search is performed is the reconstructed excitation signal rather than synthesized speech, so it already has a structure that is more amenable to fixed codebook coding.
The fixed codebook gain is quantized using moving-average prediction based on the energy of the previous four subframes. The correction factor between the actual and the predicted gain is quantized (by table lookup) and sent to the decoder. Exact details are given in the GSM-AMR standard documentation.
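The following Python sketch shows one common log-domain form of moving-average gain prediction with a table-quantized correction factor. It is written from general knowledge of this family of codecs and is only an assumption about the shape of the computation. The function names, the MA coefficients, the mean energy and the correction table are all hypothetical placeholders; the real values and the exact procedure are those of the GSM-AMR standard.

```python
import math


def predicted_fixed_gain(past_err_db, ma_coeffs, mean_energy_db, c):
    """Predict the fixed codebook gain from four past prediction errors (dB).

    past_err_db    : prediction errors of the previous four subframes, in dB
    ma_coeffs      : four MA predictor coefficients (placeholder values)
    mean_energy_db : mean innovation energy offset in dB (placeholder value)
    c              : fixed codeword of the current subframe (not all zero)
    """
    pred_db = sum(b * r for b, r in zip(ma_coeffs, past_err_db))
    innov_energy = sum(ci * ci for ci in c) / len(c)
    innov_db = 10.0 * math.log10(innov_energy)
    return 10.0 ** (0.05 * (pred_db + mean_energy_db - innov_db))


def quantize_correction(g_c, g_pred, table):
    """Pick the table entry closest to g_c / g_pred.

    Returns the table index and the resulting quantized gain.
    """
    gamma = g_c / g_pred
    idx = min(range(len(table)), key=lambda k: abs(table[k] - gamma))
    return idx, table[idx] * g_pred
```

In this form only the table index needs to be transmitted, since the decoder can repeat the same prediction from its own history.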
When the processing of each subframe is complete, the (persistent) memory used by the codec must be updated. This is done by first shifting the past excitation buffer u[·] by 40 samples (that is, one subframe), so that the oldest samples are discarded, and then copying the excitation from the current subframe into the top 40 samples of the buffer:
$$u[n] = \begin{cases} u[n+40], & -114 \le n < 0 \\ \tilde{g}_p\, v[n] + \tilde{g}_c\, c[n], & 0 \le n \le 39 \end{cases}$$
where the index n is set relative to the first sample of the current subframe, and the other parameters are as defined previously.
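The buffer update can be sketched in Python as a shift followed by an append. The function name and the flat-list layout (oldest sample first) are assumptions for the example. The symbols c and g_c_q are read here as the fixed codeword and its quantized gain, matching the update equation.

```python
SUBFRAME = 40  # samples per subframe, as used in the text


def update_excitation_buffer(u, v, c, g_p_q, g_c_q):
    """Return the excitation buffer after one subframe has been processed.

    u      : past excitation, oldest sample first
    v, c   : adaptive and fixed codewords of the current subframe
    g_p_q  : quantized pitch (adaptive codebook) gain
    g_c_q  : quantized fixed codebook gain
    """
    if len(v) != SUBFRAME or len(c) != SUBFRAME:
        raise ValueError("codewords must each hold one subframe")
    new_exc = [g_p_q * vn + g_c_q * cn for vn, cn in zip(v, c)]
    # Drop the 40 oldest samples and place the new excitation at the top.
    return u[SUBFRAME:] + new_exc
```

Called with a 154-sample buffer (a length chosen for this example), the function keeps the 114 most recent past samples and appends the 40 new ones, so the buffer length is unchanged.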
While what are presently considered to be example embodiments of the invention have been illustrated and described, those skilled in the art will understand that various other modifications may be made, and equivalents may be substituted, without departing from the true scope of the invention. In addition, many modifications may be made to adapt a particular situation to the teachings of the invention without departing from the central inventive concept described herein.

Claims (25)

1. An apparatus for converting CELP frames from one CELP-based standard to another CELP-based standard, or between different modes within a single standard, comprising:
a bitstream unpacking module, adapted to obtain one or more source CELP parameters, in the format of a source codec, from a source bitstream;
an interpolator module coupled to the bitstream unpacking module, the interpolator module being adapted to interpolate between the one or more source CELP parameters and one or more interpolated CELP parameters when one or more of a difference between the source frame size and the destination frame size, a difference between the source subframe size and the destination subframe size, and a difference between the source sampling rate and the destination sampling rate exists, wherein the one or more source CELP parameters comprise one or more of pitch lag, pitch gain, LSP coefficients, codebook gain, and excitation vector;
a mapping module coupled to the interpolator module, the mapping module being adapted to map the one or more source CELP parameters or the one or more interpolated CELP parameters to one or more destination CELP parameters of a destination codec;
a destination bitstream packing module coupled to the mapping module, the destination bitstream packing module being adapted to construct at least one destination CELP frame in the format of the destination codec using the one or more destination CELP parameters; and
a controller coupled to at least one of the destination bitstream packing module, the mapping module, the interpolator module, and the bitstream unpacking module, the controller being adapted to manage the operation of one or more of the modules and to receive instructions from one or more external applications, the controller being adapted to provide status information to the one or more external applications.
2. The apparatus of claim 1, wherein the bitstream unpacking module comprises:
a bitstream processor, adapted to obtain information from a source CELP codec input frame in a first format of one or more CELP parameters;
an LSP decoder module coupled to the bitstream processor, adapted to output one or more LSP coefficients using at least the information from the source CELP codec input frame;
a decoder module coupled to the bitstream processor, adapted to decode the information from the source CELP codec input frame to output a pitch lag parameter and a pitch gain parameter;
a fixed codebook decoder module coupled to the bitstream processor, adapted to decode the information to output a fixed codebook vector;
an adaptive codeword decoder module coupled to the bitstream processor, adapted to decode the information to output an adaptive codebook component vector; and
an excitation generator coupled to the fixed codebook decoder module, adapted to output an excitation vector using at least the fixed codebook vector and the adaptive codebook vector.
3. The apparatus of claim 1, wherein the interpolator module comprises:
an LSP module, adapted to convert one or more LSP coefficients of the source codec into one or more LSP coefficients of the destination codec when one or more of a difference between the source frame size and the destination frame size, a difference between the source subframe size and the destination subframe size, and a difference between the source sampling rate and the destination sampling rate exists;
an adaptive codebook module, adapted to convert the pitch lag and pitch gain from the source codec into the pitch lag and pitch gain of the destination codec when one or more of a difference between the source frame size and the destination frame size, a difference between the source subframe size and the destination subframe size, and a difference between the source sampling rate and the destination sampling rate exists; and
a CELP parameter buffer, adapted to store one or more source CELP parameters that need to be buffered for interpolation when one or more of a difference between the source frame size and the destination frame size, a difference between the source subframe size and the destination subframe size, and a difference between the source sampling rate and the destination sampling rate exists.
4. The apparatus of claim 1, wherein the mapping module comprises:
a strategy switching module, adapted to select a CELP parameter mapping strategy; and
a parameter mapping and tuning module, adapted to output the one or more destination CELP parameters.
5. The apparatus of claim 4, wherein the CELP parameter mapping strategy comprises one of the following:
a CELP parameter direct space mapping procedure;
an analysis procedure in the filtered excitation space domain; and
an analysis procedure in the excitation space domain.
6. The apparatus of claim 4, wherein the parameter mapping and tuning module comprises:
an LSP coefficient converter, which encodes the destination LSP coefficients; and
a CELP excitation mapping unit, which takes the CELP excitation parameters, including pitch lag, pitch gain, and excitation vector, from the interpolation to obtain encoded CELP excitation parameters.
7. The apparatus of claim 6, wherein the CELP excitation mapping unit comprises:
a CELP parameter direct space mapping module, which produces one or more encoded destination CELP parameters using analytical formulae without any iteration;
an analysis module for mapping in the excitation space domain, which produces one or more encoded destination CELP parameters by searching the excitation space domain; and
an analysis module for mapping in the filtered excitation space domain, which produces one or more encoded destination CELP parameters by closed-loop searching of the adaptive codebook and the fixed codebook in the filtered excitation space.
8. The apparatus of claim 1, wherein the destination bitstream packing module comprises a plurality of frame packing units, each of which can be adapted to an application pre-selected from a plurality of applications for a selected destination CELP coder, the selected destination CELP coder being one of a plurality of CELP coders that includes the destination CELP coder.
9. The apparatus of claim 1, wherein the interpolator module is configured to perform linear operations.
10. The apparatus of claim 3, wherein the CELP parameter buffer comprises:
an excitation vector buffer, adapted to store reconstructed excitation vectors awaiting mapping in the next subframe or frame;
an LSP coefficient buffer, which stores LSP coefficients, before or after interpolation, awaiting mapping in the next subframe or frame; and
a buffer for other CELP parameters, which stores pitch lag, pitch gain, codebook gain and index, before or after interpolation, awaiting mapping in the next subframe or frame.
11. A method for transcoding a CELP-based compressed voice bitstream from a source codec to a destination codec, the method comprising:
processing an input CELP bitstream to unpack at least one or more source CELP parameters;
interpolating the one or more source CELP parameters to one or more interpolated CELP parameters when one or more of a difference between the source frame size and the destination frame size, a difference between the source subframe size and the destination subframe size, and a difference between the source sampling rate and the destination sampling rate exists, wherein the one or more source CELP parameters comprise one or more of pitch lag, pitch gain, LSP coefficients, codebook gain, and excitation vector;
mapping the one or more source CELP parameters or the one or more interpolated CELP parameters to one or more destination CELP parameters of the destination codec; and
processing a destination CELP bitstream by at least packing the one or more destination CELP parameters of the destination codec.
12. The method of claim 11, wherein the interpolating comprises:
interpolating from one or more source LSP coefficients of the source codec to one or more destination LSP coefficients of the destination codec;
interpolating from other source CELP parameters of the source codec, other than the one or more source LSP coefficients, to other destination CELP parameters of the destination codec; and
if the excitation vector does not need calibration, sending the excitation vector to the mapping process.
13. The method of claim 12, further comprising:
converting one or more LSP coefficients of the source codec format to one or more LSP coefficients of the destination codec format using a linear transform process.
14. The method of claim 12, further comprising:
converting one or more excitation vectors of the source codec into a synthesized speech vector using at least one or more source LPC coefficients;
quantizing one or more destination LPC coefficients into one or more quantized destination LPC coefficients;
converting the synthesized speech vector into a calibrated excitation vector using at least the one or more quantized destination LPC coefficients; and
sending the calibrated excitation vector to another process.
15. The method of claim 11, further comprising:
selecting a CELP mapping strategy, the CELP mapping strategy being selected from the following:
a direct space mapping procedure;
an analysis procedure in the excitation space domain; and
an analysis procedure in the filtered excitation space domain; and
performing a mapping process using the CELP mapping strategy to map one or more CELP parameters of the source codec format to one or more CELP parameters of the destination codec format.
16. The method of claim 15, wherein the selecting is not limited to the above three strategies, and a combination of the three strategies may be selected as a new mapping strategy.
17. The apparatus of claim 1, further comprising a silence frame transcoding unit for performing a function selected from the group consisting of comfort noise generation and discontinuous transmission.
18. The apparatus of claim 1, further comprising a module for performing voice activity detection and generating silence frames.
19. The apparatus of claim 1, wherein the controller is used to implement a channel density control mechanism that adapts to the available computing resources and allows graceful quality degradation under load.
20. The method of claim 11, wherein the method can be performed without reconstructing the speech signal when retrieving the one or more CELP parameters.
21. The method of claim 11, further comprising sending the input CELP bitstream to the destination CELP bitstream.
22. The apparatus of claim 1, wherein the one or more interpolated CELP parameters comprise at least two or more of pitch lag, pitch gain, LSP coefficients, codebook gain, and excitation vector.
23. The apparatus of claim 1, wherein the interpolator module is configured to perform nonlinear operations.
24. The apparatus of claim 1, wherein the interpolator module is configured to perform upsampling.
25. The apparatus of claim 1, wherein the interpolator module is configured to perform downsampling.
CNB038055198A 2002-01-08 2003-01-08 A transcoding scheme between CELP-based speech codes Expired - Fee Related CN100527225C (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US34727002P 2002-01-08 2002-01-08
US60/347,270 2002-01-08

Publications (2)

Publication Number Publication Date
CN1701353A CN1701353A (en) 2005-11-23
CN100527225C true CN100527225C (en) 2009-08-12

Family

ID=23363030

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB038055198A Expired - Fee Related CN100527225C (en) 2002-01-08 2003-01-08 A transcoding scheme between CELP-based speech codes

Country Status (6)

Country Link
EP (1) EP1464047A4 (en)
JP (1) JP2005515486A (en)
KR (1) KR20040095205A (en)
CN (1) CN100527225C (en)
AU (1) AU2003207498A1 (en)
WO (1) WO2003058407A2 (en)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004064041A1 (en) * 2003-01-09 2004-07-29 Dilithium Networks Pty Limited Method and apparatus for improved quality voice transcoding
EP1618557B1 (en) 2003-05-01 2007-07-25 Nokia Corporation Method and device for gain quantization in variable bit rate wideband speech coding
FR2867648A1 (en) * 2003-12-10 2005-09-16 France Telecom TRANSCODING BETWEEN INDICES OF MULTI-IMPULSE DICTIONARIES USED IN COMPRESSION CODING OF DIGITAL SIGNALS
US20050258983A1 (en) * 2004-05-11 2005-11-24 Dilithium Holdings Pty Ltd. (An Australian Corporation) Method and apparatus for voice trans-rating in multi-rate voice coders for telecommunications
FR2871247B1 (en) 2004-06-04 2006-09-15 Essilor Int OPHTHALMIC LENS
US20070250308A1 (en) * 2004-08-31 2007-10-25 Koninklijke Philips Electronics, N.V. Method and device for transcoding
FR2880724A1 (en) 2005-01-11 2006-07-14 France Telecom OPTIMIZED CODING METHOD AND DEVICE BETWEEN TWO LONG-TERM PREDICTION MODELS
EP1955321A2 (en) * 2005-11-30 2008-08-13 TELEFONAKTIEBOLAGET LM ERICSSON (publ) Efficient speech stream conversion
JP4983606B2 (en) * 2005-12-21 2012-07-25 日本電気株式会社 Code conversion apparatus, code conversion method used therefor, and program therefor
US7826536B2 (en) * 2005-12-29 2010-11-02 Nokia Corporation Tune in time reduction
EP1903559A1 (en) * 2006-09-20 2008-03-26 Deutsche Thomson-Brandt Gmbh Method and device for transcoding audio signals
EP1933306A1 (en) * 2006-12-14 2008-06-18 Nokia Siemens Networks Gmbh & Co. Kg Method and apparatus for transcoding a speech signal from a first code excited linear prediction (CELP) format to a second code excited linear prediction (CELP) format
US8566106B2 (en) 2007-09-11 2013-10-22 Voiceage Corporation Method and device for fast algebraic codebook search in speech and audio coding
CN101459833B (en) * 2007-12-13 2011-05-11 安凯(广州)微电子技术有限公司 Transcoding method used for similar video code stream and transcoding device thereof
CN101572093B (en) * 2008-04-30 2012-04-25 北京工业大学 Method and device for transcoding
US8521520B2 (en) 2010-02-03 2013-08-27 General Electric Company Handoffs between different voice encoder systems
CN105359210B (en) 2013-06-21 2019-06-14 弗朗霍夫应用科学研究促进协会 MDCT frequency spectrum is declined to the device and method of white noise using preceding realization by FDNS
CN106165013B (en) 2014-04-17 2021-05-04 声代Evs有限公司 Method, apparatus and memory for use in a sound signal encoder and decoder
CN104167210A (en) * 2014-08-21 2014-11-26 华侨大学 Lightweight class multi-side conference sound mixing method and device
CN117476022A (en) * 2022-07-29 2024-01-30 荣耀终端有限公司 Voice coding and decoding method, and related device and system

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5457685A (en) * 1993-11-05 1995-10-10 The United States Of America As Represented By The Secretary Of The Air Force Multi-speaker conferencing over narrowband channels
JPH08146997A (en) * 1994-11-21 1996-06-07 Hitachi Ltd Device and system for code conversion
US5758256A (en) * 1995-06-07 1998-05-26 Hughes Electronics Corporation Method of transporting speech information in a wireless cellular system
US5995923A (en) 1997-06-26 1999-11-30 Nortel Networks Corporation Method and apparatus for improving the voice quality of tandemed vocoders
JP3235654B2 (en) * 1997-11-18 2001-12-04 日本電気株式会社 Wireless telephone equipment
US6260009B1 (en) * 1999-02-12 2001-07-10 Qualcomm Incorporated CELP-based to CELP-based vocoder packet translation
JP2002202799A (en) * 2000-10-30 2002-07-19 Fujitsu Ltd Voice code conversion apparatus
JP2002229599A (en) * 2001-02-02 2002-08-16 Nec Corp Device and method for converting voice code string
US20030028386A1 (en) * 2001-04-02 2003-02-06 Zinser Richard L. Compressed domain universal transcoder
KR100434275B1 (en) * 2001-07-23 2004-06-05 엘지전자 주식회사 Apparatus for converting packet and method for converting packet using the same

Also Published As

Publication number Publication date
JP2005515486A (en) 2005-05-26
EP1464047A4 (en) 2005-12-07
WO2003058407A2 (en) 2003-07-17
CN1701353A (en) 2005-11-23
WO2003058407A3 (en) 2003-12-24
EP1464047A2 (en) 2004-10-06
KR20040095205A (en) 2004-11-12
AU2003207498A1 (en) 2003-07-24
AU2003207498A8 (en) 2003-07-24

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
ASS Succession or assignment of patent right

Owner name: DILITHIUM (ASSIGNMENT FOR THE BENEFIT OF CREDITORS

Free format text: FORMER OWNER: DILITHIUM NETWORKS INC.

Effective date: 20130221

Owner name: ONMOBILE GLOBAL LTD.

Free format text: FORMER OWNER: DILITHIUM (ASSIGNMENT FOR THE BENEFIT OF CREDITORS) INC.

Effective date: 20130221

C41 Transfer of patent application or patent right or utility model
TR01 Transfer of patent right

Effective date of registration: 20130221

Address after: bangalore

Patentee after: DILITHIUM NETWORKS, Inc.

Address before: California, USA

Patentee before: Di Lee Sim (for the benefit of creditors) Ltd.

Effective date of registration: 20130221

Address after: California, USA

Patentee after: Di Lee Sim (for the benefit of creditors) Ltd.

Address before: California, USA

Patentee before: Di Lee Sim Network Inc.

Effective date of registration: 20130221

Address after: California, USA

Patentee after: Di Lee Sim Network Inc.

Address before: New South Wales

Patentee before: DILITHIUM NETWORKS Pty Ltd.

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20090812

Termination date: 20150108

EXPY Termination of patent right or utility model