CN104658539A - Transcoding method for code stream of voice coder - Google Patents


Info

Publication number: CN104658539A
Application number: CN201310598532.5A
Authority: CN (China)
Other languages: Chinese (zh)
Inventor: 盖丽
Applicant / Current assignee: Dalian You Jia Software Science And Technology Ltd
Legal status: Pending

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a transcoding method for the code stream of a voice coder, belonging to the technical field of voice coding and decoding. A code stream A sent by communication network 1 passes through a bit stream parsing unit, a decoding unit, a parameter conversion unit, an encoding unit and a bit stream packaging unit to obtain a code stream B received by communication network 2, where communication networks 1 and 2 employ different speech coding standards.

Description

Transcoding method of code stream of voice encoder
Technical Field
The invention relates to a transcoding method of a code stream of a voice encoder, belonging to the technical field of voice encoding and decoding.
Background
Different communication networks often use different speech coding standards. To ensure interoperability, it is usually necessary to "transcode" between the different encoders when interconnecting the networks. Suppose communication network 1 uses a type-A speech codec and communication network 2 uses a type-B speech codec. The conventional approach is decode-then-encode (DTE) transcoding: the type-A decoder of network 1 decodes the received bit stream into a time-domain speech signal, the type-B encoder of network 2 re-encodes that signal, and the resulting bit stream is sent to network 2. This method has high computational complexity, long delay and a large memory footprint, and the two cascaded encode/decode passes lower the quality of the synthesized speech.
Disclosure of Invention
In view of the above problems, the invention provides a transcoding method for the code stream of a voice encoder.
A transcoding method of a code stream of a voice encoder is characterized in that: a code stream A sent by the communication network 1 passes through a bit stream analysis unit, a decoding unit, a parameter conversion unit, an encoding unit and a bit stream encapsulation unit to obtain a code stream B received by the communication network 2, wherein the communication networks 1 and 2 are communication networks using different voice coding standards, such as a wireless network using an AMR standard and an IP network using a G.729AB standard.
The technical scheme of the invention has the following beneficial effects:
(1) when transcoding the line spectrum pair coefficient, a large amount of voice data is trained by using a Support Vector Regression (SVR) algorithm in advance, so that a mapping model of the line spectrum pair coefficient of the transmitting end and the line spectrum pair coefficient of the receiving end is obtained. On the basis, the mapping from the input line spectrum pair coefficient to the output line spectrum pair coefficient is carried out, so that the conversion of the line spectrum pair coefficient is more accurate, and the quality of the synthesized voice is improved.
(2) The decoded pitch lag integer part T0 is used as the open-loop search result of the encoding end, so that when the closed-loop search is performed, the closed-loop search range can be limited according to the value of T0, thereby improving the quality of the synthesized voice and reducing the calculation amount.
(3) In the process of transcoding the silence insertion description frame, a method of energy parameter direct mapping is adopted, and the calculation of the energy of the silence insertion description frame is eliminated, so that the algorithm complexity is reduced, and the storage capacity is correspondingly reduced.
(4) The frame type information is extracted from the input bit stream, so that the frame type is not judged in the transcoding process, the frame type is directly converted into the frame type which is the same as the received frame type when the bit stream is output, and the synthetic voice quality of a receiving end is effectively improved.
Drawings
FIG. 1 is a flow chart of the present invention.
Fig. 2 is a flowchart of a voice frame transcoding method of the present invention.
Fig. 3 is a flowchart of a method for transcoding parameters of a silence insertion description frame according to the present invention.
FIG. 4 compares the PESQ scores of the DTE method and the proposed transcoding method for AMR-to-G.729AB transcoding.
FIG. 5 compares the WMOPS complexity of the DTE method and the proposed transcoding method for AMR-to-G.729AB transcoding.
Detailed Description
The invention is further described below with reference to the accompanying drawings:
as shown in fig. 1: a code stream A sent by a communication network 1 passes through a bit stream analysis unit, a decoding unit, a parameter conversion unit, an encoding unit and a bit stream packaging unit to obtain a code stream B received by a communication network 2, wherein the communication networks 1 and 2 are communication networks using different voice coding standards.
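As an illustration only, the five-unit chain of fig. 1 can be sketched as a simple pipeline. The function names and the dictionary-based frame representation below are hypothetical placeholders; a real system would wrap actual AMR and G.729AB codec operations.

```python
# Hypothetical sketch of the five-unit transcoding chain; every stage is a
# stub standing in for the real AMR/G.729AB operation it is named after.

def parse_bitstream(stream_a):           # bit stream parsing unit
    return {"frame_type": stream_a["frame_type"], "bits": stream_a["bits"]}

def decode(parsed):                      # decoding unit
    return {"frame_type": parsed["frame_type"], "params": list(parsed["bits"])}

def convert_parameters(decoded):         # parameter conversion unit
    # Placeholder mapping; the invention uses SVR-based LSP mapping etc.
    return {"frame_type": decoded["frame_type"],
            "params": [p * 2 for p in decoded["params"]]}

def encode(converted):                   # encoding unit (pass-through stub)
    return converted

def pack_bitstream(encoded):             # bit stream packaging unit
    return {"frame_type": encoded["frame_type"], "bits": encoded["params"]}

def transcode(stream_a):
    """Code stream A -> code stream B through the five units in order."""
    return pack_bitstream(encode(convert_parameters(decode(parse_bitstream(stream_a)))))
```

Note how the frame type flows through unchanged, matching beneficial effect (4): the output frame type is taken directly from the input bit stream.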
Here, a specific implementation of the invention is described by taking the parameter transcoding process from AMR to G.729AB as an example: the A coding standard is AMR, the B coding standard is G.729AB, communication network 1 is a wireless communication network, and communication network 2 is an IP network. The AMR frame length is 20 ms, the G.729AB frame length is 10 ms, both use a 5 ms subframe, and one AMR frame corresponds to two G.729AB frames. The specific transcoding scheme is as follows:
the bit stream analyzing unit is used for receiving AMR code streams sent by a wireless communication network, and comprises the following specific steps:
(1) According to the AMR frame structure, the frame type (SPEECH_GOOD, SPEECH_BAD, SID_FIRST, SID_UPDATE, SID_BAD, NO_DATA), the mode information (MR_4.75 kbps, MR_5.15 kbps, MR_5.9 kbps, MR_6.7 kbps, MR_7.4 kbps, MR_10.2 kbps, MR_12.2 kbps) and the parameter bits are extracted in sequence from the received AMR code stream.
(2) According to the frame structure of AMR, the parameter bits are converted into the parameter values after quantization coding, namely the line spectrum pair coefficient, the pitch delay, the nonzero pulse position and the sign and the gain of the fixed codebook of the speech frame, or the line spectrum pair coefficient and the speech energy of the silence insertion description frame.
(3) Judge whether the current frame is a speech frame (SPEECH_GOOD, SPEECH_BAD), a silence insertion description frame (SID_UPDATE, SID_BAD) or a non-transmission frame (SID_FIRST, NO_DATA) according to the frame type information.
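Step (3) is a direct mapping from the frame-type field to one of three frame classes; a minimal sketch (the function name is illustrative):

```python
def classify_frame(frame_type):
    """Map an AMR frame type to the three classes used in transcoding."""
    if frame_type in ("SPEECH_GOOD", "SPEECH_BAD"):
        return "speech"
    if frame_type in ("SID_UPDATE", "SID_BAD"):
        return "sid"              # silence insertion description frame
    if frame_type in ("SID_FIRST", "NO_DATA"):
        return "no_transmission"
    raise ValueError("unknown AMR frame type: " + frame_type)
```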
The decoding unit is used for decoding the parameter bits by the AMR decoder to obtain the voice parameter value and the synthesized voice, and comprises the following specific steps:
(1) if the current frame is a speech frame:
decoding the parameter values after the quantization coding by using an AMR decoder to obtain voice parameters, wherein the voice parameters comprise line spectrum pair coefficients, pitch delay, non-zero pulse positions and symbols of a fixed codebook, self-adaptive codebook gain and fixed codebook gain; speech reconstruction is performed from the above speech parameters using an AMR decoder to obtain reconstructed speech s' (n).
(2) If the current frame is a silence insertion description frame:
The AMR decoder is used to decode the quantized parameter values to obtain the line spectrum pair coefficients and the speech energy of the silence insertion description frame.
The parameter conversion unit is used for transcoding the voice parameters obtained by AMR decoding to obtain the voice parameters required by G.729AB quantization coding, and the specific steps are as follows:
(1) If the received AMR frame type is a speech frame (SPEECH_GOOD or SPEECH_BAD), the transcoding procedure is as shown in fig. 2:
(a) linear predictive analysis:
transcoding of line spectrum to coefficients involves off-line mapping model parameter acquisition and on-line parameter mapping.
The mapping model parameters are obtained as follows. First, a large amount (more than 10 hours) of speech data covering various speaker types (adult male, adult female, boy, girl, etc.) and languages (Chinese, English, French, etc.) is encoded with both the AMR and G.729AB encoders, yielding K groups and 2K groups of quantized line spectrum pair coefficients LSP_{AMR}(k, i) and LSP_{G.729AB}(2k, i), i = 1, \ldots, n, k = 1, \ldots, K, where n is the dimension of the line spectrum pair coefficient vector. The support vector regression algorithm is then used to compute the parameters w_i, b_i of the mapping model between LSP_{AMR} and LSP_{G.729AB}:

LSP_{G.729AB\_1}(i) = LSP_{G.729AB\_2}(i) = w_i^T LSP_{AMR}(i) + b_i,

where LSP_{G.729AB\_1}(i) and LSP_{G.729AB\_2}(i), i = 1, \ldots, n, are the line spectrum pair coefficients of the two G.729AB frames corresponding to the current AMR frame.
When transcoding, the AMR line spectrum pair coefficients LSP_{AMR}(i) are passed through the n mapping models

LSP_{G.729AB\_1}(i) = LSP_{G.729AB\_2}(i) = w_i^T LSP_{AMR}(i) + b_i, \quad i = 1, \ldots, n,

to compute LSP_{G.729AB}(i), i = 1, \ldots, n.
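The on-line mapping step is a componentwise affine map of the AMR LSP vector through the n trained models; a minimal sketch (the weights and biases passed in below are made-up illustration values, not trained parameters):

```python
def map_lsp(lsp_amr, W, b):
    """Apply the n mapping models: out[i] = w_i . lsp_amr + b_i.

    lsp_amr : AMR line spectrum pair coefficient vector of dimension n
    W       : n x n weight matrix, row i holding w_i
    b       : length-n bias vector
    Both G.729AB frames of the 20 ms AMR frame share the returned vector.
    """
    n = len(lsp_amr)
    return [sum(W[i][j] * lsp_amr[j] for j in range(n)) + b[i] for i in range(n)]
```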
The mapping model parameters w_i, b_i between LSP_{AMR} and LSP_{G.729AB} are computed with the support vector regression algorithm as follows.

Define the line spectrum pair coefficients of AMR frame k (G.729AB frame 2k) as the training data x and y, i.e. x(k, i) = LSP_{AMR}(k, i) and y(k, i) = LSP_{G.729AB}(2k, i), and fit the data \{x(k, i), y(k, i)\}, k = 1, \ldots, K, i = 1, \ldots, n, with n regression functions f_i(x) = w_i^T x + b_i.
Define the n mapping functions

f_i(x) = w_i^T x + b_i = \sum_{k=1}^{K} (\alpha_{ki}^{*} - \alpha_{ki})(x_{ki} \cdot x) + b_i^{*},

with

w_i = \sum_{k=1}^{K} (\alpha_{ki}^{*} - \alpha_{ki}) x_{ki}, \qquad b_i^{*} = \frac{1}{N} \sum_{j \in \{j \mid \alpha_{ji} > 0\}} \Big[ y_{ji} - \sum_{k=1}^{K} y_{ki} \alpha_{ki} (x_{ki} \cdot x_{ji}) \Big],

where \alpha_{ki} and \alpha_{ki}^{*} are the Lagrange factors. For a given i, the Lagrange factors are solved as follows:
Define the Lagrange function

G(w_i, \zeta_{ki}, \zeta_{ki}^{*}) = \frac{1}{2}\|w_i\|^2 + C \sum_{k=1}^{K} (\zeta_{ki} + \zeta_{ki}^{*}) - \sum_{k=1}^{K} \alpha_{ki} (\epsilon + \zeta_{ki} - y_{ki} + w_i \cdot x_{ki} + b_i) - \sum_{k=1}^{K} \alpha_{ki}^{*} (\epsilon + \zeta_{ki}^{*} + y_{ki} - w_i \cdot x_{ki} - b_i) - \sum_{k=1}^{K} (\eta_{ki} \zeta_{ki} + \eta_{ki}^{*} \zeta_{ki}^{*}),

where C is a constant with C > 0, \zeta_{ki} \ge 0 and \zeta_{ki}^{*} \ge 0 are the slack (relaxation) factors, and \epsilon is the fitting accuracy. Maximize the objective function:
W(\alpha_{ki}, \alpha_{ki}^{*}) = -\epsilon \sum_{k=1}^{K} (\alpha_{ki}^{*} + \alpha_{ki}) + \sum_{k=1}^{K} y_{ki} (\alpha_{ki}^{*} - \alpha_{ki}) - \frac{1}{2} \sum_{k,j=1}^{K} (\alpha_{ki}^{*} - \alpha_{ki})(\alpha_{ji}^{*} - \alpha_{ji})(x_{ki} \cdot x_{ji}),

where the Lagrange factors \alpha_{ki} and \alpha_{ki}^{*} satisfy \sum_{k=1}^{K} (\alpha_{ki} - \alpha_{ki}^{*}) = 0 and 0 \le \alpha_{ki}, \alpha_{ki}^{*} \le C, k = 1, \ldots, K. This constitutes a typical quadratic programming problem. Using the KKT conditions \alpha_{ki} = 0 \Rightarrow y_{ki} f(x_{ki}) \ge 1; \; 0 < \alpha_{ki} < C \Rightarrow y_{ki} f(x_{ki}) = 1; \; \alpha_{ki} = C \Rightarrow y_{ki} f(x_{ki}) \le 1, the quadratic programming problem is solved with the sequential minimal optimization (SMO) algorithm. The steps of the SMO algorithm are as follows:
(1) Initialize the Lagrange factors; typically take \alpha_{ki} = 0;
(2) Evaluate the KKT conditions on the training data and find a data point (x_{1i}, y_{1i}) that violates them; its Lagrange factor \alpha_{1i} is taken as one of the two Lagrange factors to be optimized;
(3) Find the data point (x_{2i}, y_{2i}) in the training data that satisfies \max |f_i(x_{1i}) - f_i(x_{2i}) + y_{2i} - y_{1i}|, and take its Lagrange factor as \alpha_{2i}. Once \alpha_{1i} and \alpha_{2i} are selected, the other Lagrange factors are held fixed, forming a minimum-scale quadratic programming problem, i.e. solving for the optimal \alpha_{1i}^{new} and \alpha_{2i}^{new};
(4) Solve the minimal quadratic programming problem:

K_{11} = x_{1i} \cdot x_{1i}, \qquad K_{22} = x_{2i} \cdot x_{2i}, \qquad K_{12} = x_{1i} \cdot x_{2i},

\alpha_{2i}^{new} = \alpha_{2i}^{old} - \frac{y_{2i}(E_1 - E_2)}{2K_{12} - K_{11} - K_{22}},

where E_{ki} = f_i^{old}(x_{ki}) - y_{ki} is the training error.
When y_{1i} \ne y_{2i}: L = \max(0, \alpha_{2i}^{old} - \alpha_{1i}^{old}), \quad H = \min(C, C + \alpha_{2i}^{old} - \alpha_{1i}^{old});

when y_{1i} = y_{2i}: L = \max(0, \alpha_{1i}^{old} + \alpha_{2i}^{old} - C), \quad H = \min(C, \alpha_{1i}^{old} + \alpha_{2i}^{old});

\alpha_{2i}^{new,clipped} = H \text{ if } \alpha_{2i}^{new} \ge H; \quad \alpha_{2i}^{new} \text{ if } L < \alpha_{2i}^{new} < H; \quad L \text{ if } \alpha_{2i}^{new} \le L;

\alpha_{1i}^{new} = \alpha_{1i}^{old} + y_{1i} y_{2i} (\alpha_{2i}^{old} - \alpha_{2i}^{new,clipped});

this yields a pair of new Lagrange factors \alpha_{1i}^{new} and \alpha_{2i}^{new,clipped}.
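Step (4) — the unconstrained update of \alpha_{2i}, its clipping to [L, H], and the compensating update of \alpha_{1i} — can be sketched as a direct transcription of the formulas. The scalar inputs are illustrative; real training data are LSP vectors:

```python
def smo_pair_update(a1, a2, y1, y2, E1, E2, x1, x2, C):
    """One SMO pair update: returns the new alpha1 and clipped alpha2."""
    K11, K22, K12 = x1 * x1, x2 * x2, x1 * x2
    denom = 2 * K12 - K11 - K22          # -(x1 - x2)^2, always <= 0
    a2_new = a2 - y2 * (E1 - E2) / denom
    if y1 != y2:
        L, H = max(0.0, a2 - a1), min(C, C + a2 - a1)
    else:
        L, H = max(0.0, a1 + a2 - C), min(C, a1 + a2)
    a2_clipped = min(max(a2_new, L), H)  # clip to the box [L, H]
    a1_new = a1 + y1 * y2 * (a2 - a2_clipped)
    return a1_new, a2_clipped
```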
(5) Check whether any data point still violates the KKT conditions; if so, return to step (2); otherwise the optimal solution of the whole problem has been obtained, and proceed to the next step.
(6) Obtain the regression function:

f_i(x) = w_i^T x + b_i = \sum_{k=1}^{K} (\alpha_{ki}^{*} - \alpha_{ki})(x_{ki} \cdot x) + b_i^{*},

where b_i^{*} = \frac{1}{N} \sum_{j \in \{j \mid \alpha_{ji} > 0\}} \Big[ y_{ji} - \sum_{k=1}^{K} y_{ki} \alpha_{ki} (x_{ki} \cdot x_{ji}) \Big] and N is the number of support vectors.
The AMR line spectrum pair coefficients are mapped through the relation obtained above to give the line spectrum pair coefficients of the corresponding G.729AB frames, i.e.

LSP_{G.729AB\_1} = LSP_{G.729AB\_2} = f(LSP_{AMR});

these are taken as the unquantized G.729AB line spectrum pair coefficients.
And converting the obtained unquantized line spectrum pair coefficient into a line spectrum frequency coefficient, quantizing and coding according to a G.729AB coding standard, and then transmitting the quantized and coded line spectrum frequency coefficient to the IP network.
According to the G.729AB coding standard, the mapped line spectrum pair coefficients are interpolated with those of the previous frame (or frames) to obtain the unquantized line spectrum pair coefficients of each subframe, from which the unquantized linear prediction coefficients A(z) of each subframe are computed. The mapped line spectrum pair coefficients are also quantized to obtain the quantized line spectrum pair coefficients of the current frame; these are interpolated with the quantized coefficients of the previous frame (or frames) to obtain the quantized line spectrum pair coefficients of each subframe, from which the quantized linear prediction coefficients A'(z) of each subframe are computed. The unquantized and quantized linear prediction coefficients are used, respectively, to compute the perceptual weighting filter W(z) = A(z/\gamma_1)/A(z/\gamma_2) (\gamma_1 and \gamma_2 are the perceptual weighting coefficients) and the coefficients of the synthesis filter 1/A'(z).
(b) Open-loop pitch search:
in a speech coding algorithm based on code excited linear prediction, pitch search is done in two steps. The first step is an open-loop pitch search, where the pitch period is estimated approximately, denoted as T _ op, in order to provide a coarse range for the closed-loop pitch search to reduce the amount of computation for the closed-loop pitch search. The second step is to perform a closed loop pitch search around T _ op.
In transcoding, the normal open-loop pitch search is omitted, and the decoded integer pitch delay T0 is used directly as the open-loop pitch search result T_op of the G.729AB encoder:

T\_op_{G.729AB\_1} = T\_op_{G.729AB\_2} = T0_{AMR},

where T\_op_{G.729AB\_1} and T\_op_{G.729AB\_2} denote the open-loop pitch delays of the two G.729AB frames corresponding to the current AMR frame.
(c) Computing the impulse response of the perceptual weighted synthesis filter and the target signal of the adaptive codebook search

The impulse response h(n) of the perceptual weighted synthesis filter H(z) = A(z/\gamma_1)/(A'(z) A(z/\gamma_2)) is needed for the adaptive and fixed codebook searches and is typically computed once per subframe. An impulse signal is passed through the filter A(z/\gamma_1) and then successively through 1/A'(z) and 1/A(z/\gamma_2) to obtain h(n).
The target signal x(n) of the adaptive codebook search is computed as follows. First, the residual signal res_{LP}(n) of the linear prediction filter is calculated:

res_{LP}(n) = s'(n) + \sum_{i=1}^{P} \hat{a}_i \, s'(n - i),

where s'(n) is the reconstructed speech obtained by decoding, \hat{a}_i are the quantized linear prediction coefficients, and P is the order of the linear prediction filter. The residual res_{LP}(n) is then passed through the perceptual weighted synthesis filter H(z), i.e. convolved with h(n), to obtain the target signal:

x(n) = res_{LP}(n) * h(n).
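The two steps above — forming the LP residual from the reconstructed speech and convolving it with h(n) — can be sketched in pure Python (illustrative only; samples before n = 0 are treated as zero):

```python
def lp_residual(s, a):
    """res_LP(n) = s'(n) + sum_{i=1..P} a_i * s'(n - i), with a = [a_1..a_P]."""
    P = len(a)
    return [s[n] + sum(a[i] * s[n - 1 - i] for i in range(P) if n - 1 - i >= 0)
            for n in range(len(s))]

def convolve(x, h):
    """Linear convolution x(n) * h(n), truncated to len(x) samples."""
    return [sum(x[k] * h[n - k] for k in range(n + 1) if n - k < len(h))
            for n in range(len(x))]

def target_signal(s_rec, a_hat, h):
    """x(n) = res_LP(n) * h(n): residual of the reconstructed speech
    filtered by the perceptual weighted synthesis filter's impulse response."""
    return convolve(lp_residual(s_rec, a_hat), h)
```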
(d) adaptive codebook search
The adaptive codebook search includes a closed loop pitch search and a calculation of an adaptive codebook vector.
The criterion of the closed-loop pitch search is to minimize the mean-square error between the reconstructed speech at the decoding end and that at the encoding end, i.e. to maximize R(k):

R(k) = \frac{\sum_{n=0}^{len-1} x(n) \, y_k(n)}{\sqrt{\sum_{n=0}^{len-1} y_k(n) \, y_k(n)}},

where x(n) is the target signal, y_k(n) is the past filtered excitation at delay k (the past excitation convolved with h(n)), and len is the subframe length.
When the closed-loop pitch search is performed, the search range is limited to the neighborhood of the preselected value T_op, and is determined from the decoded integer pitch delay T0 as

[T0 - g(T0), \; T0 + g(T0)],

where

g(T0) = 3 for 20 \le T0 \le 39; \quad g(T0) = 2 for 40 \le T0 \le 79; \quad g(T0) = 1 for 80 \le T0 \le 143.
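The range restriction can be written directly from g(T0):

```python
def closed_loop_range(T0):
    """Closed-loop pitch search range [T0 - g(T0), T0 + g(T0)] from the
    decoded integer pitch delay T0 (defined for 20 <= T0 <= 143)."""
    if 20 <= T0 <= 39:
        g = 3
    elif 40 <= T0 <= 79:
        g = 2
    elif 80 <= T0 <= 143:
        g = 1
    else:
        raise ValueError("T0 outside the supported pitch delay range")
    return T0 - g, T0 + g
```

Narrowing the search to at most 2g(T0) + 1 candidate delays is what yields the complexity reduction claimed in beneficial effect (2).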
The closed-loop pitch search within this limited range gives the optimal integer pitch delay k. If, according to the coding standard of the receiving end, the resolution of k falls in the range where fractional delays are used, the fractions around the optimal integer delay are tested: the normalized correlation R(k) is interpolated and its maximum is searched to obtain the fractional pitch period:

R(k)_t = \sum_{i=0}^{\epsilon} R(k - i) \, b_m(t + i \cdot \epsilon) + \sum_{i=0}^{\epsilon} R(k + 1 + i) \, b_m(\epsilon - t + i \cdot \epsilon),

where t = 0, 1, \ldots, \epsilon - 1, \epsilon is the reciprocal of the fractional delay resolution, and b_m are the interpolation filter coefficients.
After the pitch delay is determined, the adaptive codebook vector v(n) is computed by interpolating the past excitation u(n) at the given integer delay k and fraction t:

v(n) = \sum_{i=0}^{P} u(n - k + i) \, b_q(t + i \cdot \epsilon) + \sum_{i=0}^{P} u(n - k + 1 + i) \, b_q(\epsilon - t + i \cdot \epsilon),
After the adaptive codebook vector is determined, the adaptive codebook gain g_p can be calculated:
<math> <mrow> <msub> <mi>g</mi> <mi>p</mi> </msub> <mo>=</mo> <mfrac> <mrow> <munderover> <mi>&Sigma;</mi> <mrow> <mi>n</mi> <mo>=</mo> <mn>0</mn> </mrow> <mrow> <mi>len</mi> <mo>-</mo> <mn>1</mn> </mrow> </munderover> <mi>x</mi> <mrow> <mo>(</mo> <mi>n</mi> <mo>)</mo> </mrow> <mi>y</mi> <mrow> <mo>(</mo> <mi>n</mi> <mo>)</mo> </mrow> </mrow> <mrow> <munderover> <mi>&Sigma;</mi> <mrow> <mi>n</mi> <mo>=</mo> <mn>0</mn> </mrow> <mrow> <mi>len</mi> <mo>-</mo> <mn>1</mn> </mrow> </munderover> <mi>y</mi> <mrow> <mo>(</mo> <mi>n</mi> <mo>)</mo> </mrow> <mi>y</mi> <mrow> <mo>(</mo> <mi>n</mi> <mo>)</mo> </mrow> </mrow> </mfrac> <mo>;</mo> </mrow> </math>
where len is the subframe length, x(n) is the target signal of the adaptive codebook search, and y(n) = v(n) * h(n) is the filtered adaptive codebook vector, h(n) being the impulse response of the perceptual weighting synthesis filter H(z).
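The gain formula above is a plain correlation ratio over one subframe and can be sketched directly (an illustrative sketch; CELP codecs additionally clamp g_p to a bounded range, which is omitted here):

```python
def adaptive_codebook_gain(x, y):
    """g_p = <x, y> / <y, y> over one subframe, where x is the target
    signal and y = v * h is the filtered adaptive codebook vector."""
    num = sum(xi * yi for xi, yi in zip(x, y))
    den = sum(yi * yi for yi in y)
    return num / den if den > 0.0 else 0.0
```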
(e) Fixed codebook search
The fixed codebook vector can be expressed as:
<math> <mrow> <mi>c</mi> <mrow> <mo>(</mo> <mi>n</mi> <mo>)</mo> </mrow> <mo>=</mo> <msub> <mi>S</mi> <mn>1</mn> </msub> <mi>&delta;</mi> <mrow> <mo>(</mo> <mi>n</mi> <mo>-</mo> <msub> <mi>m</mi> <mn>1</mn> </msub> <mo>)</mo> </mrow> <mo>+</mo> <msub> <mi>S</mi> <mn>2</mn> </msub> <mi>&delta;</mi> <mrow> <mo>(</mo> <mi>n</mi> <mo>-</mo> <msub> <mi>m</mi> <mn>2</mn> </msub> <mo>)</mo> </mrow> <mo>+</mo> <mo>.</mo> <mo>.</mo> <mo>.</mo> <mo>.</mo> <mo>.</mo> <mo>.</mo> <mo>+</mo> <msub> <mi>S</mi> <msub> <mi>N</mi> <mi>p</mi> </msub> </msub> <mi>&delta;</mi> <mrow> <mo>(</mo> <mi>n</mi> <mo>-</mo> <msub> <mi>m</mi> <msub> <mi>N</mi> <mi>p</mi> </msub> </msub> <mo>)</mo> </mrow> <mo>,</mo> <mi>n</mi> <mo>=</mo> <mn>0,1</mn> <mo>,</mo> <mi>&Lambda;</mi> <mo>,</mo> <mi>len</mi> <mo>-</mo> <mn>1</mn> <mo>;</mo> </mrow> </math>
where δ(n) is a unit pulse, N_p is the number of non-zero pulses in the fixed codebook vector, and len is the subframe length. m_1, m_2, …, m_{N_p} are the positions of the non-zero pulses, and S_1, S_2, …, S_{N_p} are the signs (+1 or −1) of the non-zero pulses at the corresponding positions; c(n) is a len-dimensional vector whose elements are all 0 except for the N_p non-zero pulses.
The fixed codebook search uses the criterion of minimizing the weighted mean square error between the weighted reconstructed speech s'_w(n) of the decoding side and the reconstructed speech of the encoding side, i.e., it determines the positions and signs of the non-zero pulses in the codebook vector.
The fixed codebook search first calculates the target signal
x_2(n) = x(n) − g_p·y(n), n = 0, 1, …, len−1;
where x(n) is the target signal of the adaptive codebook search, y(n) = v(n) * h(n) is the filtered adaptive codebook vector, g_p is the adaptive codebook gain, and len is the subframe length.
If c is a codebook vector, the search finds the codebook vector that maximizes

Q = (d^T c)² / (c^T Φ c)    (1)

where d is the correlation between x_2(n) and the impulse response h(n) of the perceptually weighted synthesis filter, Φ = H^T H is the autocorrelation matrix of h(n), and T denotes matrix transposition.
The elements of vector d are calculated as follows:
<math> <mrow> <mi>d</mi> <mrow> <mo>(</mo> <mi>n</mi> <mo>)</mo> </mrow> <mo>=</mo> <munderover> <mi>&Sigma;</mi> <mrow> <mi>i</mi> <mo>=</mo> <mi>n</mi> </mrow> <mrow> <mi>len</mi> <mo>-</mo> <mn>1</mn> </mrow> </munderover> <msub> <mi>x</mi> <mn>2</mn> </msub> <mrow> <mo>(</mo> <mi>i</mi> <mo>)</mo> </mrow> <mi>h</mi> <mrow> <mo>(</mo> <mi>i</mi> <mo>-</mo> <mi>n</mi> <mo>)</mo> </mrow> <mo>,</mo> <mi>n</mi> <mo>=</mo> <mn>0,1</mn> <mo>,</mo> <mi>&Lambda;</mi> <mo>,</mo> <mi>len</mi> <mo>-</mo> <mn>1</mn> </mrow> </math>
where len is the subframe length. The elements of the symmetric matrix Φ are calculated as φ(i, j) = Σ_{n=max(i,j)}^{len−1} h(n−i)·h(n−j), i, j = 0, 1, …, len−1.
The numerator term in formula (1) can be expressed as:
<math> <mrow> <mi>C</mi> <mo>=</mo> <munderover> <mi>&Sigma;</mi> <mrow> <mi>i</mi> <mo>=</mo> <mn>0</mn> </mrow> <mrow> <msub> <mi>N</mi> <mi>p</mi> </msub> <mo>-</mo> <mn>1</mn> </mrow> </munderover> <msub> <mi>S</mi> <mi>i</mi> </msub> <mi>d</mi> <mrow> <mo>(</mo> <msub> <mi>u</mi> <mi>i</mi> </msub> <mo>)</mo> </mrow> </mrow> </math>
where u_i is the position of the ith pulse, S_i is the sign of the ith pulse, and N_p is the number of non-zero pulses in the fixed codebook vector. The denominator in formula (1) is given by:
<math> <mrow> <msub> <mi>E</mi> <mi>D</mi> </msub> <mo>=</mo> <munderover> <mi>&Sigma;</mi> <mrow> <mi>i</mi> <mo>=</mo> <mn>0</mn> </mrow> <mrow> <msub> <mi>N</mi> <mi>p</mi> </msub> <mo>-</mo> <mn>1</mn> </mrow> </munderover> <mi>&phi;</mi> <mrow> <mo>(</mo> <msub> <mi>u</mi> <mi>i</mi> </msub> <mo>,</mo> <msub> <mi>u</mi> <mi>i</mi> </msub> <mo>)</mo> </mrow> <mo>+</mo> <mn>2</mn> <munderover> <mi>&Sigma;</mi> <mrow> <mi>i</mi> <mo>=</mo> <mn>0</mn> </mrow> <mrow> <msub> <mi>N</mi> <mi>p</mi> </msub> <mo>-</mo> <mn>2</mn> </mrow> </munderover> <munderover> <mi>&Sigma;</mi> <mrow> <mi>j</mi> <mo>=</mo> <mi>i</mi> <mo>+</mo> <mn>1</mn> </mrow> <mrow> <msub> <mi>N</mi> <mi>p</mi> </msub> <mo>-</mo> <mn>1</mn> </mrow> </munderover> <msub> <mi>S</mi> <mi>i</mi> </msub> <msub> <mi>S</mi> <mi>j</mi> </msub> <mi>&phi;</mi> <mrow> <mo>(</mo> <msub> <mi>u</mi> <mi>i</mi> </msub> <mo>,</mo> <msub> <mi>u</mi> <mi>j</mi> </msub> <mo>)</mo> </mrow> </mrow> </math>
The positions u_1, u_2, …, u_{N_p} that maximize formula (1) are the desired non-zero pulse positions.
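Evaluating formula (1) for a candidate pulse set follows directly from the expressions for C and E_D above. A minimal sketch (the d vector and φ matrix are assumed precomputed; names are illustrative):

```python
def codebook_merit(positions, signs, d, phi):
    """Q = C^2 / E_D for one candidate set of non-zero pulses, with
    C and E_D computed exactly as in the two sums above."""
    C = sum(s * d[u] for s, u in zip(signs, positions))
    # Diagonal terms of E_D.
    ED = sum(phi[u][u] for u in positions)
    # Cross terms, counted twice with the pulse signs.
    n = len(positions)
    for i in range(n - 1):
        for j in range(i + 1, n):
            ED += 2 * signs[i] * signs[j] * phi[positions[i]][positions[j]]
    return (C * C) / ED
```

The simplified transcoding search of the following paragraph would evaluate this merit only for position candidates near the pulse positions decoded from the AMR stream.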
At transcoding time, the non-zero pulse positions m_1, m_2, …, m_{N_p} of the fixed codebook vector obtained by AMR decoding are used to limit the search range of the G.729AB fixed codebook search, so that the fixed codebook search becomes a simplified search conducted near the positions m_1, m_2, …, m_{N_p}.
The fixed codebook gain is searched for by minimizing the weighted mean square error between the reconstructed speech decoded by the AMR decoder and the reconstructed speech of the G.729AB encoder, i.e., by minimizing the following equation:
E = ||x − g_p·y − g_c·z||² = x^T x + g_p²·y^T y + g_c²·z^T z − 2g_p·x^T y − 2g_c·x^T z + 2g_p·g_c·y^T z
where x is the target vector of the fixed codebook search, y is the adaptive codebook vector filtered signal, and z is the convolution of the fixed codebook vector with h (n):
<math> <mrow> <mi>z</mi> <mrow> <mo>(</mo> <mi>n</mi> <mo>)</mo> </mrow> <mo>=</mo> <munderover> <mi>&Sigma;</mi> <mrow> <mi>i</mi> <mo>=</mo> <mn>0</mn> </mrow> <mi>n</mi> </munderover> <mi>c</mi> <mrow> <mo>(</mo> <mi>i</mi> <mo>)</mo> </mrow> <mi>h</mi> <mrow> <mo>(</mo> <mi>n</mi> <mo>-</mo> <mi>i</mi> <mo>)</mo> </mrow> <mo>,</mo> <mi>n</mi> <mo>=</mo> <mn>0</mn> <mo>,</mo> <mi>&Lambda;</mi> <mo>,</mo> <mi>len</mi> <mo>-</mo> <mn>1</mn> </mrow> </math>
where len is the subframe length.
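The expanded error above can be transcribed term by term; in practice the encoder would evaluate it over a quantized (g_p, g_c) gain table and keep the minimum. A minimal sketch of the error evaluation only (names illustrative):

```python
def weighted_error(x, y, z, gp, gc):
    """E = ||x - gp*y - gc*z||^2, expanded exactly as in the equation above."""
    dot = lambda a, b: sum(ai * bi for ai, bi in zip(a, b))
    return (dot(x, x) + gp * gp * dot(y, y) + gc * gc * dot(z, z)
            - 2 * gp * dot(x, y) - 2 * gc * dot(x, z)
            + 2 * gp * gc * dot(y, z))
```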
(2) If the received AMR frame type is a silence insertion description frame (SID_UPDATE or SID_BAD), the transcoding flow is as shown in Fig. 3.
(a) The line spectrum pair coefficients obtained by AMR decoding are taken as the line spectrum pair coefficients of the corresponding G.729AB frames:
LSP_G.729AB_1 = LSP_G.729AB_2 = LSP_AMR
where LSP_G.729AB_1 and LSP_G.729AB_2 are the line spectrum pair coefficients of the two G.729AB frames corresponding to the current AMR frame.
(b) The speech energy parameter obtained by AMR decoding is converted into the energy parameter of the corresponding G.729AB frames:
ener_G.729AB_1 = ener_G.729AB_2 = 1.09·ener_AMR + 981
where ener_G.729AB_1 and ener_G.729AB_2 are the energy parameters of the two G.729AB frames corresponding to the current AMR frame.
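Steps (a) and (b) are a direct parameter mapping with no re-estimation, which is the point of this branch of the transcoder. A minimal sketch (the dict layout and function name are illustrative, not from either standard):

```python
def sid_transcode(lsp_amr, ener_amr):
    """Map one AMR SID frame's parameters onto the two corresponding
    G.729AB frames: LSPs are copied, energy is mapped linearly as
    ener = 1.09 * ener_AMR + 981."""
    ener = 1.09 * ener_amr + 981
    frame = {"lsp": list(lsp_amr), "ener": ener}
    return frame, dict(frame)  # two identical G.729AB frames
```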
The coding unit quantizes and encodes the obtained parameters, with the following specific steps:
(1) If the current frame is a speech frame, the parameters comprise the line spectrum pair coefficients, the pitch delay, the non-zero pulse positions and signs of the fixed codebook, the adaptive codebook gain, and the fixed codebook gain; each parameter is quantized and encoded with the G.729AB encoder to obtain the information bits.
(2) If the current frame is a silence insertion description frame, the parameters are line spectrum pair coefficients and voice energy, and the parameters are quantized and coded according to a G.729AB coding standard to obtain information bits.
The bit stream packaging unit packages and outputs the parameter bits, mode information, and frame type, where the output frame type is assigned according to the received frame type. If the received AMR frame type is a speech frame (SPEECH_GOOD or SPEECH_BAD), the G.729AB frame type is assigned as a speech frame (RATE_8000); if the received AMR frame type is a silence insertion description frame (SID_UPDATE or SID_BAD), the G.729AB frame type is assigned as a silence insertion description frame (RATE_SID); if the received AMR frame type is a non-transmission frame (NO_DATA or SID_FIRST), the G.729AB frame type is assigned as a non-transmission frame (RATE_0).
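The frame-type assignment above is a fixed many-to-one mapping and can be sketched as a lookup table (the string constants mirror the names used in the text; representing them as strings is my choice for illustration):

```python
# AMR frame type -> G.729AB frame type, exactly as assigned above.
AMR_TO_G729AB = {
    "SPEECH_GOOD": "RATE_8000", "SPEECH_BAD": "RATE_8000",
    "SID_UPDATE":  "RATE_SID",  "SID_BAD":    "RATE_SID",
    "NO_DATA":     "RATE_0",    "SID_FIRST":  "RATE_0",
}

def map_frame_type(amr_type):
    """Assign the output G.729AB frame type from the received AMR frame type."""
    return AMR_TO_G729AB[amr_type]
```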
(1) When transcoding the line spectrum pair coefficient, a support vector regression algorithm is used in advance to train a large amount of voice data, so that a mapping model of the line spectrum pair coefficient of the sending end and the line spectrum pair coefficient of the receiving end is obtained. On the basis, the mapping from the input line spectrum pair coefficient to the output line spectrum pair coefficient is carried out, so that the conversion of the line spectrum pair coefficient is more accurate, and the quality of the synthesized voice is improved.
(2) The decoded pitch lag integer part T0 is used as the open-loop search result of the encoding end, so that when the closed-loop search is performed, the closed-loop search range can be limited according to the value of T0, thereby improving the quality of the synthesized voice and reducing the calculation amount.
(3) In the process of transcoding the silence insertion description frame, a method of energy parameter direct mapping is adopted, and the calculation of the energy of the silence insertion description frame is eliminated, so that the algorithm complexity is reduced, and the storage capacity is correspondingly reduced.
(4) The frame type information is extracted from the input bit stream, so that the frame type is not judged in the transcoding process, the frame type is directly converted into the frame type which is the same as the received frame type when the bit stream is output, and the synthetic voice quality of a receiving end is effectively improved.
Fig. 4 shows the voice quality results of the conventional DTE method and the parameter transcoding method provided by the present invention when AMR is transcoded to G.729AB. The bit streams obtained by encoding the AMR standard test sequences t10.pcm–t19.pcm at each rate are used as input; the encoded bit stream of each rate is converted into a G.729AB code stream using both the DTE method and the method of the present invention, the code streams are decoded with a G.729AB decoder to obtain synthesized speech, and PESQ objective speech quality evaluation is performed between the synthesized speech and the corresponding AMR standard test sequences (t10.pcm–t19.pcm).
Fig. 5 shows the comparison of computational complexity between the conventional DTE method and the parameter transcoding method of the present invention when AMR is transcoded to G.729AB. Multiple speech segments (see Fig. 4 for the speech files) are selected from the AMR standard test sequences for WMOPS testing, and the worst-case results for each mode are listed in the table.
The above description covers only preferred embodiments of the present invention, but the scope of the invention is not limited thereto. Any equivalent replacement or change of the technical solutions and inventive concepts herein that can readily be conceived by a person skilled in the art within the technical scope disclosed by the present invention shall fall within the scope of the present invention.

Claims (6)

1. A transcoding method of a code stream of a voice encoder is characterized in that: a code stream A sent by a communication network 1 passes through a bit stream analysis unit, a decoding unit, a parameter conversion unit, an encoding unit and a bit stream packaging unit to obtain a code stream B received by a communication network 2, wherein the communication networks 1 and 2 are communication networks using different voice coding standards.
2. The method of claim 1, wherein the transcoding method comprises: the bit stream parsing unit is configured to receive the code stream A sent by the communication network 1, with the following specific steps:
(1) According to the frame structure of the A coding standard of the communication network 1, mode information, frame type information, and parameter bits are extracted from the corresponding bits of the input code stream A.
(2) The parameter bits are converted into the quantized speech parameter values according to the frame structure of the A coding standard of the communication network 1; the parameters of a speech frame comprise line spectrum pair coefficients, pitch delay, and the non-zero pulse positions, signs, and gains of the fixed codebook; the parameters of a silence insertion description frame are line spectrum pair coefficients and speech energy.
(3) Frame type information is extracted from the code stream A, and it is judged whether the received frame type is a speech frame, a non-transmission frame, or a silence insertion description frame.
3. The method of claim 1, wherein the transcoding method comprises: the decoding unit is used for decoding the parameter bits by the decoder A to obtain the voice parameter value and the synthesized voice, and comprises the following specific steps:
(1) If the received frame type is a silence insertion description frame, the speech parameter values are obtained by decoding according to the received parameter index values; the parameters are the line spectrum pair coefficients and the energy ener.
(2) If the received frame type is a speech frame, then:
(a) The speech parameter values are obtained by decoding according to the received parameter index values; the parameters comprise the line spectrum pair coefficients, the integer part T0 and fractional part T0_frac of the pitch delay, the non-zero pulse positions and signs of the fixed codebook, the quantized adaptive codebook gain g'_p, and the quantized fixed codebook gain g'_c,
(b) Based on the speech parameters, speech reconstruction is performed using the A coding standard of the communication network 1 to obtain reconstructed speech s' (n),
(c) After the reconstructed speech s'(n) is obtained, the post-processing of the A decoder is not performed.
4. The method of claim 1, wherein the transcoding method comprises: the parameter conversion unit is used for transcoding the decoded voice parameters to obtain the voice parameters required by the B coding standard quantization coding of the communication network 2, and comprises the following specific steps:
(1) If the received frame is a speech frame, the transcoding steps are as follows:
(a) linear predictive analysis:
Transcoding of the line spectrum pair coefficients involves off-line acquisition of the mapping model parameters and on-line parameter mapping.
First, more than 10 hours of speech data of various speaker types and various languages are encoded by the A and B encoders respectively to obtain K groups of quantized line spectrum pair coefficients; the speaker types include adult male, adult female, boy, and girl voices, and the languages include Chinese, English, French, Spanish, and Arabic: LSP_A(k, i) and LSP_B(k, i), k = 1, …, K, i = 1, …, n, where n is the dimension of the line spectrum pair coefficient vector. A support vector regression algorithm is used to calculate the parameters w_i, b_i, i = 1, …, n of the mapping models LSP_B(i) = w_i^T·LSP_A(i) + b_i between LSP_A and LSP_B. At transcoding time, the n mapping models LSP_B(i) = w_i^T·LSP_A(i) + b_i, i = 1, …, n are applied to the line spectrum pair coefficients LSP_A(i) of the A encoder to calculate LSP_B(i), i = 1, …, n;
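The on-line mapping step is cheap once the n models are trained off-line. A minimal sketch, reading each of the n models as a per-coefficient affine map (one plausible reading of the per-dimension training data described below; w and b would come from the off-line SVR training):

```python
def map_lsp(lsp_a, w, b):
    """Apply the n trained models LSP_B(i) = w_i * LSP_A(i) + b_i
    to one frame's decoded line spectrum pair coefficients."""
    return [wi * xi + bi for wi, xi, bi in zip(w, lsp_a, b)]
```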
The specific process of calculating the mapping model parameters w_i, b_i between LSP_A and LSP_B with the support vector regression (SVR) algorithm is as follows:
The line spectrum pair coefficients LSP_A and LSP_B of the kth speech frame are defined as training data x and y respectively, i.e., x(k, i) = LSP_A(k, i), y(k, i) = LSP_B(k, i); n regression functions f_i(x) = w_i^T·x + b_i are used to fit the data {x(k, i), y(k, i)}, k = 1, …, K, i = 1, …, n;
defining n mapping functions <math> <mrow> <msub> <mi>f</mi> <mi>i</mi> </msub> <mrow> <mo>(</mo> <mi>x</mi> <mo>)</mo> </mrow> <mo>=</mo> <msup> <msub> <mi>w</mi> <mi>i</mi> </msub> <mi>T</mi> </msup> <mi>x</mi> <mo>+</mo> <msub> <mi>b</mi> <mi>i</mi> </msub> <mo>=</mo> <munderover> <mi>&Sigma;</mi> <mrow> <mi>k</mi> <mo>=</mo> <mn>1</mn> </mrow> <mi>K</mi> </munderover> <mrow> <mo>(</mo> <msubsup> <mi>&alpha;</mi> <mi>k</mi> <mo>*</mo> </msubsup> <mo>-</mo> <msub> <mi>&alpha;</mi> <mi>k</mi> </msub> <mo>)</mo> </mrow> <mrow> <mo>(</mo> <msub> <mi>x</mi> <mi>k</mi> </msub> <mi>x</mi> <mo>)</mo> </mrow> <mo>+</mo> <msup> <msub> <mi>b</mi> <mi>i</mi> </msub> <mo>*</mo> </msup> <mo>,</mo> </mrow> </math>
<math> <mrow> <msub> <mi>w</mi> <mi>i</mi> </msub> <mo>=</mo> <munderover> <mi>&Sigma;</mi> <mrow> <mi>k</mi> <mo>=</mo> <mn>1</mn> </mrow> <mi>K</mi> </munderover> <mrow> <mo>(</mo> <msubsup> <mi>&alpha;</mi> <mi>ki</mi> <mo>*</mo> </msubsup> <mo>-</mo> <msub> <mi>&alpha;</mi> <mi>ki</mi> </msub> <mo>)</mo> </mrow> <msub> <mi>x</mi> <mi>ki</mi> </msub> <mo>,</mo> <msubsup> <mi>b</mi> <mi>i</mi> <mo>*</mo> </msubsup> <mo>=</mo> <mfrac> <mn>1</mn> <mi>N</mi> </mfrac> <munder> <mi>&Sigma;</mi> <mrow> <mi>j</mi> <mo>&Element;</mo> <mo>{</mo> <mi>j</mi> <mo>|</mo> <msub> <mi>&alpha;</mi> <mi>ji</mi> </msub> <mo>></mo> <mn>0</mn> <mo>}</mo> </mrow> </munder> <mo>[</mo> <msub> <mi>y</mi> <mi>ji</mi> </msub> <mo>-</mo> <munderover> <mi>&Sigma;</mi> <mrow> <mi>k</mi> <mo>=</mo> <mn>1</mn> </mrow> <mi>K</mi> </munderover> <msub> <mi>y</mi> <mi>ki</mi> </msub> <msub> <mi>&alpha;</mi> <mi>ki</mi> </msub> <mrow> <mo>(</mo> <msub> <mi>x</mi> <mi>ki</mi> </msub> <mo>&CenterDot;</mo> <msub> <mi>x</mi> <mi>ji</mi> </msub> <mo>)</mo> </mrow> <mo>]</mo> <mo>,</mo> </mrow> </math>
where α_ki and α_ki* are the Lagrange factors; for a given i, the Lagrange factors are solved as follows:
defining the Lagrange function:
<math> <mrow> <mfenced open='' close=''> <mtable> <mtr> <mtd> <mi>G</mi> <mrow> <mo>(</mo> <msub> <mi>w</mi> <mi>i</mi> </msub> <mo>,</mo> <msub> <mi>&zeta;</mi> <mi>ki</mi> </msub> <mo>,</mo> <msubsup> <mi>&zeta;</mi> <mi>ki</mi> <mo>*</mo> </msubsup> <mo>)</mo> </mrow> <mo>=</mo> <mfrac> <mn>1</mn> <mn>2</mn> </mfrac> <msup> <mrow> <mo>|</mo> <mo>|</mo> <mi>w</mi> <mo>|</mo> <mo>|</mo> </mrow> <mn>2</mn> </msup> <mo>+</mo> <mi>C</mi> <munderover> <mi>&Sigma;</mi> <mrow> <mi>k</mi> <mo>=</mo> <mn>1</mn> </mrow> <mi>r</mi> </munderover> <mrow> <mo>(</mo> <msub> <mi>&zeta;</mi> <mi>ki</mi> </msub> <mo>+</mo> <msubsup> <mi>&zeta;</mi> <mi>ki</mi> <mo>*</mo> </msubsup> <mo>)</mo> </mrow> <mo>-</mo> <munderover> <mi>&Sigma;</mi> <mrow> <mi>k</mi> <mo>=</mo> <mn>1</mn> </mrow> <mi>r</mi> </munderover> <msub> <mi>&alpha;</mi> <mi>ki</mi> </msub> <mrow> <mo>(</mo> <mi>&epsiv;</mi> <mo>+</mo> <msub> <mi>&zeta;</mi> <mi>ki</mi> </msub> <mo>+</mo> <msub> <mi>y</mi> <mi>ki</mi> </msub> <mo>-</mo> <msub> <mi>w</mi> <mi>i</mi> </msub> <mo>&CenterDot;</mo> <msub> <mi>x</mi> <mi>ki</mi> </msub> <mo>-</mo> <msub> <mi>b</mi> <mi>i</mi> </msub> <mo>)</mo> </mrow> </mtd> </mtr> <mtr> <mtd> <mo>-</mo> <munderover> <mi>&Sigma;</mi> <mrow> <mi>k</mi> <mo>=</mo> <mn>1</mn> </mrow> <mi>r</mi> </munderover> <msubsup> <mi>&alpha;</mi> <mi>ki</mi> <mo>*</mo> </msubsup> <mrow> <mo>(</mo> <mi>&epsiv;</mi> <mo>+</mo> <msubsup> <mi>&zeta;</mi> <mi>ki</mi> <mo>*</mo> </msubsup> <mo>+</mo> <msub> <mi>y</mi> <mi>ki</mi> </msub> <mo>-</mo> <msub> <mi>w</mi> <mi>i</mi> </msub> <mo>&CenterDot;</mo> <msub> <mi>x</mi> <mi>ki</mi> </msub> <mo>-</mo> <msub> <mi>b</mi> <mi>i</mi> </msub> <mo>)</mo> </mrow> <mo>-</mo> <munderover> <mi>&Sigma;</mi> <mrow> <mi>k</mi> <mo>=</mo> <mn>1</mn> </mrow> <mi>r</mi> </munderover> <mrow> <mo>(</mo> <msub> <mi>&eta;</mi> <mi>ki</mi> </msub> <msub> <mi>&zeta;</mi> <mi>ki</mi> </msub> <mo>+</mo> <msubsup> <mi>&eta;</mi> <mi>ki</mi> <mo>*</mo> </msubsup> <msubsup> <mi>&zeta;</mi> <mi>ki</mi> <mo>*</mo> </msubsup> <mo>)</mo> </mrow> </mtd> </mtr> </mtable> </mfenced> <mo>,</mo> </mrow> </math>
where C is a constant with C > 0, ζ_ki ≥ 0 and ζ_ki* ≥ 0 are the slack variables, and ε is the fitting accuracy; maximize the objective function:
<math> <mrow> <mi>W</mi> <mrow> <mo>(</mo> <msub> <mi>&alpha;</mi> <mi>ki</mi> </msub> <mo>,</mo> <msubsup> <mi>&alpha;</mi> <mi>ki</mi> <mo>*</mo> </msubsup> <mo>)</mo> </mrow> <mo>=</mo> <mo>-</mo> <mi>&epsiv;</mi> <munderover> <mi>&Sigma;</mi> <mrow> <mi>k</mi> <mo>=</mo> <mn>1</mn> </mrow> <mi>K</mi> </munderover> <mrow> <mo>(</mo> <msubsup> <mi>&alpha;</mi> <mi>ki</mi> <mo>*</mo> </msubsup> <mo>+</mo> <msub> <mi>&alpha;</mi> <mi>ki</mi> </msub> <mo>)</mo> </mrow> <mo>+</mo> <munderover> <mi>&Sigma;</mi> <mrow> <mi>k</mi> <mo>=</mo> <mn>1</mn> </mrow> <mi>K</mi> </munderover> <msub> <mi>y</mi> <mi>ki</mi> </msub> <mrow> <mo>(</mo> <msubsup> <mi>&alpha;</mi> <mi>ki</mi> <mo>*</mo> </msubsup> <mo>-</mo> <msub> <mi>&alpha;</mi> <mi>ki</mi> </msub> <mo>)</mo> </mrow> <mo>-</mo> <mfrac> <mn>1</mn> <mn>2</mn> </mfrac> <munderover> <mi>&Sigma;</mi> <mrow> <mi>k</mi> <mo>,</mo> <mi>j</mi> <mo>=</mo> <mn>1</mn> </mrow> <mi>K</mi> </munderover> <mrow> <mo>(</mo> <msubsup> <mi>&alpha;</mi> <mi>ki</mi> <mo>*</mo> </msubsup> <mo>-</mo> <msub> <mi>&alpha;</mi> <mi>ki</mi> </msub> <mo>)</mo> </mrow> <mrow> <mo>(</mo> <msubsup> <mi>&alpha;</mi> <mi>ji</mi> <mo>*</mo> </msubsup> <mo>-</mo> <msub> <mi>&alpha;</mi> <mi>ji</mi> </msub> <mo>)</mo> </mrow> <mrow> <mo>(</mo> <msub> <mi>x</mi> <mi>ki</mi> </msub> <mo>&CenterDot;</mo> <msub> <mi>x</mi> <mi>ji</mi> </msub> <mo>)</mo> </mrow> <mo>,</mo> </mrow> </math>
where the Lagrange factors α_ki and α_ki* satisfy <math> <mrow> <munderover> <mi>&Sigma;</mi> <mrow> <mi>k</mi> <mo>=</mo> <mn>1</mn> </mrow> <mi>K</mi> </munderover> <mrow> <mo>(</mo> <msub> <mi>&alpha;</mi> <mi>ki</mi> </msub> <mo>-</mo> <msubsup> <mi>&alpha;</mi> <mi>ki</mi> <mo>*</mo> </msubsup> <mo>)</mo> </mrow> <mo>=</mo> <mn>0,0</mn> <mo>&le;</mo> <msub> <mi>&alpha;</mi> <mi>ki</mi> </msub> <mo>,</mo> <msubsup> <mi>&alpha;</mi> <mi>ki</mi> <mo>*</mo> </msubsup> <mo>&le;</mo> <mi>C</mi> <mo>,</mo> <mi>k</mi> <mo>=</mo> <mn>1</mn> <mo>,</mo> <mo>.</mo> <mo>.</mo> <mo>.</mo> <mi>K</mi> <mo>,</mo> </mrow> </math> This forms a typical quadratic programming problem. Using the KKT conditions α_ki = 0 → y_ki·f(x_ki) ≥ 1; 0 < α_ki < C → y_ki·f(x_ki) = 1; α_ki = C → y_ki·f(x_ki) ≤ 1, the quadratic programming problem is solved with the sequential minimal optimization algorithm in the following steps:
1) Give the Lagrange factors initial values, typically α_ki = 0;
2) Check the KKT conditions over the training data and find a data point (x_1i, y_1i) that violates them; its Lagrange factor α_1i is taken as one of the two Lagrange factors to be optimized;
3) Find in the training data the data point (x_2i, y_2i) that satisfies max |f_i(x_1i) − f_i(x_2i) + y_2i − y_1i|; its Lagrange factor is taken as α_2i. After α_1i and α_2i are selected, the other Lagrange factors are held unchanged, forming a minimum-scale quadratic programming problem, i.e., solving for the optimal α_1i^new and α_2i^new;
4) solving the minimum quadratic programming problem:
K_11 = x_1i², K_22 = x_2i², K_12 = x_1i·x_2i,
<math> <mrow> <msubsup> <mi>&alpha;</mi> <mrow> <mn>2</mn> <mi>i</mi> </mrow> <mi>new</mi> </msubsup> <mo>=</mo> <msubsup> <mi>&alpha;</mi> <mrow> <mn>2</mn> <mi>i</mi> </mrow> <mi>old</mi> </msubsup> <mo>-</mo> <mfrac> <mrow> <msub> <mi>y</mi> <mrow> <mn>2</mn> <mi>i</mi> </mrow> </msub> <mrow> <mo>(</mo> <msub> <mi>E</mi> <mn>1</mn> </msub> <mo>-</mo> <msub> <mi>E</mi> <mn>2</mn> </msub> <mo>)</mo> </mrow> </mrow> <mrow> <mn>2</mn> <msub> <mi>K</mi> <mn>12</mn> </msub> <mo>-</mo> <msub> <mi>K</mi> <mn>11</mn> </msub> <mo>-</mo> <msub> <mi>K</mi> <mn>22</mn> </msub> </mrow> </mfrac> <mo>,</mo> </mrow> </math>
where E_ki = f_i^old(x_ki) − y_ki is the training error;
When y_1i ≠ y_2i, <math> <mrow> <mi>L</mi> <mo>=</mo> <mi>max</mi> <mrow> <mo>(</mo> <mn>0</mn> <mo>,</mo> <msubsup> <mi>&alpha;</mi> <mrow> <mn>2</mn> <mi>i</mi> </mrow> <mi>old</mi> </msubsup> <mo>-</mo> <msubsup> <mi>&alpha;</mi> <mrow> <mn>1</mn> <mi>i</mi> </mrow> <mi>old</mi> </msubsup> <mo>)</mo> </mrow> <mo>,</mo> <mi>H</mi> <mo>=</mo> <mi>min</mi> <mrow> <mo>(</mo> <mi>C</mi> <mo>,</mo> <mi>C</mi> <mo>+</mo> <msubsup> <mi>&alpha;</mi> <mrow> <mn>2</mn> <mi>i</mi> </mrow> <mi>old</mi> </msubsup> <mo>-</mo> <msubsup> <mi>&alpha;</mi> <mrow> <mn>1</mn> <mi>i</mi> </mrow> <mi>old</mi> </msubsup> <mo>)</mo> </mrow> <mo>,</mo> </mrow> </math>
When y_1i = y_2i, <math> <mrow> <mi>L</mi> <mo>=</mo> <mi>max</mi> <mrow> <mo>(</mo> <mn>0</mn> <mo>,</mo> <msubsup> <mi>&alpha;</mi> <mrow> <mn>1</mn> <mi>i</mi> </mrow> <mi>old</mi> </msubsup> <mo>+</mo> <msubsup> <mi>&alpha;</mi> <mrow> <mn>2</mn> <mi>i</mi> </mrow> <mi>old</mi> </msubsup> <mo>-</mo> <mi>C</mi> <mo>)</mo> </mrow> <mo>,</mo> <mi>H</mi> <mo>=</mo> <mi>min</mi> <mrow> <mo>(</mo> <mi>C</mi> <mo>,</mo> <msubsup> <mi>&alpha;</mi> <mrow> <mn>1</mn> <mi>i</mi> </mrow> <mi>old</mi> </msubsup> <mo>+</mo> <msubsup> <mi>&alpha;</mi> <mrow> <mn>2</mn> <mi>i</mi> </mrow> <mi>old</mi> </msubsup> <mo>)</mo> </mrow> <mo>,</mo> </mrow> </math>
<math> <mrow> <msubsup> <mi>&alpha;</mi> <mrow> <mn>2</mn> <mi>i</mi> </mrow> <mrow> <mi>new</mi> <mo>,</mo> <mi>clipped</mi> </mrow> </msubsup> <mo>=</mo> <mfenced open='{' close=''> <mtable> <mtr> <mtd> <mi>H</mi> <mo>,</mo> </mtd> <mtd> <msubsup> <mi>&alpha;</mi> <mrow> <mn>2</mn> <mi>i</mi> </mrow> <mi>new</mi> </msubsup> <mo>&GreaterEqual;</mo> <mi>H</mi> </mtd> </mtr> <mtr> <mtd> <msubsup> <mi>&alpha;</mi> <mrow> <mn>2</mn> <mi>i</mi> </mrow> <mi>new</mi> </msubsup> <mo>,</mo> </mtd> <mtd> <mi>L</mi> <mo>&lt;</mo> <msubsup> <mi>&alpha;</mi> <mrow> <mn>2</mn> <mi>i</mi> </mrow> <mi>new</mi> </msubsup> <mo>&lt;</mo> <mi>H</mi> </mtd> </mtr> <mtr> <mtd> <mi>L</mi> <mo>,</mo> </mtd> <mtd> <msubsup> <mi>&alpha;</mi> <mrow> <mn>2</mn> <mi>i</mi> </mrow> <mi>new</mi> </msubsup> <mo>&le;</mo> <mi>L</mi> </mtd> </mtr> </mtable> </mfenced> <mo>,</mo> </mrow> </math>
<math> <mrow> <msubsup> <mi>&alpha;</mi> <mrow> <mn>1</mn> <mi>i</mi> </mrow> <mi>new</mi> </msubsup> <mo>=</mo> <msubsup> <mi>&alpha;</mi> <mrow> <mn>1</mn> <mi>i</mi> </mrow> <mi>old</mi> </msubsup> <mo>+</mo> <msub> <mi>y</mi> <mrow> <mn>1</mn> <mi>i</mi> </mrow> </msub> <msub> <mi>y</mi> <mrow> <mn>2</mn> <mi>i</mi> </mrow> </msub> <mrow> <mo>(</mo> <msubsup> <mi>&alpha;</mi> <mrow> <mn>2</mn> <mi>i</mi> </mrow> <mi>old</mi> </msubsup> <mo>-</mo> <msubsup> <mi>&alpha;</mi> <mrow> <mn>2</mn> <mi>i</mi> </mrow> <mrow> <mi>new</mi> <mo>,</mo> <mi>clipped</mi> </mrow> </msubsup> <mo>)</mo> </mrow> <mo>,</mo> </mrow> </math>
A pair of new Lagrange factors α_1i^new and α_2i^{new,clipped} is obtained;
5) checking whether a data point violating the KKT condition exists, and if yes, returning to the step 2); otherwise, obtaining the optimal solution of the whole problem and carrying out the next step;
6) obtaining a regression function:
<math> <mrow> <msub> <mi>f</mi> <mi>i</mi> </msub> <mrow> <mo>(</mo> <mi>x</mi> <mo>)</mo> </mrow> <mo>=</mo> <msup> <msub> <mi>w</mi> <mi>i</mi> </msub> <mi>T</mi> </msup> <mi>x</mi> <mo>+</mo> <msub> <mi>b</mi> <mi>i</mi> </msub> <mo>=</mo> <munderover> <mi>&Sigma;</mi> <mrow> <mi>k</mi> <mo>=</mo> <mn>1</mn> </mrow> <mi>K</mi> </munderover> <mrow> <mo>(</mo> <msubsup> <mi>&alpha;</mi> <mi>ki</mi> <mo>*</mo> </msubsup> <mo>-</mo> <msub> <mi>&alpha;</mi> <mi>ki</mi> </msub> <mo>)</mo> </mrow> <mrow> <mo>(</mo> <msub> <mi>x</mi> <mi>ki</mi> </msub> <mi>x</mi> <mo>)</mo> </mrow> <mo>+</mo> <msubsup> <mi>b</mi> <mi>i</mi> <mo>*</mo> </msubsup> <mo>,</mo> </mrow> </math>
wherein: <math> <mrow> <msubsup> <mi>b</mi> <mi>i</mi> <mo>*</mo> </msubsup> <mo>=</mo> <mfrac> <mn>1</mn> <mi>N</mi> </mfrac> <munder> <mi>&Sigma;</mi> <mrow> <mi>j</mi> <mo>&Element;</mo> <mo>{</mo> <mi>j</mi> <mo>|</mo> <msub> <mi>&alpha;</mi> <mi>ji</mi> </msub> <mo>></mo> <mn>0</mn> <mo>}</mo> </mrow> </munder> <mo>[</mo> <msub> <mi>y</mi> <mi>ji</mi> </msub> <mo>-</mo> <munderover> <mi>&Sigma;</mi> <mrow> <mi>k</mi> <mo>=</mo> <mn>1</mn> </mrow> <mi>K</mi> </munderover> <msub> <mi>y</mi> <mi>ki</mi> </msub> <msub> <mi>&alpha;</mi> <mi>ki</mi> </msub> <mrow> <mo>(</mo> <msub> <mi>x</mi> <mi>ki</mi> </msub> <mo>&CenterDot;</mo> <msub> <mi>x</mi> <mi>ji</mi> </msub> <mo>)</mo> </mrow> <mo>]</mo> <mo>,</mo> </mrow> </math> n is the number of support vectors;
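The pair update of step 4) above, including the clipping of α_2i to [L, H] and the compensating move of α_1i, can be sketched as follows (an illustrative sketch of one SMO pair update in classification-style form; names are mine):

```python
def smo_pair_update(a1, a2, y1, y2, E1, E2, K11, K22, K12, C):
    """One SMO pair update: take the unconstrained step for alpha2 along
    eta = 2*K12 - K11 - K22, clip to [L, H], then move alpha1 to keep the
    equality constraint sum(y_k * alpha_k) unchanged."""
    eta = 2 * K12 - K11 - K22          # <= 0 for a valid kernel
    a2_new = a2 - y2 * (E1 - E2) / eta
    if y1 != y2:
        L, H = max(0.0, a2 - a1), min(C, C + a2 - a1)
    else:
        L, H = max(0.0, a1 + a2 - C), min(C, a1 + a2)
    a2_clipped = min(H, max(L, a2_new))  # the "new, clipped" factor
    a1_new = a1 + y1 * y2 * (a2 - a2_clipped)
    return a1_new, a2_clipped
```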
At transcoding time, the mapping models established by the support vector regression algorithm are used to map the received group of line spectrum pair coefficients to the group of line spectrum pair coefficients required by the B encoder, which serve as the unquantized line spectrum pair coefficients of the B coding standard;
The obtained unquantized line spectrum pair coefficients are converted into line spectrum frequency coefficients, quantized and encoded according to the B coding standard, and then sent to the communication network 2;
According to the B coding standard of the communication network 2, the group of line spectrum pair coefficients obtained by mapping is interpolated with the line spectrum pair coefficients of the previous frame or frames to obtain the unquantized line spectrum pair coefficients of each subframe, from which the unquantized linear prediction coefficients A(z) of each subframe are calculated. The group of mapped line spectrum pair coefficients is quantized according to the B coding standard to obtain the quantized line spectrum pair coefficients of the current frame, which are interpolated with the quantized line spectrum pair coefficients of the previous frame or frames to obtain the quantized line spectrum pair coefficients of each subframe, from which the quantized linear prediction coefficients A'(z) of each subframe are calculated. The unquantized linear prediction coefficients A(z) and the quantized linear prediction coefficients A'(z) are used to calculate the coefficients of the perceptual weighting filter W(z) = A(z/γ_1)/A(z/γ_2) and the synthesis filter 1/A'(z) respectively, where γ_1 and γ_2 are perceptual weighting factors;
(b) open-loop pitch search:
in speech coding algorithms based on code-excited linear prediction, the pitch search is done in two steps. The first step is an open-loop pitch search that roughly estimates the pitch period, denoted T_op; its purpose is to provide a coarse range for the closed-loop pitch search and thereby reduce its computational load. The second step is a closed-loop pitch search near T_op.
When transcoding is performed, the normal open-loop pitch search is omitted, and the integer part T0 of the decoded pitch delay is used as the open-loop pitch search result T_op for encoding under the B coding standard:
T_op^B = T0^A,
(c) the impulse response of the perceptual weighted synthesis filter and the target signal of the adaptive codebook search are calculated,
the perceptually weighted synthesis filter is H(z) = A(z/γ1)/(A'(z)A(z/γ2)); its impulse response h(n) is needed for the adaptive and fixed codebook searches and is typically computed once per subframe. A unit impulse is passed through the filter A(z/γ1), and then successively through 1/A'(z) and 1/A(z/γ2) to obtain h(n);
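The filtering cascade just described can be sketched directly. A minimal illustration with hypothetical 3rd-order stand-in coefficients (real codecs use 10th-order filters and standard-specific γ values):

```python
import numpy as np

def fir(b, x):
    """All-zero filter A(z): y(n) = sum_i b[i] * x(n - i)."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        for i in range(min(n + 1, len(b))):
            y[n] += b[i] * x[n - i]
    return y

def all_pole(a, x):
    """All-pole filter 1/A(z) with a[0] = 1: y(n) = x(n) - sum_{i>=1} a[i] * y(n-i)."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = x[n]
        for i in range(1, min(n + 1, len(a))):
            acc -= a[i] * y[n - i]
        y[n] = acc
    return y

def weighted(a, gamma):
    """Coefficients of A(z/gamma): a_i -> a_i * gamma**i (bandwidth expansion)."""
    return a * gamma ** np.arange(len(a))

# Hypothetical LP coefficients with the convention A(z) = 1 + sum a_i z^-i
a_unq = np.array([1.0, -1.3, 0.6, -0.1])   # stand-in unquantized A(z)
a_q   = np.array([1.0, -1.25, 0.55, -0.1]) # stand-in quantized A'(z)
g1, g2 = 0.92, 0.6                         # illustrative weighting factors
impulse = np.zeros(40); impulse[0] = 1.0   # one subframe of 40 samples

# impulse -> A(z/g1) -> 1/A'(z) -> 1/A(z/g2), as in the text
h = all_pole(weighted(a_unq, g2), all_pole(a_q, fir(weighted(a_unq, g1), impulse)))
```

Since every filter in the cascade has leading coefficient 1, the first sample of h(n) is always 1.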
the calculation process of the target signal x (n) of the adaptive codebook search is as follows: first, the residual signal res of the linear prediction filter is calculatedLP(n) the calculation formula is:
res_{LP}(n) = s'(n) + \sum_{i=1}^{P} \hat{a}_i \, s'(n-i),
where s'(n) is the reconstructed speech resulting from the decoding, \hat{a}_i are the quantized linear prediction coefficients, and P is the order of the linear prediction filter. The residual signal res_{LP}(n) is then passed through the perceptually weighted synthesis filter H(z), i.e. res_{LP}(n) is convolved with h(n) to obtain the target signal x(n):
x(n)=resLP(n)*h(n);
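The two formulas above transcribe directly into code. A minimal sketch with stand-in reconstructed speech, hypothetical quantized LP coefficients, and a stand-in impulse response:

```python
import numpy as np

def lp_residual(s, a_hat):
    """res(n) = s(n) + sum_{i=1}^{P} a_hat[i] * s(n-i), with A(z) = 1 + sum a_i z^-i."""
    P = len(a_hat) - 1
    res = np.zeros(len(s))
    for n in range(len(s)):
        acc = s[n]
        for i in range(1, P + 1):
            if n - i >= 0:          # samples before the subframe taken as zero here
                acc += a_hat[i] * s[n - i]
        res[n] = acc
    return res

def target_signal(res, h):
    """x(n) = (res * h)(n), truncated to the subframe length."""
    return np.convolve(res, h)[:len(res)]

rng = np.random.default_rng(1)
s_rec = rng.standard_normal(40)            # stand-in reconstructed speech s'(n)
a_hat = np.array([1.0, -0.9, 0.2])         # hypothetical quantized LP coefficients
h = np.exp(-0.3 * np.arange(40))           # stand-in impulse response, h[0] = 1
x = target_signal(lp_residual(s_rec, a_hat), h)
```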
(d) the adaptive codebook search is performed in a manner such that,
the adaptive codebook search includes closed loop pitch search and calculation of an adaptive codebook vector;
the criterion for the closed-loop pitch search is to minimize the mean square error between the reconstructed speech at the decoding end and the reconstructed speech at the encoding end, that is, to maximize R(k):
R(k) = \frac{\sum_{n=0}^{len-1} x(n) \, y_k(n)}{\sqrt{\sum_{n=0}^{len-1} y_k(n) \, y_k(n)}},
wherein x(n) is the target signal, y_k(n) is the past filtered excitation at delay k, i.e. the convolution of the past excitation with h(n), and len is the subframe length;
when the closed-loop pitch search is performed, the search range is limited to be around a preselected value T _ op, and the range of the closed-loop pitch search is determined according to the value of the integer pitch delay T0 obtained by decoding:
[T0-g1(T0),T0+g2(T0)],
wherein g1 and g2 are functions of T0;
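A sketch of the restricted integer closed-loop search that maximizes R(k) over [T0−g1, T0+g2]. Here g1 and g2 are fixed small constants for illustration (the patent makes them functions of T0), and the excitation and impulse-response data are synthetic:

```python
import numpy as np

def filtered_past_excitation(u_past, h, k, length):
    """y_k(n): past excitation at delay k, convolved with h(n) over one subframe."""
    seg = u_past[len(u_past) - k : len(u_past) - k + length]
    if len(seg) < length:                 # delays shorter than the subframe: repeat
        seg = np.resize(seg, length)
    return np.convolve(seg, h)[:length]

def closed_loop_pitch(x, u_past, h, T0, g1=3, g2=3):
    """Maximize R(k) = <x, y_k> / sqrt(<y_k, y_k>) for k in [T0 - g1, T0 + g2]."""
    best_k, best_R = None, -np.inf
    for k in range(max(T0 - g1, 20), T0 + g2 + 1):   # 20: illustrative lower bound
        y = filtered_past_excitation(u_past, h, k, len(x))
        R = np.dot(x, y) / (np.sqrt(np.dot(y, y)) + 1e-12)
        if R > best_R:
            best_R, best_k = R, k
    return best_k

rng = np.random.default_rng(2)
u_past = rng.standard_normal(200)                 # stand-in past excitation u(n)
h = np.exp(-0.5 * np.arange(40))                  # stand-in impulse response
x = filtered_past_excitation(u_past, h, 57, 40)   # make delay 57 the true optimum
k_best = closed_loop_pitch(x, u_past, h, T0=57)
```

Because the target here is exactly y_57, the normalized correlation peaks at k = 57 by the Cauchy-Schwarz inequality; restricting k to a few values around T0 is what makes the transcoder cheaper than a full encode.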
carrying out the closed-loop pitch search within the limited range to obtain the optimal integer pitch delay k; if, according to the coding standard of the receiving end, k lies in the range where fractional delays are used, the fractions around the optimal integer delay are tested: the normalized correlation R(k) is interpolated and its maximum is searched to obtain the fractional pitch period;
R(k)_t = \sum_{i=0}^{\epsilon} R(k-i) \, b_m(t + i \cdot \epsilon) + \sum_{i=0}^{\epsilon} R(k+1+i) \, b_m(\epsilon - t + i \cdot \epsilon),
where t = 0, 1, …, ε−1, ε is the reciprocal of the fractional delay resolution, and b_m is the interpolation filter. After the pitch lag is determined, the past excitation u(n) is interpolated at the given integer delay k and fraction t to compute the adaptive codebook vector:
v(n) = \sum_{i=0}^{P} u(n-k+i) \, b_q(t + i \cdot \epsilon) + \sum_{i=0}^{P} u(n-k+1+i) \, b_q(\epsilon - t + i \cdot \epsilon),
after the adaptive codebook vector is determined, the adaptive codebook gain g_p may be calculated:
g_p = \frac{\sum_{n=0}^{len-1} x(n) \, y(n)}{\sum_{n=0}^{len-1} y(n) \, y(n)},
where len is the subframe length, x(n) is the target signal of the adaptive codebook search, and y(n) = v(n) * h(n) is the filtered adaptive codebook vector, i.e. the convolution of v(n) with the impulse response h(n) of the perceptually weighted synthesis filter H(z);
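The gain formula above is the least-squares projection of the target onto the filtered adaptive codebook vector; a minimal numeric sketch (toy vectors, not codec data):

```python
import numpy as np

def adaptive_gain(x, y):
    """g_p = <x, y> / <y, y>; codecs typically also clamp this, e.g. to [0, 1.2]."""
    return float(np.dot(x, y) / (np.dot(y, y) + 1e-12))

# Toy example where the target is exactly twice the filtered codebook vector,
# so the projection gain recovers the factor 2.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0.5, 1.0, 1.5, 2.0])
gp = adaptive_gain(x, y)
```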
(e) fixed codebook search
The fixed codebook vector can be expressed as:
c(n) = S_1 \delta(n - m_1) + S_2 \delta(n - m_2) + \ldots + S_{N_p} \delta(n - m_{N_p}), \quad n = 0, 1, \ldots, len-1;
wherein δ(n) is a unit pulse, N_p is the number of non-zero pulses in the fixed codebook vector, and len is the subframe length. m_1, m_2, …, m_{N_p} indicate the positions of the non-zero pulses, and S_1, S_2, …, S_{N_p} the signs (+1 or −1) of the pulses at the corresponding positions; c(n) is a len-dimensional vector in which all elements other than the N_p non-zero pulses are 0;
the fixed codebook search uses the criterion of minimizing the weighted mean square error between the weighted reconstructed speech s'_w(n) of the decoding side and the reconstructed speech of the encoding end to search the fixed codebook vector, i.e. to determine the positions and signs of the non-zero pulses in the codebook vector;
the fixed codebook search first calculates the target signal:
x_2(n) = x(n) - g_p \, y(n), \quad n = 0, 1, \ldots, len-1;
wherein x(n) is the target signal of the adaptive codebook search, y(n) = v(n) * h(n) is the filtered adaptive codebook vector, g_p is the adaptive codebook gain, and len is the subframe length;
if c is a codebook vector, then the codebook vector that maximizes the following quantity is sought:

Q = \frac{(d^T c)^2}{c^T \Phi \, c}, \quad (1)

wherein d is the correlation between the target signal x_2(n) and the impulse response h(n) of the perceptually weighted synthesis filter, \Phi = H^T H is the autocorrelation matrix of h(n), and T denotes matrix transposition;
the elements of vector d are calculated as follows:
d(n) = \sum_{i=n}^{len-1} x_2(i) \, h(i-n), \quad n = 0, 1, \ldots, len-1,
where len is the subframe length. The elements of the symmetric matrix \Phi are calculated as follows:

\phi(i, j) = \sum_{n=\max(i,j)}^{len-1} h(n-i) \, h(n-j), \quad i, j = 0, 1, \ldots, len-1;
the numerator term in formula (1) can be expressed as:
C = \sum_{i=0}^{N_p - 1} S_i \, d(u_i),
wherein u_i is the position of the i-th pulse, S_i is the sign of the i-th pulse, and N_p is the number of non-zero pulses in the fixed codebook vector; the denominator in formula (1) is given by:
E_D = \sum_{i=0}^{N_p - 1} \phi(u_i, u_i) + 2 \sum_{i=0}^{N_p - 2} \sum_{j=i+1}^{N_p - 1} S_i S_j \, \phi(u_i, u_j),
the positions u_1, u_2, …, u_{N_p} that maximize formula (1) are the computed non-zero pulse positions;
at transcoding time, the non-zero pulse positions m_1, m_2, …, m_{N_p} of the fixed codebook vector obtained by decoding are used, according to the coding standard of the receiving end, to limit the search range of the fixed codebook search, so that a simplified search is performed near the positions m_1, m_2, …, m_{N_p};
the fixed codebook gain is searched with the criterion of minimizing the weighted mean square error between the reconstructed speech decoded by the A decoder and the reconstructed speech encoded by the B encoder, that is, minimizing:
E = \|x - g_p y - g_c z\|^2 = x^T x + g_p^2 \, y^T y + g_c^2 \, z^T z - 2 g_p \, x^T y - 2 g_c \, x^T z + 2 g_p g_c \, y^T z,
where x is the target vector of the fixed codebook search, y is the adaptive codebook vector filtered signal, and z is the convolution of the fixed codebook vector with h (n):
z(n) = \sum_{i=0}^{n} c(i) \, h(n-i), \quad n = 0, \ldots, len-1,
wherein len is the subframe length;
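The simplified search of step (e) can be sketched as an exhaustive scan over a small neighborhood of the decoded pulse positions, maximizing C²/E_D from the formulas above. This is a toy illustration: the pulse count, neighborhood radius, and signal data are hypothetical, and pulse signs are taken from the sign of d rather than jointly searched:

```python
import numpy as np
from itertools import product

def corr_vector(x2, h):
    """d(n) = sum_{i=n}^{len-1} x2(i) h(i-n)."""
    L = len(x2)
    return np.array([sum(x2[i] * h[i - n] for i in range(n, L)) for n in range(L)])

def autocorr_matrix(h):
    """phi(i, j) = sum_{n=max(i,j)}^{len-1} h(n-i) h(n-j)."""
    L = len(h)
    phi = np.zeros((L, L))
    for i in range(L):
        for j in range(L):
            phi[i, j] = sum(h[n - i] * h[n - j] for n in range(max(i, j), L))
    return phi

def simplified_search(d, phi, decoded_pos, radius=2):
    """Search each pulse only near its decoded position, maximizing C^2 / E_D."""
    L = len(d)
    best, best_score = None, -np.inf
    cands = [range(max(0, p - radius), min(L, p + radius + 1)) for p in decoded_pos]
    for pos in product(*cands):
        if len(set(pos)) < len(pos):
            continue                        # pulses must occupy distinct positions
        s = np.sign(d[list(pos)]); s[s == 0] = 1
        C = float(np.dot(s, d[list(pos)]))
        E = float(s @ phi[np.ix_(pos, pos)] @ s)
        score = C * C / (E + 1e-12)
        if score > best_score:
            best_score, best = score, (pos, tuple(s))
    return best

rng = np.random.default_rng(3)
x2 = rng.standard_normal(40)               # stand-in fixed codebook target
h = np.exp(-0.4 * np.arange(40))           # stand-in impulse response
d = corr_vector(x2, h)
phi = autocorr_matrix(h)
positions, signs = simplified_search(d, phi, decoded_pos=(5, 20, 33))
```

The cost saving is in `cands`: a full encode would scan the standard's entire pulse-position grid, whereas here each pulse is tested at only 2·radius+1 candidate positions around the decoded value.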
(2) if the received frame type is a silence insertion description frame, the transcoding process is as follows:
(a) interpolating the decoded quantized line spectrum pair coefficients of the current frame and of the last silence insertion description frame to serve as the unquantized line spectrum pair coefficients of the corresponding frame at the encoding end of the B coding standard:
LSP_B^{(1)}[i] = \alpha \, LSP_A^{(1)}[i] + (1 - \alpha) \, LSP_A^{(0)}[i], \quad i = 0, 1, \ldots, n,
wherein LSP^{(1)} and LSP^{(0)} respectively denote the line spectrum pair coefficients of the current frame and of the last silence insertion description frame, n is the dimension of the line spectrum pair coefficients, and α is the interpolation coefficient;
(b) converting the decoded energy parameter ener into an energy parameter of a corresponding frame at a coding end of the B coding standard:
ener_B = a \cdot ener_A + b,
wherein a and b are linear fitting coefficients.
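Both silence-insertion-description operations are elementwise; a minimal sketch with hypothetical interpolation and fitting coefficients (α, a, and b are codec- and corpus-dependent and are not specified numerically in the text):

```python
import numpy as np

def sid_lsp(lsp_cur, lsp_prev, alpha=0.5):
    """LSP_B = alpha * LSP_A(current) + (1 - alpha) * LSP_A(last SID frame)."""
    return alpha * np.asarray(lsp_cur) + (1.0 - alpha) * np.asarray(lsp_prev)

def sid_energy(ener_a, a=1.0, b=0.0):
    """ener_B = a * ener_A + b, with (a, b) fitted offline by linear regression."""
    return a * ener_a + b

lsp1 = np.linspace(0.1, 3.0, 10)           # stand-in current-frame LSPs
lsp0 = np.linspace(0.2, 3.1, 10)           # stand-in previous-SID-frame LSPs
lsp_b = sid_lsp(lsp1, lsp0, alpha=0.5)
ener_b = sid_energy(30.0, a=0.9, b=1.5)    # hypothetical fitted coefficients
```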
5. The method of claim 1, wherein the transcoding method comprises: the coding unit is used for carrying out quantization coding on the obtained parameters, and the specific steps are as follows:
(1) if the current frame is a voice frame, the parameters comprise the line spectrum pair coefficients, the pitch delay, the positions and signs of the non-zero pulses of the fixed codebook, the adaptive codebook gain and the fixed codebook gain; each parameter is quantized and coded according to the coding standard of the communication network 2 to obtain the parameter bits;
(2) if the current frame is a silence insertion description frame, the parameters are line spectrum pair coefficients and voice energy, and the parameters are quantized and coded according to the coding standard of the communication network 2 to obtain parameter bits.
6. The method of claim 1, wherein the transcoding method comprises: the bit stream encapsulation unit is used for packing and outputting the parameter bits, the mode information and the frame type, wherein the output frame type is assigned according to the received frame type, so that the input and output frame types are the same: if the received data frame is a voice frame, the output frame type is also a voice frame; if the received data frame is a mute frame, the output frame type is also a mute frame; the frame type is not judged from the reconstructed speech.
CN201310598532.5A 2013-11-20 2013-11-20 Transcoding method for code stream of voice coder Pending CN104658539A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310598532.5A CN104658539A (en) 2013-11-20 2013-11-20 Transcoding method for code stream of voice coder


Publications (1)

Publication Number Publication Date
CN104658539A true CN104658539A (en) 2015-05-27

Family

ID=53249579

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310598532.5A Pending CN104658539A (en) 2013-11-20 2013-11-20 Transcoding method for code stream of voice coder

Country Status (1)

Country Link
CN (1) CN104658539A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022179406A1 (en) * 2021-02-26 2022-09-01 腾讯科技(深圳)有限公司 Audio transcoding method and apparatus, audio transcoder, device, and storage medium



Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20150527
