CN101118748A - Search method, device and speech coder for algebraic codebook - Google Patents
- Publication number: CN101118748A
- Application number: CN200610062007A
- Authority: CN (China)
- Prior art keywords: path, track, index, optimal, algebraic
- Legal status: Pending (status as listed by Google Patents, which notes that it is an assumption and not a legal conclusion)
Abstract
The present invention applies to the field of speech coding and provides a search method, a device and a speech coder for an algebraic codebook. Each algebraic codebook comprises a plurality of subcodebooks, each subcodebook comprises a plurality of tracks, and each track stores the positions at which one or more non-zero pulses may appear. The method decomposes the tracks contained in each subcodebook into a plurality of paths and determines the optimal index of every path according to the positions of the non-zero pulse in that path. A fixed path and a dynamic path are then selected and combined in a cyclic search for the corresponding track indexes. By decomposing the tracks of the subcodebooks into paths and searching for the final codebook index through combinations of fixed and dynamic paths, the invention effectively reduces the complexity of the codebook search and improves the practical value of the coding algorithm.
Description
Technical Field
The present invention relates to the field of speech coding and, in particular, to a method and an apparatus for searching an algebraic codebook in speech coding, and to a speech coder.
Background
Speech coding compresses the digital representation of speech signals so as to minimize the number of bits needed to represent them. Speech coders currently fall into three broad categories: waveform coding, parametric coding and hybrid coding. Waveform coding adapts well to different signals and delivers high speech quality, but requires a relatively high bit rate. Parametric coding generally operates at a low bit rate, but its coding quality is poor and the synthesized speech sounds unnatural. Hybrid coding overcomes the shortcomings of waveform coding and parametric coding, combines their advantages, and can deliver high-quality synthesized speech at 4-16 kb/s. Algebraic Code-Excited Linear Prediction (ACELP), a hybrid speech coding method, has been adopted in many speech coding standards of the International Telecommunication Union (ITU), such as G.722.2, G.723.1 and G.729.
The random excitation signal is a main component of the Code-Excited Linear Prediction (CELP) speech coding model. ACELP describes the random excitation signal with a random codevector containing several non-zero pulses. The amplitude of each non-zero pulse can only be +1 or -1, and its position is described by an algebraic codebook, which stores the positions at which the non-zero pulses of the random excitation vector may occur. At coding rates below 8 kb/s, ACELP algorithms typically employ a structure of multiple algebraic subcodebooks that differ in the positions where non-zero pulses may occur. As a trade-off between bit count and coding quality, each algebraic subcodebook usually comprises 4 tracks, each storing the positions at which one or several non-zero pulses may occur. The specific form of the subcodebooks is shown in the following table. Each subcodebook has 2 modes, selected by 1 bit, and contains 4 tracks, namely the candidate positions of pulses m0, m1, m2 and m3 in the table. At low rates, each track generally occupies 3-4 bits, i.e. 8 or 16 candidate positions.
Subcodebook | Mode | m0 | m1 | m2 | m3
1 | Mode 0 | (+) 0, 6, 16, 26, 36, 46, 56, 66 | (-) 1, 8, 18, 28, 38, 48, 58, 69 | (+) 2, 10, 20, 30, 40, 50, 60, 72 | (-) 3, 12, 22, 32, 42, 52, 62, 75; 4, 14, 24, 34, 44, 54, 64, 78
1 | Mode 1 | (-) 0, 7, 17, 27, 37, 47, 57, 67 | (+) 1, 9, 19, 29, 39, 49, 59, 70 | (-) 2, 11, 21, 31, 41, 51, 61, 73 | (+) 3, 13, 23, 33, 43, 53, 63, 76; 5, 15, 25, 35, 45, 55, 65, 79
2 | Mode 0 | (+) 0, 5, 10, 15, 20, 26, 36, 46 | (-) 1, 6, 11, 16, 21, 28, 38, 48 | (+) 2, 7, 12, 17, 22, 30, 40, 50 | (-) 3, 8, 13, 18, 23, 32, 42, 52; 4, 9, 14, 19, 24, 34, 44, 54
2 | Mode 1 | (-) 0, 5, 10, 15, 20, 25, 35, 45 | (+) 1, 6, 11, 16, 21, 27, 37, 47 | (-) 2, 7, 12, 17, 22, 29, 39, 49 | (+) 3, 8, 13, 18, 23, 31, 41, 51; 4, 9, 14, 19, 24, 33, 43, 53
3 | Mode 0 | (+) 0, 5, 10, 15, 20, 25, 30, 35 | (-) 1, 6, 11, 16, 21, 26, 31, 36 | (+) 2, 7, 12, 17, 22, 27, 32, 37 | (-) 3, 8, 13, 18, 23, 28, 33, 38; 4, 9, 14, 19, 24, 29, 34, 39
3 | Mode 1 | (-) 0, 5, 10, 15, 20, 25, 30, 35 | (+) 1, 6, 11, 16, 21, 26, 31, 36 | (-) 2, 7, 12, 17, 22, 27, 32, 37 | (+) 3, 8, 13, 18, 23, 28, 33, 38; 4, 9, 14, 19, 24, 29, 34, 39
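To make the table concrete, the sketch below places one signed unit pulse per track using subcodebook 3, mode 0. It is illustrative only. The 80-sample subframe length is an assumption: it is the smallest length that holds position 79 from subcodebook 1. The order in which the two rows of track m3 are joined into 16 indexes is also assumed. `excitation_vector` is a hypothetical helper, not part of the patent.

```python
# Subcodebook 3, mode 0, transcribed from the table above:
# each track has a fixed pulse sign and a list of candidate positions.
SUBCODEBOOK3_MODE0 = [
    (+1, [0, 5, 10, 15, 20, 25, 30, 35]),   # m0
    (-1, [1, 6, 11, 16, 21, 26, 31, 36]),   # m1
    (+1, [2, 7, 12, 17, 22, 27, 32, 37]),   # m2
    (-1, [3, 8, 13, 18, 23, 28, 33, 38,     # m3, first row (assumed order)
          4, 9, 14, 19, 24, 29, 34, 39]),   # m3, second row
]

SUBFRAME_LEN = 80  # assumption: smallest length covering position 79 in the table


def excitation_vector(tracks, track_indexes, length=SUBFRAME_LEN):
    """Place one signed unit pulse per track; all other samples stay zero."""
    v = [0] * length
    for (sign, positions), idx in zip(tracks, track_indexes):
        v[positions[idx]] += sign
    return v
```

For example, track indexes `[2, 0, 7, 9]` put +1 at sample 10, -1 at sample 1, +1 at sample 37 and -1 at sample 9.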
In most ACELP speech coding algorithms, the algebraic codebook is searched with a full search algorithm. All position combinations in the 4 tracks are traversed, so that the speech synthesized from the random codevector has the highest quality. As shown in FIG. 1, a subcodebook is selected during coding according to the pitch period. Each subcodebook comprises 4 tracks, and the positions of the four pulses in those tracks are controlled by nested loops. The four pulses are added to obtain a random excitation vector, which is input to the perceptual weighting synthesis filter to obtain a perceptually weighted synthesized speech signal. The four pulse positions are selected to minimize the weighted error between the residual target vector and the synthesized speech signal. Equivalently, they are the positions that maximize the following expression e, which yields the index $i_m$ within the algebraic subcodebook selected by the pitch period:

$$e = \frac{\left(\sum_{i=0}^{N_c-1} s_i\, r(m_i)\right)^2}{\sum_{i=0}^{N_c-1}\sum_{j=0}^{N_c-1} s_i s_j\, \phi(m_i, m_j)}$$

The symbols are defined as follows:

- $r(\cdot)$ is the convolution of the residual target vector $x'_0$ with the impulse response $h(n)$ of the perceptually weighted synthesis filter. The residual target vector $x'_0$ is the weighted input speech minus the zero-input response of the perceptually weighted synthesis filter and the adaptive codebook contribution.
- $\phi(\cdot,\cdot)$ is the covariance matrix of $h(n)$.
- The sums run over $i, j = 0, 1, \ldots, N_c-1$, where $N_c$ is the number of non-zero pulses in the random excitation vector.
- $m_i$ are the positions where the non-zero pulses may occur.
- $s_i$ are the pulse amplitudes, which can only be +1 or -1.
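These definitions correspond to the usual ACELP criterion e = C²/E, with C = Σ s_i r(m_i) and E = Σ_i Σ_j s_i s_j φ(m_i, m_j). The sketch below evaluates it for one candidate set of pulses, assuming that standard form. `criterion` is a hypothetical name, and plain lists stand in for r and φ.

```python
def criterion(r, phi, positions, signs):
    """ACELP search criterion e = C**2 / E (assumed standard form).

    r         -- correlation r(n) between target and impulse response, length N
    phi       -- N x N covariance matrix of h(n), indexed phi[i][j]
    positions -- pulse positions m_i
    signs     -- pulse amplitudes s_i (+1 or -1)
    """
    c = sum(s * r[m] for m, s in zip(positions, signs))
    energy = sum(si * sj * phi[mi][mj]
                 for mi, si in zip(positions, signs)
                 for mj, sj in zip(positions, signs))
    return c * c / energy
```

A full search simply calls such a function for every position combination and keeps the largest value. This is why its cost grows with the product of the track lengths.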
The expression for the perceptual weighted synthesis filter is as follows:
where $\alpha_i$ are the linear prediction coefficients, $\gamma$ is the weighting (bandwidth-expansion) factor, typically between 0.8 and 0.9, and $p$ is the filter order.
The time complexity of a full algebraic codebook search grows exponentially with the number of bits allocated to the algebraic codebook. As that number of bits increases, the complexity of the full search therefore rises rapidly, the speech coding algorithm becomes difficult to run in real time, and its practical value is limited.
Disclosure of Invention
The object of the invention is to provide a method for searching an algebraic codebook that solves a problem of the prior art. In the prior art, the time complexity of a full algebraic codebook search is high and grows as more bits are allocated to the algebraic codebook. This makes it difficult to run the speech coding algorithm in real time and limits its practical value.
Another object of the present invention is to provide an algebraic codebook searching device.
It is another object of the present invention to provide a speech coder.
The invention is realized by a searching method of an algebraic codebook, wherein each algebraic codebook comprises a plurality of subcodebooks, each subcodebook comprises a plurality of tracks, and each track stores positions where one or more nonzero pulses can appear, the method comprises the following steps:
A. decomposing a plurality of tracks contained in each sub-codebook into a plurality of paths, wherein the plurality of paths form a path group to be searched, and each path contains a plurality of possible positions where non-zero pulses can appear;
B. determining the optimal index of each path in the path group to be searched according to the track position of the nonzero pulse in the path;
C. selecting a fixed path and at least one dynamic path from the path group to be searched, performing a combined cyclic search over all fixed and dynamic paths to obtain the optimal algebraic subcodebook index corresponding to the dynamic path, and converting the optimal index of the fixed path and the optimal algebraic subcodebook index corresponding to the dynamic path into the corresponding track indexes.
The optimal index of the path is a path index which meets the following conditions:
where $I_x$ is the optimal index of path x, $p(j)$ is the track position corresponding to index j, $r'(n)$ is the convolution of the residual target vector $x'_0$ with the impulse response $h(n)$ of the perceptually weighted synthesis filter, and $s_x$ is the non-zero pulse amplitude for path x, either +1 or -1.
The optimal algebraic subcodebook index corresponding to the dynamic path is the algebraic codebook index $i_m$ that maximizes the following expression e during the cyclic search over all combinations of fixed and dynamic paths:

$$e = \frac{\left(\sum_{i=0}^{N_c-1} s_i\, r'(m_i)\right)^2}{\sum_{i=0}^{N_c-1}\sum_{j=0}^{N_c-1} s_i s_j\, \phi(m_i, m_j)}$$

where $r'(\cdot)$ is the convolution of the residual target vector $x'_0$ with the impulse response $h(n)$ of the perceptually weighted synthesis filter; $\phi(\cdot,\cdot)$ is the covariance matrix of $h(n)$; the sums run over $i, j = 0, 1, \ldots, N_c-1$; $N_c$ is the number of non-zero pulses in the random excitation vector; $m_i$ are the positions in the dynamic path where non-zero pulses may occur; and $s_i$ are the pulse amplitudes.
When the tracks are decomposed, one track is decomposed into one or two paths according to the length of the track.
When a track is divided into two paths, the even-numbered entries of the track form one path and the odd-numbered entries form the other.
When the optimal index of the fixed path and the optimal algebraic subcodebook index corresponding to the dynamic path are converted into the corresponding track indexes, a path index is used directly as the track index if the track contains only 1 path;
when a track has 2 paths, the path index is converted to the track index by:
where $c_i$ is the index of track i, $I_i$ is the index of path i, the a-path consists of the even entries of the track, and the b-path consists of the odd entries of the track.
An apparatus for searching an algebraic codebook, wherein each algebraic codebook comprises a plurality of subcodebooks, each subcodebook comprises a plurality of tracks, and each track stores locations where one or more non-zero pulses may occur, the apparatus comprising:
the track path decomposition module is used for decomposing a plurality of tracks contained in each sub-codebook into a plurality of paths, the plurality of paths form a path group to be searched, and each path contains a plurality of possible positions where nonzero pulses can appear;
the optimal index searching module is used for determining the optimal index of each path in the path group to be searched according to the track position of the nonzero pulse in the path; and
the path cyclic search module, used for selecting a fixed path and at least one dynamic path from the path group to be searched, performing a combined cyclic search over all fixed and dynamic paths to obtain the optimal algebraic subcodebook index corresponding to the dynamic path, and converting the optimal index of the fixed path and the optimal algebraic subcodebook index corresponding to the dynamic path into the corresponding track indexes.
The optimal index of the path is a path index satisfying the following conditions:
where $I_x$ is the optimal index of path x, $p(j)$ is the track position corresponding to index j, $r'(n)$ is the convolution of the residual target vector $x'_0$ with the impulse response $h(n)$ of the perceptually weighted synthesis filter, and $s_x$ is the non-zero pulse amplitude for path x, either +1 or -1.
The optimal algebraic subcodebook index corresponding to the dynamic path is the algebraic codebook index $i_m$ that maximizes the following expression e during the cyclic search over all combinations of fixed and dynamic paths:

$$e = \frac{\left(\sum_{i=0}^{N_c-1} s_i\, r'(m_i)\right)^2}{\sum_{i=0}^{N_c-1}\sum_{j=0}^{N_c-1} s_i s_j\, \phi(m_i, m_j)}$$

where $r'(\cdot)$ is the convolution of the residual target vector $x'_0$ with the impulse response $h(n)$ of the perceptually weighted synthesis filter; $\phi(\cdot,\cdot)$ is the covariance matrix of $h(n)$; the sums run over $i, j = 0, 1, \ldots, N_c-1$; $N_c$ is the number of non-zero pulses in the random excitation vector; $m_i$ are the positions in the dynamic path where non-zero pulses may occur; and $s_i$ are the pulse amplitudes.
When the track is decomposed, one track is decomposed into one or two paths according to the length of the track.
When a track is divided into two paths, the even-numbered entries of the track form one path and the odd-numbered entries form the other.
When the optimal index of the fixed path and the optimal algebraic subcodebook index corresponding to the dynamic path are converted into the corresponding track indexes, a path index is used directly as the track index if the track contains only 1 path;
when a track has 2 paths, the path index is converted into a track index by the following equation:
where $c_i$ is the index of track i, $I_i$ is the index of path i, the a-path consists of the even entries of the track, and the b-path consists of the odd entries of the track.
A speech encoder comprising:
a codebook searching means for searching a pulse position in a codebook at which a weighted error between a residual target vector and a synthesized speech signal is minimized;
a first adder for adding pulses corresponding to the positions of the nonzero pulses searched by the codebook searching device and outputting a random excitation vector;
a perceptual weighting synthesis filter for performing perceptual weighting synthesis processing on the random excitation vector and outputting a perceptually weighted synthesized speech signal; and
a second adder for subtracting the residual target vector from the synthesized speech signal and outputting a weighted error signal;
each algebraic codebook comprises a plurality of subcodebooks, each subcodebook comprises a plurality of tracks, and each track stores the position where one or more nonzero pulses possibly appear;
characterized in that, the codebook searching device comprises:
the track path decomposition module is used for decomposing a plurality of tracks contained in each sub-codebook into a plurality of paths, the plurality of paths form a path group to be searched, and each path contains a plurality of possible positions where nonzero pulses can appear;
the optimal index searching module is used for determining the optimal index of each path in the path group to be searched according to the track position of the non-zero pulse in the path; and
the path cyclic search module, used for selecting a fixed path and at least one dynamic path from the path group to be searched, performing a combined cyclic search over all fixed and dynamic paths to obtain the optimal algebraic subcodebook index corresponding to the dynamic path, and converting the optimal index of the fixed path and the optimal algebraic subcodebook index corresponding to the dynamic path into the corresponding track indexes.
The optimal index of the path is a path index satisfying the following conditions:
where $I_x$ is the optimal index of path x, $p(j)$ is the track position corresponding to index j, $r'(n)$ is the convolution of the residual target vector $x'_0$ with the impulse response $h(n)$ of the perceptually weighted synthesis filter, and $s_x$ is the non-zero pulse amplitude for path x.
The optimal algebraic subcodebook index corresponding to the dynamic path is the algebraic codebook index $i_m$ that maximizes the following expression e during the cyclic search over all combinations of fixed and dynamic paths:

$$e = \frac{\left(\sum_{i=0}^{N_c-1} s_i\, r'(m_i)\right)^2}{\sum_{i=0}^{N_c-1}\sum_{j=0}^{N_c-1} s_i s_j\, \phi(m_i, m_j)}$$

where $r'(\cdot)$ is the convolution of the residual target vector $x'_0$ with the impulse response $h(n)$ of the perceptually weighted synthesis filter; $\phi(\cdot,\cdot)$ is the covariance matrix of $h(n)$; the sums run over $i, j = 0, 1, \ldots, N_c-1$; $N_c$ is the number of non-zero pulses in the random excitation vector; $m_i$ are the positions in the dynamic path where non-zero pulses may occur; and $s_i$ are the pulse amplitudes.
When the track is decomposed, one track is decomposed into one or two paths according to the length of the track.
When a track is divided into two paths, the even-numbered entries of the track form one path and the odd-numbered entries form the other.
When the optimal index of the fixed path and the optimal algebraic subcodebook index corresponding to the dynamic path are converted into the corresponding track indexes, a path index is used directly as the track index if the track contains only 1 path;
when a track has 2 paths, the path index is converted to the track index by:
where $c_i$ is the index of track i, $I_i$ is the index of path i, the a-path consists of the even entries of the track, and the b-path consists of the odd entries of the track.
The invention decomposes the track of the algebraic subcodebook into a plurality of paths, and searches the final codebook index by combining the fixed path and the dynamic path, thereby effectively reducing the complexity of realizing codebook search and improving the application value of the coding algorithm while ensuring the coding quality.
Drawings
FIG. 1 is a schematic diagram of codebook search in speech coding provided in the prior art;
FIG. 2 is a schematic diagram of the implementation of codebook search in speech coding provided by the present invention;
FIG. 3 is a block diagram of a codebook searching apparatus in a speech coder according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
The invention decomposes the algebraic subcodebook track into a plurality of paths, and searches the final codebook index through the combination of the fixed path and the dynamic path, thereby effectively reducing the complexity of realizing codebook search while ensuring the coding quality.
In speech coding, the input speech signal undergoes preprocessing and linear predictive analysis to obtain the perceptually weighted synthesis filter and a target vector $x_0$. After the adaptive codebook search, the residual target vector $x'_0$ is obtained from $x_0$:

$$x'_0(n) = x_0(n) - g_a\, x_u(n)$$

where $g_a$ is the adaptive codebook gain, $x_0(n)$ is the weighted input speech minus the zero-input response of the perceptually weighted synthesis filter, and $x_u(n)$ is the zero-state response of the perceptually weighted synthesis filter to the adaptive codebook vector. The algebraic codebook is divided into several subcodebooks. The subcodebook is selected by the pitch period of each subframe, so no additional information needs to be transmitted. According to experimental results, the 4 kb/s ACELP speech coding algorithm usually uses 3 subcodebooks.
As shown in FIG. 2, when speech is encoded, the invention first uses the pitch period to select one of the 3 subcodebooks as the codebook to be searched. To improve search accuracy, the invention divides the 4 tracks of the codebook into several paths, each containing 8 possible positions of a non-zero pulse. The more paths the tracks are decomposed into, the higher the algorithm complexity. In one embodiment, a track whose non-zero pulse position is represented by 3 bits (track length 8) forms a single path, and a track whose non-zero pulse position is represented by 4 bits (track length 16) is decomposed into 2 paths. In a preferred embodiment, the a-path of track i consists of the even entries of track i and the b-path of track i consists of the odd entries of track i. For example, when the 4 tracks are allocated 3, 4, 4 and 4 bits (track lengths 8, 16, 16 and 16), they are decomposed into 7 paths: path 0, path 1a, path 1b, path 2a, path 2b, path 3a and path 3b. The set of these paths constitutes the path group to be searched. This decomposition lets each path cover the whole track well. Other decompositions are possible, for example taking the first 8 and the last 8 positions of a track as separate paths. In that case, however, neither path covers the whole track, which degrades coding quality.
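The even/odd decomposition can be sketched as follows. This sketch assumes that "even entries" means 0-based even indexes within the track. Paths are returned as lists of track indexes rather than sample positions. `decompose_tracks` is a hypothetical name.

```python
def decompose_tracks(track_lengths):
    """Split each track into paths of 8 candidate positions (per the embodiment).

    A track of length 8 stays a single path. A track of length 16 is split into an
    a-path (even entries 0, 2, ..., 14) and a b-path (odd entries 1, 3, ..., 15).
    Each path is returned as (name, track number, list of track indexes it covers).
    """
    paths = []
    for t, length in enumerate(track_lengths):
        if length == 8:
            paths.append((f"{t}", t, list(range(8))))
        elif length == 16:
            paths.append((f"{t}a", t, list(range(0, 16, 2))))
            paths.append((f"{t}b", t, list(range(1, 16, 2))))
        else:
            raise ValueError(f"unsupported track length {length}")
    return paths
```

With track lengths 8, 16, 16 and 16, this yields the 7 paths named in the text, each covering 8 entries spread across its whole track.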
Once the path group to be searched has been determined, each path is pre-searched to find its optimal index $I_x$, which satisfies the following formula:
where $I_x$ is the optimal index of path x; $p(j)$ is the track position corresponding to index j; $r'(n)$ is the convolution of the residual target vector $x'_0$ with the impulse response $h(n)$ of the perceptually weighted synthesis filter; and $s_x$ is the non-zero pulse amplitude for path x, either +1 or -1.
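The pre-search formula itself is not legible in this text. The sketch below assumes the natural reading of its definitions: $I_x$ is the entry j of path x that maximizes $s_x \cdot r'(p(j))$. `best_path_index` is a hypothetical name, and the returned value is the position within the path (0 to 7), not the track index.

```python
def best_path_index(r, track_positions, path_entries, sign):
    """Return I_x: the index within the path whose position maximizes sign * r[pos].

    r               -- correlation r'(n), indexed by sample position
    track_positions -- p(.): candidate sample positions of the whole track
    path_entries    -- track indexes belonging to this path (e.g. even entries)
    sign            -- pulse amplitude s_x of this path (+1 or -1)
    """
    scores = [sign * r[track_positions[k]] for k in path_entries]
    return max(range(len(scores)), key=scores.__getitem__)
```

Each path costs only 8 comparisons here, independent of the other tracks. That is what makes the pre-search cheap compared with the joint search.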
The final codebook search is performed after the optimal index of each path has been determined, as a loop-controlled (cyclic) search. In one embodiment, each loop iteration selects 3 fixed paths and 1 dynamic path from the path group to be searched. When 2 paths belong to the same track, only 1 of them may be selected.
In practice, the complexity of the algorithm increases rapidly with the number of dynamic paths, while the coding quality does not improve significantly. Only a small number of dynamic paths should therefore be selected from the path group to be searched.
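The combination rule (one dynamic path, one fixed path from each of the other tracks, never two paths from the same track) can be enumerated as shown below. The path layout assumes track lengths 8, 16, 16 and 16. `combinations` and `PATHS_BY_TRACK` are hypothetical names.

```python
from itertools import product

# Path layout for track lengths 8, 16, 16, 16: track number -> its paths.
PATHS_BY_TRACK = {0: ["0"], 1: ["1a", "1b"], 2: ["2a", "2b"], 3: ["3a", "3b"]}
POSITIONS_PER_PATH = 8


def combinations(paths_by_track):
    """Yield (dynamic_path, fixed_paths) with exactly one path taken per track."""
    for dyn_track, dyn_paths in paths_by_track.items():
        other_tracks = [t for t in paths_by_track if t != dyn_track]
        for dyn in dyn_paths:
            for fixed in product(*(paths_by_track[t] for t in other_tracks)):
                yield dyn, fixed


combos = list(combinations(PATHS_BY_TRACK))
# One criterion evaluation per selectable position of the dynamic path.
evaluations = len(combos) * POSITIONS_PER_PATH
```

This gives 32 combinations and 256 evaluations for one subcodebook mode. Doubling for the 1-bit mode (an assumption) gives 512, which matches the loop count quoted in the DP-CELP example.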
During the search, each fixed path uses the optimal index obtained for it in the pre-search, and this index stays fixed throughout the search. For the 3 fixed paths, whose optimal indexes are $I_a$, $I_b$ and $I_c$, two quantities are computed at these optimal indexes: the convolution $r'(i)$ of the residual target vector $x'_0$ with the impulse response $h(n)$ of the perceptually weighted synthesis filter, and the covariance matrix $\phi'(i, j)$ of $h(n)$:

$$r'(i) = \sum_{n=i}^{N-1} x'_0(n)\, h(n-i), \qquad \phi'(i, j) = \sum_{n=j}^{N-1} h(n-i)\, h(n-j),$$

$$i = 0, 1, \ldots, N-1;\quad j = i, \ldots, N-1$$

where N is the subframe length, $h(n)$ is the impulse response of the perceptually weighted synthesis filter, and $i, j \in \{I_a, I_b, I_c\}$.
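The quantities used by the search can be computed as sketched below. The sketch assumes the standard ACELP forms $r'(i) = \sum_{n=i}^{N-1} x'_0(n)\,h(n-i)$ and $\phi'(i,j) = \sum_{n=j}^{N-1} h(n-i)\,h(n-j)$ for $j \ge i$, which match the index ranges above. It also assumes $h$ is truncated to the subframe length N. All three function names are hypothetical.

```python
def residual_target(x0, xu, ga):
    """x'_0(n) = x_0(n) - g_a * x_u(n)."""
    return [a - ga * b for a, b in zip(x0, xu)]


def backward_correlation(target, h):
    """r'(i) = sum_{n=i}^{N-1} x'_0(n) h(n - i), i = 0..N-1 (assumed standard form)."""
    n_len = len(target)
    return [sum(target[n] * h[n - i] for n in range(i, n_len)) for i in range(n_len)]


def impulse_covariance(h):
    """phi'(i, j) = sum_{n=j}^{N-1} h(n - i) h(n - j) for j >= i; mirrored for j < i."""
    n_len = len(h)
    phi = [[0.0] * n_len for _ in range(n_len)]
    for i in range(n_len):
        for j in range(i, n_len):
            phi[i][j] = phi[j][i] = sum(h[n - i] * h[n - j] for n in range(j, n_len))
    return phi
```

Only the upper triangle ($j \ge i$) is actually summed. The lower triangle is copied because the matrix is symmetric.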
For the dynamic path, $r'(i)$ and $\phi'(i, j)$ are computed at each of its selectable positions, and the algebraic codebook index $i_m$ that maximizes the following e is found:

$$e = \frac{\left(\sum_{i=0}^{N_c-1} s_i\, r'(m_i)\right)^2}{\sum_{i=0}^{N_c-1}\sum_{j=0}^{N_c-1} s_i s_j\, \phi'(m_i, m_j)}$$

The symbols are defined as follows:

- $r'(\cdot)$ is the convolution of the residual target vector $x'_0$ with the impulse response $h(n)$ of the perceptually weighted synthesis filter.
- $\phi'(\cdot,\cdot)$ is the covariance matrix of $h(n)$.
- The sums run over $i, j = 0, 1, \ldots, N_c-1$, where $N_c$ is the number of non-zero pulses in the random excitation vector.
- $m_i$ are the positions where non-zero pulses may occur.
- $s_i$ are the pulse amplitudes, which can only be +1 or -1.

If the maximum $e_{max}$ of this loop is greater than the maximum $E_{max}$ found in the previous loops, $E_{max}$ is replaced by $e_{max}$ and $\{I_a, I_b, I_c, i_m\}$ is recorded. After all possible combinations of fixed and dynamic paths have been searched, the last recorded path indexes $\{I_a, I_b, I_c, i_m\}$ are converted into the track indexes $c_0, c_1, c_2, c_3$, which completes the fixed codebook search. During the conversion, if a track contains only 1 path, the path index is the track index. If a track contains 2 paths, the path index is converted into the track index by the following equation:
where $c_i$ is the index of track i and $I_i$ is the index of path i.
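The conversion equation is not legible in this text. If the a-path holds the 0-based even entries of a track and the b-path holds the odd entries, the conversion reduces to the sketch below. That mapping is an assumption, and `path_to_track_index` is a hypothetical name that relies on the path naming used in the text (0, 1a, 1b, ...).

```python
def path_to_track_index(path_name, path_index):
    """Convert a path index I to a track index c (assumed even/odd interleaving).

    Single-path tracks (name without suffix): c = I.
    a-path holds even track entries:          c = 2 * I.
    b-path holds odd track entries:           c = 2 * I + 1.
    """
    if path_name.endswith("a"):
        return 2 * path_index
    if path_name.endswith("b"):
        return 2 * path_index + 1
    return path_index
```

For example, index 3 on path 1a maps back to track entry 6, and index 7 on path 3b maps back to track entry 15.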
After the fixed codebook search is completed, the four pulses found are added to obtain a random excitation vector. This vector is input to the perceptual weighting filter to obtain a perceptually weighted synthesized speech signal. The four pulse positions are those that minimize the weighted error between the residual target vector and the synthesized speech signal. Equivalently, they are the positions that maximize the following expression e, which yields the index $i_m$ within the algebraic subcodebook selected by the pitch period:

$$e = \frac{\left(\sum_{i=0}^{N_c-1} s_i\, r(m_i)\right)^2}{\sum_{i=0}^{N_c-1}\sum_{j=0}^{N_c-1} s_i s_j\, \phi(m_i, m_j)}$$

The symbols are defined as follows:

- $r(\cdot)$ is the convolution of the residual target vector $x'_0$ with the impulse response $h(n)$ of the perceptually weighted synthesis filter. The residual target vector $x'_0$ is the weighted input speech minus the zero-input response of the perceptually weighted synthesis filter and the adaptive codebook contribution.
- $\phi(\cdot,\cdot)$ is the covariance matrix of $h(n)$.
- The sums run over $i, j = 0, 1, \ldots, N_c-1$, where $N_c$ is the number of non-zero pulses in the random excitation vector.
- $m_i$ are the positions where non-zero pulses may occur.
- $s_i$ are the pulse amplitudes, +1 or -1.
FIG. 3 shows the structure of the codebook searching apparatus in a speech coder according to the present invention. The codebook searching apparatus 12 searches the codebook for the pulse positions that minimize the weighted error between the residual target vector and the synthesized speech signal, wherein:
the track path decomposition module 122 decomposes the tracks of each subcodebook into a plurality of paths, which form the path group to be searched, each path containing several positions where a non-zero pulse may appear; the optimal index search module 124 determines the optimal index of each path in the path group to be searched according to the track positions of the non-zero pulses in the path; and the path cyclic search module 126 selects a fixed path and at least one dynamic path from the path group to be searched, performs a combined cyclic search over all fixed and dynamic paths to obtain the optimal algebraic subcodebook index corresponding to the dynamic path, and converts the optimal index of the fixed path and the optimal algebraic subcodebook index corresponding to the dynamic path into the corresponding track indexes. The track path decomposition, the search for the optimal path indexes and the track index conversion are implemented as described above and are not repeated here.
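The sketch below mirrors modules 122, 124 and 126 in plain Python. It is a hypothetical rendering, not the patent's implementation, and rests on three assumptions: the criterion is the standard ACELP form e = C²/E; the pre-search picks, per path, the entry maximizing s·r'(p(j)); and 16-entry tracks split into even (a) and odd (b) entries. Path entries are stored directly as track indexes, so no separate conversion step is needed. The function names are invented for illustration.

```python
from itertools import product


def decompose(track_lengths):
    """Module 122 (sketch): 8-entry tracks stay whole; longer tracks are split
    into an even-entry a-path and an odd-entry b-path."""
    paths = {}
    for t, n in enumerate(track_lengths):
        if n == 8:
            paths[t] = [list(range(n))]
        else:
            paths[t] = [list(range(0, n, 2)), list(range(1, n, 2))]
    return paths


def score(r, phi, pos, sgn):
    """Assumed ACELP criterion e = C**2 / E for pulses at pos with signs sgn."""
    c = sum(s * r[m] for m, s in zip(pos, sgn))
    energy = sum(a * b * phi[i][j] for i, a in zip(pos, sgn) for j, b in zip(pos, sgn))
    return c * c / energy


def search(r, phi, tracks, signs):
    """tracks[t] lists the candidate positions p(.) of track t; signs[t] is its s.

    Returns (track indexes c_0..c_{T-1}, best e).
    """
    paths = decompose([len(p) for p in tracks])
    # Module 124 (sketch): pre-search, assumed to maximize s * r'(p(j)) per path.
    best = {
        (t, k): max(entries, key=lambda j, t=t: signs[t] * r[tracks[t][j]])
        for t, plist in paths.items()
        for k, entries in enumerate(plist)
    }
    # Module 126 (sketch): one dynamic path plus one fixed path from each other track.
    best_e, best_c = float("-inf"), None
    for dt in paths:
        others = [t for t in paths if t != dt]
        for dyn_entries in paths[dt]:
            for choice in product(*(range(len(paths[t])) for t in others)):
                idx = {t: best[(t, k)] for t, k in zip(others, choice)}
                for j in dyn_entries:
                    idx[dt] = j
                    c = [idx[t] for t in range(len(tracks))]
                    pos = [tracks[t][c[t]] for t in range(len(tracks))]
                    e = score(r, phi, pos, signs)
                    if e > best_e:
                        best_e, best_c = e, c
    return best_c, best_e
```

When testing such code, a brute-force loop over every index combination is a useful reference. With a diagonal φ and non-negative signed correlations, the tracks become independent and the two searches should reach the same maximum e.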
As an example, consider the invention applied in a 4 kb/s distributed-pulse code-excited linear prediction (DP-CELP) speech coding algorithm. The input is a linear Pulse Code Modulation (PCM) signal sampled at 8 kHz, and the analysis frame length is 20 ms. The algebraic codebook is divided into 4 tracks occupying 3, 4, 4 and 4 bits, i.e. track lengths of 8, 16, 16 and 16. The full algebraic codebook search uses multiple nested loops and needs 65536 loop iterations per subframe, so the algebraic codebook search has a computational complexity of 85.2 MOPS (million operations per second). The invention avoids the multiple nested loops of the full search and needs only 512 loop iterations per subframe. Because the loop body is identical to that of the full search, the computational complexity of the algebraic codebook search drops to 0.67 MOPS, about 1/128 of that of the full search.
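The loop counts and the complexity ratio follow from simple arithmetic, reproduced below. The factor of 2 is assumed to come from the 1-bit mode. It is the factor that reconciles 8 × 16³ with the quoted 65536. The per-mode count of the proposed search assumes 3 fixed paths plus 1 dynamic path, at most one per track.

```python
MODES = 2  # assumption: the 1-bit mode doubles every count

# Full search: every combination of the four track entries (8, 16, 16, 16).
full_search_loops = MODES * 8 * 16 * 16 * 16

# Proposed search, per mode: one dynamic path (8 positions) plus one fixed path
# from each other track. Track 0 has 1 path; tracks 1-3 have 2 paths each.
dyn_on_track0 = 1 * (2 * 2 * 2) * 8     # 1 dynamic path, 2**3 fixed choices
dyn_on_other = 3 * 2 * (1 * 2 * 2) * 8  # 6 dynamic paths, 2**2 fixed choices each
proposed_loops = MODES * (dyn_on_track0 + dyn_on_other)

# Scale the quoted 85.2 MOPS by the loop ratio (loop body assumed identical).
proposed_mops = 85.2 * proposed_loops / full_search_loops
```

This gives 65536 and 512 loops, a ratio of 128, and about 0.666 MOPS, which rounds to the quoted 0.67.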
The following average distortion measure D was used to examine the impact of the invention on coding quality:
where M is the number of subframes contained in the test sentences, and $e_i$ is given by:
where the superscript (i) denotes the parameters of the ith subframe; N is the subframe length; $x_0(n)$ is the residual signal obtained by subtracting the zero-input response of the weighted synthesis filter from the perceptually weighted speech; $x_{u'}(n)$ and $t_{j'}(n)$ are the zero-state responses of the perceptually weighted synthesis filter to the best vector output by the adaptive codebook and to the best random codevector output by the algebraic codebook, respectively; $\hat{g}_a^{(i)}$ is the quantized adaptive codebook gain of the ith subframe; and $\hat{g}_c^{(i)}$ is the quantized fixed codebook gain of the ith subframe.
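The formulas for D and $e_i$ are not legible in this text. The sketch below assumes a natural reading of the definitions, which the source does not confirm. Under this reading, $e_i$ is the energy of the weighted error left after subtracting both quantized contributions, $e_i = \sum_{n=0}^{N-1} \big(x_0(n) - \hat{g}_a x_{u'}(n) - \hat{g}_c t_{j'}(n)\big)^2$, and D is the mean of $e_i$ over the M subframes. The function names are hypothetical.

```python
def subframe_error(x0, xu, tj, ga_q, gc_q):
    """e_i: energy of the weighted error after both quantized contributions (assumed form)."""
    return sum((a - ga_q * b - gc_q * c) ** 2 for a, b, c in zip(x0, xu, tj))


def average_distortion(subframes):
    """D = (1/M) * sum_i e_i; each item is (x0, xu, tj, ga_q, gc_q) for one subframe."""
    errors = [subframe_error(*sf) for sf in subframes]
    return sum(errors) / len(errors)
```

Because both coders share the same gain quantizers in such a comparison, any change in D reflects the choice of algebraic codevector.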
The table below shows the average distortion measured over a large number of test sentences. The average distortion of the invention is only about 0.4% higher than that of the full search, which is negligible. From this point of view, the invention causes no appreciable degradation of speech coding quality.
Algorithm | Algebraic codebook full search | The invention |
Mean distortion D | 115.6 | 116.1 |
To further test the coding performance of the invention, the speech quality of the full search algorithm and of the invention was evaluated objectively with the ITU-T P.862 Perceptual Evaluation of Speech Quality (PESQ) software. The Chinese speech test material consists of 16 sentences, 8 spoken by male speakers and 8 by female speakers. The results are shown in the table below. The mean opinion score (MOS) of the invention is only about 0.005 lower than that of the full search algorithm. This difference is not subjectively perceptible and is consistent with the distortion results in the table above.
Female speech | MOS, full search algorithm | MOS, the invention | Male speech | MOS, full search algorithm | MOS, the invention |
---|---|---|---|---|---|
F01 | 3.121 | 3.056 | M01 | 2.958 | 2.927 |
F02 | 3.166 | 3.220 | M02 | 2.903 | 3.022 |
F03 | 2.951 | 2.905 | M03 | 3.222 | 3.284 |
F04 | 3.353 | 3.356 | M04 | 2.948 | 3.048 |
F05 | 3.265 | 3.202 | M05 | 3.389 | 3.306 |
F06 | 3.055 | 3.056 | M06 | 3.290 | 3.287 |
F07 | 3.057 | 2.839 | M07 | 3.141 | 3.298 |
F08 | 2.866 | 2.874 | M08 | 3.227 | 3.146 |
Female average | 3.10425 | 3.0635 | Male average | 3.13475 | 3.16475 |

Metric | Full search algorithm | The invention |
---|---|---|
Total mean MOS | 3.1195 | 3.114125 |
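The averages in the table can be reproduced from the per-sentence PESQ scores; the overall gap of about 0.005 MOS is the difference between the two total means:

```python
from statistics import mean

# Per-sentence MOS scores from the table: (full search, the invention).
female = [(3.121, 3.056), (3.166, 3.220), (2.951, 2.905), (3.353, 3.356),
          (3.265, 3.202), (3.055, 3.056), (3.057, 2.839), (2.866, 2.874)]
male = [(2.958, 2.927), (2.903, 3.022), (3.222, 3.284), (2.948, 3.048),
        (3.389, 3.306), (3.290, 3.287), (3.141, 3.298), (3.227, 3.146)]


def column_means(rows):
    """Mean of each column: (full search, the invention)."""
    return mean(r[0] for r in rows), mean(r[1] for r in rows)


f_full, f_inv = column_means(female)
m_full, m_inv = column_means(male)
# There are equally many male and female sentences, so the total mean
# equals the mean of the two per-gender averages.
total_full = (f_full + m_full) / 2
total_inv = (f_inv + m_inv) / 2
print(round(total_full, 6), round(total_inv, 6), round(total_full - total_inv, 6))
```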
The above test results show that the present invention has almost no effect on the quality of the encoded speech, while its complexity is only 1/128 of that of the full search algorithm.
The above description is merely illustrative of the preferred embodiment of the present invention and should not be taken as limiting the invention; rather, it is intended to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention.
Claims (18)
1. A method for searching an algebraic codebook, wherein each algebraic codebook comprises a plurality of subcodebooks, each subcodebook comprises a plurality of tracks, and each track stores locations where one or more non-zero pulses may occur, the method comprising:
A. decomposing a plurality of tracks contained in each sub-codebook into a plurality of paths, wherein the plurality of paths form a path group to be searched, and each path contains a plurality of possible positions where non-zero pulses can appear;
B. determining the optimal index of each path in the path group to be searched according to the track position of the nonzero pulse in the path;
C. selecting a fixed path and at least one dynamic path from the path group to be searched, performing a combined cyclic search over all the fixed paths and dynamic paths to obtain the optimal algebraic subcodebook index corresponding to the dynamic path, and converting the optimal index of the fixed path and the searched optimal algebraic subcodebook index corresponding to the dynamic path into corresponding track indexes.
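One possible reading of steps B and C is sketched below in Python under several assumptions: the paths produced in step A are given as lists of positions, each path carries a single pulse, each pulse sign follows the sign of r′(n), the search criterion is the usual ACELP ratio of squared correlation to energy, and the paths listed in the hypothetical `dynamic` argument are searched jointly while every other path stays fixed at its step-B optimum. None of the function names come from the patent.

```python
from itertools import product


def optimal_index(path, r):
    """Step B (sketch): index j maximizing |r'(p(j))| within one path."""
    return max(range(len(path)), key=lambda j: abs(r[path[j]]))


def criterion(positions, r, phi):
    """Squared correlation over energy, pulse signs taken from r' (assumed)."""
    signs = [1.0 if r[m] >= 0 else -1.0 for m in positions]
    corr = sum(s * r[m] for s, m in zip(signs, positions))
    energy = sum(si * sj * phi[mi][mj]
                 for si, mi in zip(signs, positions)
                 for sj, mj in zip(signs, positions))
    return corr * corr / energy if energy > 0 else 0.0


def search_paths(paths, r, phi, dynamic):
    """Step C (sketch): paths not listed in `dynamic` stay at their step-B
    optimum; dynamic paths are searched jointly over all position combos."""
    chosen = [p[optimal_index(p, r)] for p in paths]
    best, best_e = list(chosen), criterion(chosen, r, phi)
    for combo in product(*(paths[k] for k in dynamic)):
        trial = list(chosen)
        for k, pos in zip(dynamic, combo):
            trial[k] = pos
        e = criterion(trial, r, phi)
        if e > best_e:
            best, best_e = trial, e
    return best, best_e


# Toy example: two single-pulse paths, identity covariance (hypothetical data).
r = [0.1, -0.9, 0.5, 0.2, -0.3, 0.4]
phi = [[1.0 if i == j else 0.0 for j in range(6)] for i in range(6)]
positions, e = search_paths([[0, 2, 4], [1, 3, 5]], r, phi, dynamic=[1])
print(positions, e)
```

Pinning the fixed paths is what limits the search to the product of the dynamic paths' lengths rather than of all paths' lengths.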
2. The algebraic codebook searching method of claim 1, wherein the optimal index of a path is a path index satisfying the following condition:
where I_x is the optimal index of path x, p(j) is the track position corresponding to index j, r′(n) is the convolution of the residual target vector x′_0 with the impulse response h(n) of the perceptually weighted synthesis filter, and s_x is the amplitude of the non-zero pulse of path x, either +1 or -1.
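The condition itself did not survive extraction. Assuming it is the usual pre-selection rule, in which I_x picks the position with the largest |r′(p(j))| and s_x takes the sign of r′ at that position (so that s_x·r′(p(I_x)) equals that maximum), a sketch with a hypothetical function name is:

```python
def path_optimal_index(path, r):
    """Return (I_x, s_x) for one path under the assumed condition.

    path[j] plays the role of p(j), the track position of index j,
    and r holds the values r'(n).
    """
    best_j = max(range(len(path)), key=lambda j: abs(r[path[j]]))
    s_x = 1 if r[path[best_j]] >= 0 else -1
    return best_j, s_x


r = [0.2, -1.5, 0.7, 0.1, 1.1, -0.4]  # hypothetical r'(n)
print(path_optimal_index([1, 3, 5], r))  # (0, -1)
print(path_optimal_index([0, 2, 4], r))  # (2, 1)
```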
3. The algebraic codebook searching method of claim 1, wherein the optimal algebraic subcodebook index corresponding to the dynamic path is the algebraic codebook index i that maximizes the following expression e_m during the cyclic search over all combinations of fixed and dynamic paths:
where r′(i), i = 0, 1, …, N_c - 1, is the convolution of the residual target vector x′_0 with the impulse response h(n) of the perceptually weighted synthesis filter; φ(i, j), i, j = 0, 1, …, N_c - 1, is the covariance matrix of h(n); N_c is the number of non-zero pulses in the random excitation vector; m_i is a position in the dynamic path at which a non-zero pulse may occur; and s_i is the pulse amplitude.
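The expression for e_m is missing from the extracted text. Its listed ingredients match the standard ACELP search criterion, so a plausible form (an assumption, not confirmed by the text) is e_m = (Σ_i s_i r′(m_i))² / (Σ_i Σ_j s_i s_j φ(m_i, m_j)), with i, j = 0, …, N_c - 1; the candidate that maximizes it is kept. A sketch with a hypothetical function name:

```python
def e_m(positions, signs, r, phi):
    """Assumed ACELP-style criterion for one candidate pulse combination.

    positions -> m_i, signs -> s_i (each +1 or -1), r -> r'(n),
    phi -> covariance matrix of h(n), indexed phi[i][j].
    """
    corr = sum(s * r[m] for s, m in zip(signs, positions))
    energy = sum(si * sj * phi[mi][mj]
                 for si, mi in zip(signs, positions)
                 for sj, mj in zip(signs, positions))
    return corr * corr / energy


r = [2.0, 0.0, 0.0, -1.0]  # hypothetical r'(n)
phi = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
print(e_m([0, 3], [1, -1], r, phi))  # 4.5
```

The numerator rewards pulses that line up with the backward-filtered target r′(n), while the denominator penalizes combinations whose filtered pulses carry a lot of energy.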
4. The algebraic codebook searching method of claim 1, 2 or 3, wherein, when decomposing the tracks, each track is decomposed into one or two paths according to its length.
5. The algebraic codebook searching method of claim 4, wherein, when a track is decomposed into two paths, the even-term positions of the track form one path and the odd-term positions form the other path.
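A minimal sketch of the decomposition in claims 4 and 5, assuming entries are counted from 0 and assuming a hypothetical length threshold above which a track is split (the text does not state the threshold):

```python
def decompose_track(track, max_single_path_len=8):
    """Split one track into paths (sketch).

    Tracks no longer than `max_single_path_len` (a hypothetical threshold)
    stay as a single path; longer tracks are split into an a-path holding
    the even-term entries and a b-path holding the odd-term entries.
    """
    if len(track) <= max_single_path_len:
        return [list(track)]
    return [list(track[0::2]), list(track[1::2])]


track = list(range(0, 80, 5))  # hypothetical 16-position track, stride 5
a_path, b_path = decompose_track(track)
print(a_path)  # [0, 10, 20, 30, 40, 50, 60, 70]
print(b_path)  # [5, 15, 25, 35, 45, 55, 65, 75]
```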
6. The algebraic codebook searching method of claim 5, wherein, when converting the searched optimal index of the fixed path and the optimal algebraic subcodebook index corresponding to the dynamic path into corresponding track indexes, the path index is the track index if a track contains only 1 path;
when a track has 2 paths, the path index is converted into a track index by the following equation:
where c_i is the index of track i, I_i is the index of path i, the a-path consists of the even-term positions of the track, and the b-path consists of the odd-term positions of the track.
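The conversion equation itself is missing from the extracted text. If, as assumed here, entries are counted from 0 and the a-path holds track entries 0, 2, 4, … while the b-path holds entries 1, 3, 5, …, then entry I_i of the a-path is track entry 2·I_i and entry I_i of the b-path is track entry 2·I_i + 1. A sketch (the function name is hypothetical):

```python
def path_to_track_index(path_index, n_paths, path="a"):
    """Convert a path index I_i into a track index c_i (assumed mapping)."""
    if n_paths == 1:
        return path_index           # single path: path index is track index
    if path == "a":
        return 2 * path_index       # a-path: even-term entries of the track
    if path == "b":
        return 2 * path_index + 1   # b-path: odd-term entries of the track
    raise ValueError("path must be 'a' or 'b'")


track = list(range(0, 80, 5))       # hypothetical 16-entry track
a_path, b_path = track[0::2], track[1::2]
print(track[path_to_track_index(3, 2, "a")] == a_path[3])  # True
print(track[path_to_track_index(3, 2, "b")] == b_path[3])  # True
```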
7. An apparatus for searching an algebraic codebook, wherein each algebraic codebook comprises a plurality of subcodebooks, each subcodebook comprises a plurality of tracks, and each track stores locations where one or more non-zero pulses may occur, the apparatus comprising:
the track path decomposition module is used for decomposing a plurality of tracks contained in each sub-codebook into a plurality of paths, the paths form a path group to be searched, and each path contains a plurality of possible positions of non-zero pulses;
the optimal index searching module is used for determining the optimal index of each path in the path group to be searched according to the track position of the nonzero pulse in the path; and
the path cyclic search module is used for selecting a fixed path and at least one dynamic path from the path group to be searched, performing a combined cyclic search over all the fixed paths and dynamic paths to obtain the optimal algebraic subcodebook index corresponding to the dynamic path, and converting the searched optimal index of the fixed path and the searched optimal algebraic subcodebook index corresponding to the dynamic path into corresponding track indexes.
8. The algebraic codebook searching device of claim 7, wherein the optimal index of a path is a path index satisfying the following condition:
where I_x is the optimal index of path x, p(j) is the track position corresponding to index j, r′(n) is the convolution of the residual target vector x′_0 with the impulse response h(n) of the perceptually weighted synthesis filter, and s_x is the amplitude of the non-zero pulse of path x, either +1 or -1.
9. The algebraic codebook searching device of claim 7, wherein the optimal algebraic subcodebook index corresponding to the dynamic path is the algebraic codebook index i that maximizes the following expression e_m during the cyclic search over all combinations of fixed and dynamic paths:
where r′(i), i = 0, 1, …, N_c - 1, is the convolution of the residual target vector x′_0 with the impulse response h(n) of the perceptually weighted synthesis filter; φ(i, j), i, j = 0, 1, …, N_c - 1, is the covariance matrix of h(n); N_c is the number of non-zero pulses in the random excitation vector; m_i is a position in the dynamic path at which a non-zero pulse may occur; and s_i is the pulse amplitude.
10. The algebraic codebook searching device of claim 7, 8 or 9, wherein, when decomposing the tracks, each track is decomposed into one or two paths according to its length.
11. The algebraic codebook searching device of claim 10, wherein, when a track is decomposed into two paths, the even-term positions of the track form one path and the odd-term positions form the other path.
12. The algebraic codebook searching device of claim 11, wherein, when converting the searched optimal index of the fixed path and the optimal algebraic subcodebook index corresponding to the dynamic path into corresponding track indexes, the path index is the track index if a track contains only 1 path;
when a track has 2 paths, the path index is converted into a track index by the following equation:
where c_i is the index of track i, I_i is the index of path i, the a-path consists of the even-term positions of the track, and the b-path consists of the odd-term positions of the track.
13. A speech encoder comprising:
codebook searching means for searching a pulse position in a codebook at which a weighted error between a residual target vector and a synthesized speech signal is minimized;
a first adder for adding pulses corresponding to the positions of the nonzero pulses searched by the codebook searching device and outputting a random excitation vector;
a perceptual weighting synthesis filter for performing perceptual weighting synthesis processing on the random excitation vector and outputting a perceptually weighted synthesized speech signal; and
a second adder for subtracting the residual target vector from the synthesized speech signal and outputting a weighted error signal;
each algebraic codebook comprises a plurality of subcodebooks, each subcodebook comprises a plurality of tracks, and each track stores the position where one or more nonzero pulses possibly appear;
characterized in that, the codebook searching device comprises:
the track path decomposition module is used for decomposing a plurality of tracks contained in each sub-codebook into a plurality of paths, the plurality of paths form a path group to be searched, and each path contains a plurality of possible positions where nonzero pulses can appear;
the optimal index searching module is used for determining the optimal index of each path in the path group to be searched according to the track position of the nonzero pulse in the path; and
the path cyclic search module is used for selecting a fixed path and at least one dynamic path from the path group to be searched, performing a combined cyclic search over all the fixed paths and dynamic paths to obtain the optimal algebraic subcodebook index corresponding to the dynamic path, and converting the searched optimal index of the fixed path and the optimal algebraic subcodebook index corresponding to the dynamic path into corresponding track indexes.
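The adder-and-filter loop of claim 13 can be sketched as follows, assuming the perceptual weighting synthesis filter is represented by a truncated impulse response h(n) applied by zero-state convolution. The function names are hypothetical, and the error is formed as synthesized signal minus residual target, following the claim wording; the codebook search would then choose the pulse positions that minimize the energy of this error.

```python
def excitation(positions, signs, length):
    """First adder (sketch): place +/-1 pulses at the searched positions."""
    c = [0.0] * length
    for m, s in zip(positions, signs):
        c[m] += s
    return c


def synthesize(c, h):
    """Zero-state convolution of c with impulse response h, truncated to len(c)."""
    return [sum(h[k] * c[i - k] for k in range(min(i + 1, len(h))))
            for i in range(len(c))]


def weighted_error(target, positions, signs, h):
    """Second adder (sketch): synthesized signal minus residual target."""
    y = synthesize(excitation(positions, signs, len(target)), h)
    return [yi - xi for yi, xi in zip(y, target)]


def error_energy(e):
    """Quantity the codebook search is assumed to minimize."""
    return sum(v * v for v in e)


h = [1.0, 0.5]                 # hypothetical two-tap impulse response
target = [0.0, 1.0, 0.5, 0.0]  # hypothetical residual target vector
print(error_energy(weighted_error(target, [1], [1], h)))  # 0.0
print(error_energy(weighted_error(target, [2], [1], h)))  # 1.5
```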
14. The speech coder of claim 13, wherein the optimal index for the path is a path index that satisfies the following condition:
where I_x is the optimal index of path x, p(j) is the track position corresponding to index j, r′(n) is the convolution of the residual target vector x′_0 with the impulse response h(n) of the perceptually weighted synthesis filter, and s_x is the amplitude of the non-zero pulse of path x, either +1 or -1.
15. The speech coder of claim 13, wherein the optimal algebraic subcodebook index corresponding to the dynamic path is the algebraic codebook index i that maximizes the following expression e_m during the cyclic search over all combinations of fixed and dynamic paths:
where r′(i), i = 0, 1, …, N_c - 1, is the convolution of the residual target vector x′_0 with the impulse response h(n) of the perceptually weighted synthesis filter; φ(i, j), i, j = 0, 1, …, N_c - 1, is the covariance matrix of h(n); N_c is the number of non-zero pulses in the random excitation vector; m_i is a position in the dynamic path at which a non-zero pulse may occur; and s_i is the pulse amplitude.
16. The speech coder of claim 13, 14 or 15, wherein in decomposing the tracks, a track is decomposed into one or two paths according to the track length.
17. The speech coder of claim 16, wherein, when a track is decomposed into two paths, the even-term positions of the track form one path and the odd-term positions form the other path.
18. The speech coder of claim 17, wherein, when the searched optimal index of the fixed path and the optimal algebraic subcodebook index corresponding to the dynamic path are converted into corresponding track indexes, the path index is the track index if a track has only 1 path;
when a track has 2 paths, the path index is converted into a track index by the following equation:
where c_i is the index of track i, I_i is the index of path i, the a-path consists of the even-term positions of the track, and the b-path consists of the odd-term positions of the track.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNA2006100620071A CN101118748A (en) | 2006-08-04 | 2006-08-04 | Search method, device and speech coder for algebraic codebook |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNA2006100620071A CN101118748A (en) | 2006-08-04 | 2006-08-04 | Search method, device and speech coder for algebraic codebook |
Publications (1)
Publication Number | Publication Date |
---|---|
CN101118748A true CN101118748A (en) | 2008-02-06 |
Family
ID=39054826
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNA2006100620071A Pending CN101118748A (en) | 2006-08-04 | 2006-08-04 | Search method, device and speech coder for algebraic codebook |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101118748A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2009006819A1 (en) * | 2007-07-11 | 2009-01-15 | Huawei Technologies Co., Ltd. | Fixed codebook search method, searcher and computer readable medium |
CN103456309A (en) * | 2012-05-31 | 2013-12-18 | 展讯通信(上海)有限公司 | Voice coder and algebraic code list searching method and device thereof |
CN110932739A (en) * | 2019-12-20 | 2020-03-27 | 成都大学 | System and method for reducing error interference of communication and radar excitation signals |
CN116052700A (en) * | 2022-07-29 | 2023-05-02 | 荣耀终端有限公司 | Voice coding and decoding method, and related device and system |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2009006819A1 (en) * | 2007-07-11 | 2009-01-15 | Huawei Technologies Co., Ltd. | Fixed codebook search method, searcher and computer readable medium |
US8515743B2 (en) | 2007-07-11 | 2013-08-20 | Huawei Technologies Co., Ltd | Method and apparatus for searching fixed codebook |
CN103456309A (en) * | 2012-05-31 | 2013-12-18 | 展讯通信(上海)有限公司 | Voice coder and algebraic code list searching method and device thereof |
CN103456309B (en) * | 2012-05-31 | 2016-04-20 | 展讯通信(上海)有限公司 | Speech coder and algebraically code table searching method thereof and device |
CN110932739A (en) * | 2019-12-20 | 2020-03-27 | 成都大学 | System and method for reducing error interference of communication and radar excitation signals |
CN110932739B (en) * | 2019-12-20 | 2021-05-18 | 成都大学 | System and method for reducing error interference of communication and radar excitation signals |
CN116052700A (en) * | 2022-07-29 | 2023-05-02 | 荣耀终端有限公司 | Voice coding and decoding method, and related device and system |
CN116052700B (en) * | 2022-07-29 | 2023-09-29 | 荣耀终端有限公司 | Voice coding and decoding method, and related device and system |
WO2024021747A1 (en) * | 2022-07-29 | 2024-02-01 | 荣耀终端有限公司 | Sound coding method, sound decoding method, and related apparatuses and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CA2684379C (en) | A speech coder using an orthogonal search and an orthogonal search method | |
McCree et al. | A 2.4 kbit/s MELP coder candidate for the new US Federal Standard | |
CA2202825C (en) | Speech coder | |
US6122608A (en) | Method for switched-predictive quantization | |
KR100304682B1 (en) | Fast Excitation Coding for Speech Coders | |
CN1121683C (en) | Speech coding | |
JP3196595B2 (en) | Audio coding device | |
EP0718822A2 (en) | A low rate multi-mode CELP CODEC that uses backward prediction | |
US20050114123A1 (en) | Speech processing system and method | |
CN1192357C (en) | Adaptive criterion for speech coding | |
CN101118748A (en) | Search method, device and speech coder for algebraic codebook | |
Dusan et al. | Speech compression by polynomial approximation | |
KR101369064B1 (en) | Audio encoding device and audio encoding method | |
Yeh et al. | An efficient complexity reduction algorithm for G. 729 speech codec | |
CN100367347C (en) | Sound encoder and sound decoder | |
Tanaka et al. | Low-bit-rate speech coding using a two-dimensional transform of residual signals and waveform interpolation | |
JP3185748B2 (en) | Signal encoding device | |
JP3579276B2 (en) | Audio encoding / decoding method | |
JP3153075B2 (en) | Audio coding device | |
Xydeas et al. | Theory and Real Time Implementation of a CELP Coder at 4.8 and 6.0 kbits/second Using Ternary Code Excitation | |
Kemp et al. | LPC parameter quantization at 600, 800 and 1200 bits per second | |
JPH07168596A (en) | Voice recognizing device | |
JP3984048B2 (en) | Speech / acoustic signal encoding method and electronic apparatus | |
CN1159044A (en) | Voice coder | |
Kim et al. | On a Reduction of Pitch Searching Time by Preprocessing in the CELP Vocoder |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C12 | Rejection of a patent application after its publication | |
RJ01 | Rejection of invention patent application after publication | Open date: 20080206 |