CN101118748A - Search method, device and speech coder for algebraic codebook - Google Patents


Info

Publication number
CN101118748A
Authority
CN
China
Prior art keywords
path
track
index
optimal
algebraic
Prior art date
Legal status
Pending
Application number
CNA2006100620071A
Other languages
Chinese (zh)
Inventor
鲍长春
窦庚欣
范睿
刘泽新
李立雄
Current Assignee
Huawei Technologies Co Ltd
Beijing University of Technology
Original Assignee
Huawei Technologies Co Ltd
Beijing University of Technology
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd and Beijing University of Technology
Priority to CNA2006100620071A
Publication of CN101118748A
Legal status: Pending (current)


Abstract

The present invention applies to the field of speech coding and provides a search method and device for an algebraic codebook, and a speech coder. Each algebraic codebook comprises a plurality of subcodebooks, each subcodebook comprises a plurality of tracks, and each track stores one or more positions at which a non-zero pulse may occur. The method decomposes the tracks of each subcodebook into a plurality of paths and determines the optimal index of each path from the positions of the non-zero pulses in that path. Fixed paths and dynamic paths are then selected and combined in a loop search to find the corresponding track indexes. By decomposing the subcodebook tracks into paths and searching for the final codebook index through combinations of fixed and dynamic paths, the invention effectively reduces the complexity of the codebook search and improves the practical value of the coding algorithm.

Description

Algebraic codebook search method and device, and speech coder
Technical Field
The present invention belongs to the field of speech coding, and in particular, to a method and apparatus for searching an algebraic codebook in speech coding, and a speech coder.
Background
Speech coding compresses the digital representation of speech signals so as to minimize the number of bits needed to represent them. Speech coders currently fall into three broad categories: waveform coding, parametric coding and hybrid coding. Waveform coding adapts well to different signals and gives high speech quality, but requires a high bit rate. Parametric coding generally operates at low bit rates, but its coding quality is poor and the synthesized speech sounds unnatural. Hybrid coding overcomes the shortcomings of waveform and parametric coding while combining their advantages, and can deliver high-quality synthesized speech at 4-16 kb/s. Algebraic Code-excited Linear Prediction (ACELP), a hybrid coding method, is used in many speech coding standards of the International Telecommunication Union (ITU), such as G.722.2, G.723.1 and G.729.
The random excitation signal is a main component of the Code-excited Linear Prediction (CELP) speech coding model. ACELP describes the random excitation with a random codevector containing a number of non-zero pulses. Each non-zero pulse has an amplitude of +1 or -1, and its position is described by an algebraic codebook, which stores the positions at which the non-zero pulses of the random excitation vector may occur. At coding rates below 8 kb/s, ACELP algorithms typically use an algebraic multiple-subcodebook structure, in which the subcodebooks differ in the positions where non-zero pulses may occur. As a trade-off between bit count and coding quality, each algebraic subcodebook usually comprises 4 tracks, each storing the positions where one or several non-zero pulses may occur. An example of the subcodebook structure is given in the following table: each subcodebook has 2 modes, selected by 1 bit, and contains 4 tracks, i.e. the candidate positions of pulses m0, m1, m2 and m3. At low rates each track typically occupies 3-4 bits, i.e. 8 or 16 candidate positions.
Subcodebook pulse signs and positions:

| Subcodebook | Mode | m0 | m1 | m2 | m3 |
|---|---|---|---|---|---|
| 1 | 0 | (+) 0, 6, 16, 26, 36, 46, 56, 66 | (-) 1, 8, 18, 28, 38, 48, 58, 69 | (+) 2, 10, 20, 30, 40, 50, 60, 72 | (-) 3, 12, 22, 32, 42, 52, 62, 75, 4, 14, 24, 34, 44, 54, 64, 78 |
| 1 | 1 | (-) 0, 7, 17, 27, 37, 47, 57, 67 | (+) 1, 9, 19, 29, 39, 49, 59, 70 | (-) 2, 11, 21, 31, 41, 51, 61, 73 | (+) 3, 13, 23, 33, 43, 53, 63, 76, 5, 15, 25, 35, 45, 55, 65, 79 |
| 2 | 0 | (+) 0, 5, 10, 15, 20, 26, 36, 46 | (-) 1, 6, 11, 16, 21, 28, 38, 48 | (+) 2, 7, 12, 17, 22, 30, 40, 50 | (-) 3, 8, 13, 18, 23, 32, 42, 52, 4, 9, 14, 19, 24, 34, 44, 54 |
| 2 | 1 | (-) 0, 5, 10, 15, 20, 25, 35, 45 | (+) 1, 6, 11, 16, 21, 27, 37, 47 | (-) 2, 7, 12, 17, 22, 29, 39, 49 | (+) 3, 8, 13, 18, 23, 31, 41, 51, 4, 9, 14, 19, 24, 33, 43, 53 |
| 3 | 0 | (+) 0, 5, 10, 15, 20, 25, 30, 35 | (-) 1, 6, 11, 16, 21, 26, 31, 36 | (+) 2, 7, 12, 17, 22, 27, 32, 37 | (-) 3, 8, 13, 18, 23, 28, 33, 38, 4, 9, 14, 19, 24, 29, 34, 39 |
| 3 | 1 | (-) 0, 5, 10, 15, 20, 25, 30, 35 | (+) 1, 6, 11, 16, 21, 26, 31, 36 | (-) 2, 7, 12, 17, 22, 27, 32, 37 | (+) 3, 8, 13, 18, 23, 28, 33, 38, 4, 9, 14, 19, 24, 29, 34, 39 |
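As a sanity check on the reading of the table, the sketch below encodes the track layout of Subcodebook 3, Mode 0 (pulse signs omitted) as plain Python lists. It assumes that the last two position groups in each row both belong to track m3; under that assumption m0 to m2 need 3 bits each, m3 needs 4 bits, and the four tracks together cover positions 0 to 39 exactly once. The dictionary and function names are illustrative only.

```python
# Subcodebook 3, Mode 0, read from the table (signs omitted).
# Assumption: both trailing position groups belong to track m3 (16 positions).
SUBCODEBOOK3_MODE0 = {
    "m0": [0, 5, 10, 15, 20, 25, 30, 35],
    "m1": [1, 6, 11, 16, 21, 26, 31, 36],
    "m2": [2, 7, 12, 17, 22, 27, 32, 37],
    "m3": [3, 8, 13, 18, 23, 28, 33, 38, 4, 9, 14, 19, 24, 29, 34, 39],
}


def track_bits(tracks):
    """Bits needed to index each track (track lengths are powers of two)."""
    return {name: len(pos).bit_length() - 1 for name, pos in tracks.items()}


def covers_exactly(tracks, n):
    """True if the tracks together list every position 0..n-1 exactly once."""
    all_positions = sorted(p for pos in tracks.values() for p in pos)
    return all_positions == list(range(n))


print(track_bits(SUBCODEBOOK3_MODE0))          # {'m0': 3, 'm1': 3, 'm2': 3, 'm3': 4}
print(covers_exactly(SUBCODEBOOK3_MODE0, 40))  # True
```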
Most ACELP speech coding algorithms search the algebraic codebook exhaustively, i.e. they traverse all combinations of positions in the 4 tracks, so that the speech synthesized from the random codevector has the highest quality. As shown in FIG. 1, during coding a subcodebook is selected according to the pitch period; each subcodebook comprises 4 tracks, and the positions of the four pulses in the 4 tracks are controlled by nested loops. The four pulses are added to obtain a random excitation vector, which is input to the perceptual weighting filter to obtain a perceptually weighted synthesized speech signal. The four pulse positions are selected so as to minimize the weighted error between the residual target vector and the synthesized speech signal; equivalently, they are the positions that maximize the expression e below, which yields the algebraic subcodebook index i_m for the subcodebook selected by the pitch period:

$$e = \frac{\left(\sum_{i=0}^{N_c-1} s_i\, r(m_i)\right)^2}{\sum_{i=0}^{N_c-1}\sum_{j=0}^{N_c-1} s_i s_j\, \phi(m_i, m_j)}$$

where r(i), i = 0, 1, …, N_c - 1, is the convolution of the residual target vector x'_0 with the impulse response h(n) of the perceptually weighted synthesis filter; the residual target vector x'_0 is obtained by subtracting the zero-input response of the perceptually weighted synthesis filter and the adaptive codebook contribution from the weighted input speech; φ(i, j), i, j = 0, 1, …, N_c - 1, is the covariance matrix of h(n); N_c is the number of non-zero pulses in the random excitation vector; m_i are the positions where non-zero pulses may occur; and s_i is the pulse amplitude, which can only be +1 or -1.
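The selection criterion can be evaluated directly from r and φ. The following is a minimal sketch, assuming the standard ACELP form e = (Σ s_i r(m_i))² / Σ_i Σ_j s_i s_j φ(m_i, m_j), with r indexed by sample position and φ given as a symmetric list of lists; the function name `acelp_criterion` is hypothetical and not taken from any standard's reference code.

```python
def acelp_criterion(r, phi, positions, signs):
    """Return e = (sum_i s_i r(m_i))^2 / sum_i sum_j s_i s_j phi(m_i, m_j)."""
    # Numerator: correlation between the target and the signed pulses.
    corr = sum(s * r[m] for m, s in zip(positions, signs))
    # Denominator: energy of the filtered excitation, via the covariance matrix.
    energy = sum(
        si * sj * phi[mi][mj]
        for mi, si in zip(positions, signs)
        for mj, sj in zip(positions, signs)
    )
    return corr * corr / energy


# Toy example: identity covariance, two pulses at positions 1 and 3.
r = [1.0, -2.0, 0.5, 3.0]
phi = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
print(acelp_criterion(r, phi, positions=[1, 3], signs=[-1, +1]))  # 12.5
```

A full search evaluates this expression once for every position combination, which is why its cost grows with the size of the codebook.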
The expression for the perceptual weighted synthesis filter is as follows:
Figure A20061006200700091
where α_i are the linear prediction coefficients, γ is the weighting (bandwidth-expansion) factor, typically between 0.8 and 0.9, and p is the filter order.
Because the time complexity of a full algebraic codebook search grows exponentially with the number of bits allocated to the algebraic codebook, the complexity of the full search rises rapidly as that number of bits increases. Real-time implementation of the speech coding algorithm then becomes difficult, which limits the practical value of the algorithm.
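The exponential growth is easy to see by counting combinations. A minimal sketch, assuming a track coded with b bits has 2^b candidate positions, that a full search visits one combination per inner loop, and that each mode of a subcodebook is searched separately; the function name is hypothetical.

```python
from math import prod


def full_search_loops(track_bits, modes=1):
    """Position combinations visited by a full search over all tracks.

    Assumes a track coded with b bits has 2**b candidate positions and that
    every mode of the subcodebook is searched separately.
    """
    return modes * prod(2 ** b for b in track_bits)


for bits in ([3, 3, 3, 3], [3, 4, 4, 4], [4, 4, 4, 4]):
    print(bits, sum(bits), full_search_loops(bits, modes=2))
```

The loop prints 8192, 65536 and 131072 loops for 12, 15 and 16 track bits respectively: every extra bit doubles the work. The 3, 4, 4, 4 two-mode case gives the 65536 loops per subframe quoted for the DP-CELP example.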
Disclosure of Invention
An object of the invention is to provide an algebraic codebook search method that solves the following problem of the prior art: the time complexity of a full algebraic codebook search is high and increases with the number of bits occupied by the algebraic codebook, which makes real-time implementation of the speech coding algorithm difficult and limits its practical value.
Another object of the present invention is to provide an algebraic codebook searching device.
It is another object of the present invention to provide a speech coder.
The invention is realized by a searching method of an algebraic codebook, wherein each algebraic codebook comprises a plurality of subcodebooks, each subcodebook comprises a plurality of tracks, and each track stores positions where one or more nonzero pulses can appear, the method comprises the following steps:
A. decomposing a plurality of tracks contained in each sub-codebook into a plurality of paths, wherein the plurality of paths form a path group to be searched, and each path contains a plurality of possible positions where non-zero pulses can appear;
B. determining the optimal index of each path in the path group to be searched according to the track position of the nonzero pulse in the path;
C. selecting a fixed path and at least one dynamic path from the path group to be searched, performing a combined cyclic search over all combinations of fixed and dynamic paths to obtain the optimal algebraic subcodebook index corresponding to the dynamic path, and converting the optimal index of the fixed path and the optimal algebraic subcodebook index corresponding to the dynamic path into the corresponding track indexes.
The optimal index of the path is a path index which meets the following conditions:
Figure A20061006200700101
where I_x is the optimal index of path x, p(j) is the track position corresponding to index j, r'(n) is the convolution of the residual target vector x'_0 with the impulse response h(n) of the perceptually weighted synthesis filter, and s_x is the non-zero pulse amplitude for path x, either +1 or -1.
The optimal algebraic subcodebook index corresponding to the dynamic path is the algebraic codebook index i_m that maximizes the following expression e during the cyclic search over all combinations of fixed and dynamic paths:

$$e = \frac{\left(\sum_{i=0}^{N_c-1} s_i\, r'(m_i)\right)^2}{\sum_{i=0}^{N_c-1}\sum_{j=0}^{N_c-1} s_i s_j\, \phi(m_i, m_j)}$$

where r'(i), i = 0, 1, …, N_c - 1, is the convolution of the residual target vector x'_0 with the impulse response h(n) of the perceptually weighted synthesis filter; φ(i, j), i, j = 0, 1, …, N_c - 1, is the covariance matrix of h(n); N_c is the number of non-zero pulses in the random excitation vector; m_i are the positions in the dynamic path where non-zero pulses may occur; and s_i is the pulse amplitude.
When the tracks are decomposed, one track is decomposed into one or two paths according to the length of the track.
When a track is divided into two paths, the even-numbered entries of the track form one path and the odd-numbered entries form the other.
When the searched optimal index of the fixed path and the optimal algebraic subcodebook index corresponding to the dynamic path are converted into the corresponding track indexes, a track that contains only 1 path uses the path index as the track index;
when a track has 2 paths, the path index is converted to the track index by:

$$c_i = \begin{cases} 2I_i, & \text{if path } i \text{ is the a-path} \\ 2I_i + 1, & \text{if path } i \text{ is the b-path} \end{cases}$$

where c_i is the index of track i, I_i is the index of path i, the a-path consists of the even entries of the track, and the b-path consists of its odd entries.
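The conversion rule follows from the even/odd split: entry I of the a-path is entry 2I of the track, and entry I of the b-path is entry 2I + 1. A minimal sketch with a hypothetical function name; the example track is m3 of Subcodebook 3, Mode 0, assuming the two trailing position groups of that table row both belong to m3.

```python
def path_to_track_index(path_index, path=None):
    """Convert a path index into a track index.

    path: None for a track that forms a single path (the indexes coincide),
          "a" for the path of even entries, "b" for the path of odd entries.
    """
    if path is None:
        return path_index
    if path == "a":
        return 2 * path_index
    if path == "b":
        return 2 * path_index + 1
    raise ValueError(f"unknown path {path!r}")


# Track m3 of Subcodebook 3, Mode 0 (16 positions), split into its two paths.
track = [3, 8, 13, 18, 23, 28, 33, 38, 4, 9, 14, 19, 24, 29, 34, 39]
a_path, b_path = track[0::2], track[1::2]
for k in range(len(a_path)):
    # Looking up the converted index in the track recovers the path entry.
    assert track[path_to_track_index(k, "a")] == a_path[k]
    assert track[path_to_track_index(k, "b")] == b_path[k]
```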
An apparatus for searching an algebraic codebook, wherein each algebraic codebook comprises a plurality of subcodebooks, each subcodebook comprises a plurality of tracks, and each track stores locations where one or more non-zero pulses may occur, the apparatus comprising:
the track path decomposition module is used for decomposing a plurality of tracks contained in each sub-codebook into a plurality of paths, the plurality of paths form a path group to be searched, and each path contains a plurality of possible positions where nonzero pulses can appear;
the optimal index searching module is used for determining the optimal index of each path in the path group to be searched according to the track position of the nonzero pulse in the path; and
and the path cyclic search module is used for selecting a fixed path and at least one dynamic path from the path group to be searched, performing combined cyclic search on all the fixed paths and the dynamic paths to obtain the optimal algebraic subcodebook index corresponding to the dynamic path, and converting the searched optimal index of the fixed path and the optimal subcodebook index corresponding to the dynamic path into corresponding track indexes.
The optimal index of the path is a path index satisfying the following conditions:
Figure A20061006200700111
where I_x is the optimal index of path x, p(j) is the track position corresponding to index j, r'(n) is the convolution of the residual target vector x'_0 with the impulse response h(n) of the perceptually weighted synthesis filter, and s_x is the non-zero pulse amplitude for path x, either +1 or -1.
The optimal algebraic subcodebook index corresponding to the dynamic path is the algebraic codebook index i_m that maximizes the following e in the cyclic search over all combinations of fixed and dynamic paths:

$$e = \frac{\left(\sum_{i=0}^{N_c-1} s_i\, r'(m_i)\right)^2}{\sum_{i=0}^{N_c-1}\sum_{j=0}^{N_c-1} s_i s_j\, \phi(m_i, m_j)}$$

where r'(i), i = 0, 1, …, N_c - 1, is the convolution of the residual target vector x'_0 with the impulse response h(n) of the perceptually weighted synthesis filter; φ(i, j), i, j = 0, 1, …, N_c - 1, is the covariance matrix of h(n); N_c is the number of non-zero pulses in the random excitation vector; m_i are the positions in the dynamic path where non-zero pulses may occur; and s_i is the pulse amplitude.
When the track is decomposed, one track is decomposed into one or two paths according to the length of the track.
When a track is divided into two paths, the even-numbered entries of the track form one path and the odd-numbered entries form the other.
When the searched optimal index of the fixed path and the optimal algebraic subcodebook index corresponding to the dynamic path are converted into the corresponding track indexes, a track that contains only 1 path uses the path index as the track index;
when a track has 2 paths, the path index is converted into a track index by the following equation:

$$c_i = \begin{cases} 2I_i, & \text{if path } i \text{ is the a-path} \\ 2I_i + 1, & \text{if path } i \text{ is the b-path} \end{cases}$$

where c_i is the index of track i, I_i is the index of path i, the a-path consists of the even entries of the track, and the b-path consists of its odd entries.
A speech encoder comprising:
a codebook searching means for searching a pulse position in a codebook at which a weighted error between a residual target vector and a synthesized speech signal is minimized;
a first adder for adding pulses corresponding to the positions of the nonzero pulses searched by the codebook searching device and outputting a random excitation vector;
a perceptual weighting synthesis filter for performing perceptual weighting synthesis processing on the random excitation vector and outputting a perceptually weighted synthesized speech signal; and
a second adder for subtracting the residual target vector from the synthesized speech signal and outputting a weighted error signal;
each algebraic codebook comprises a plurality of subcodebooks, each subcodebook comprises a plurality of tracks, and each track stores the position where one or more nonzero pulses possibly appear;
characterized in that, the codebook searching device comprises:
the track path decomposition module is used for decomposing a plurality of tracks contained in each sub-codebook into a plurality of paths, the plurality of paths form a path group to be searched, and each path contains a plurality of possible positions where nonzero pulses can appear;
the optimal index searching module is used for determining the optimal index of each path in the path group to be searched according to the track position of the non-zero pulse in the path; and
and the path cyclic search module is used for selecting a fixed path and at least one dynamic path from the path group to be searched, performing combined cyclic search on all the fixed paths and the dynamic paths to obtain the optimal algebraic subcodebook index corresponding to the dynamic path, and converting the searched optimal index of the fixed path and the optimal subcodebook index corresponding to the dynamic path into corresponding track indexes.
The optimal index of the path is a path index satisfying the following conditions:
Figure A20061006200700131
where I_x is the optimal index of path x, p(j) is the track position corresponding to index j, r'(n) is the convolution of the residual target vector x'_0 with the impulse response h(n) of the perceptually weighted synthesis filter, and s_x is the non-zero pulse amplitude for path x.
The optimal algebraic subcodebook index corresponding to the dynamic path is the algebraic codebook index i_m that maximizes the following expression e during the cyclic search over all combinations of fixed and dynamic paths:

$$e = \frac{\left(\sum_{i=0}^{N_c-1} s_i\, r'(m_i)\right)^2}{\sum_{i=0}^{N_c-1}\sum_{j=0}^{N_c-1} s_i s_j\, \phi(m_i, m_j)}$$

where r'(i), i = 0, 1, …, N_c - 1, is the convolution of the residual target vector x'_0 with the impulse response h(n) of the perceptually weighted synthesis filter; φ(i, j), i, j = 0, 1, …, N_c - 1, is the covariance matrix of h(n); N_c is the number of non-zero pulses in the random excitation vector; m_i are the positions in the dynamic path where non-zero pulses may occur; and s_i is the pulse amplitude.
When the track is decomposed, one track is decomposed into one or two paths according to the length of the track.
When a track is divided into two paths, the even-numbered entries of the track form one path and the odd-numbered entries form the other.
When the searched optimal index of the fixed path and the optimal algebraic subcodebook index corresponding to the dynamic path are converted into the corresponding track indexes, a track that contains only 1 path uses the path index as the track index;
when a track has 2 paths, the path index is converted to the track index by:

$$c_i = \begin{cases} 2I_i, & \text{if path } i \text{ is the a-path} \\ 2I_i + 1, & \text{if path } i \text{ is the b-path} \end{cases}$$

where c_i is the index of track i, I_i is the index of path i, the a-path consists of the even entries of the track, and the b-path consists of its odd entries.
The invention decomposes the track of the algebraic subcodebook into a plurality of paths, and searches the final codebook index by combining the fixed path and the dynamic path, thereby effectively reducing the complexity of realizing codebook search and improving the application value of the coding algorithm while ensuring the coding quality.
Drawings
FIG. 1 is a schematic diagram of codebook search in speech coding provided in the prior art;
FIG. 2 is a schematic diagram of the implementation of codebook search in speech coding provided by the present invention;
fig. 3 is a block diagram of a codebook searching apparatus in a speech coder according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
The invention decomposes the algebraic subcodebook track into a plurality of paths, and searches the final codebook index through the combination of the fixed path and the dynamic path, thereby effectively reducing the complexity of realizing codebook search while ensuring the coding quality.
In speech coding, the input speech signal is preprocessed and subjected to linear predictive analysis to obtain the perceptually weighted synthesis filter and the target vector x_0. An adaptive codebook search on x_0 then yields the residual target vector x'_0:

$$x'_0(n) = x_0(n) - g_a\, x_u(n)$$

where g_a is the adaptive codebook gain, x_0(n) is the weighted input speech minus the zero-input response of the perceptually weighted synthesis filter, and x_u(n) is the zero-state response of the perceptually weighted synthesis filter to the adaptive codebook vector. The algebraic codebook is divided into several subcodebooks; the subcodebook used in each subframe is determined by its pitch period, so no additional information needs to be transmitted. According to experimental results, a 4 kb/s ACELP speech coding algorithm usually uses 3 subcodebooks.
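The residual target is a per-sample subtraction. The sketch below assumes x_0, x_u and g_a have already been produced by the preprocessing, LP analysis and adaptive codebook search, which are outside its scope; the function name is hypothetical.

```python
def residual_target(x0, xu, ga):
    """x0'(n) = x0(n) - g_a * x_u(n), sample by sample."""
    if len(x0) != len(xu):
        raise ValueError("x0 and xu must have the same length")
    return [a - ga * b for a, b in zip(x0, xu)]


print(residual_target([1.0, 2.0, 3.0], [0.5, 1.0, 2.0], ga=2.0))  # [0.0, 0.0, -1.0]
```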
As shown in FIG. 2, when speech is encoded, one of the 3 subcodebooks is first selected as the codebook to be searched according to the pitch period. To improve search accuracy, the invention divides the 4 tracks of the codebook into several paths, each containing 8 possible non-zero pulse positions. The more paths the tracks are decomposed into, the higher the complexity of the algorithm. In one embodiment, a track whose non-zero pulse position is represented by 3 bits (track length 8) forms a single path, and a track whose position is represented by 4 bits (track length 16) is decomposed into 2 paths. In a preferred embodiment, the decomposition is as follows: the a-path of track i consists of the even entries of track i, and the b-path consists of its odd entries. For example, when the bits allocated to the four tracks are 3, 4, 4 and 4 (i.e. track lengths of 8, 16, 16 and 16), the 4 tracks are decomposed into 7 paths, namely path 0, path 1a, path 1b, path 2a, path 2b, path 3a and path 3b, and the set of these paths constitutes the path group to be searched. This decomposition gives each path good coverage of its track. Other decompositions are possible, for example taking the first 8 and the last 8 positions of a track as two separate paths, but then neither path covers the whole track, which degrades coding quality.
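The decomposition rule can be sketched as a function that builds the path group from a list of tracks. This assumes tracks have length 8 or 16 only, as in the embodiment with 7 paths (path 0, 1a, 1b, 2a, 2b, 3a, 3b); the function name and the track contents in the example are hypothetical.

```python
def build_path_group(tracks):
    """Decompose tracks into the path group to be searched.

    Tracks of length 8 (3 bits) form one path; tracks of length 16 (4 bits)
    are split into an a-path (even entries) and a b-path (odd entries).
    Returns a dict (insertion-ordered) from path name to positions.
    """
    group = {}
    for i, track in enumerate(tracks):
        if len(track) == 8:
            group[str(i)] = list(track)
        elif len(track) == 16:
            group[f"{i}a"] = list(track[0::2])
            group[f"{i}b"] = list(track[1::2])
        else:
            raise ValueError(f"track {i}: unsupported length {len(track)}")
    return group


# Hypothetical track contents with lengths 8, 16, 16, 16 (3, 4, 4, 4 bits).
tracks = [list(range(0, 8)), list(range(8, 24)), list(range(24, 40)), list(range(40, 56))]
group = build_path_group(tracks)
print(list(group))  # ['0', '1a', '1b', '2a', '2b', '3a', '3b']
```

Every path ends up with 8 candidate positions, and the even/odd split keeps each path spread across the whole track rather than covering only half of it.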
After the path group to be searched has been determined, each path is pre-searched to find its optimal index, i.e. the index I_x satisfying the following formula:
where I_x is the optimal index for path x; p(j) is the track position corresponding to index j; r'(n) is the convolution of the residual target vector x'_0 with the impulse response h(n) of the perceptually weighted synthesis filter; and s_x is the non-zero pulse amplitude for path x, either +1 or -1.
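The formula for I_x is not reproduced in this copy, so the sketch below is only a plausible reading: it assumes the pre-search picks, for each path, the entry whose position maximizes s_x · r'(p(j)), i.e. the position best matched to the path's pulse sign. Both this criterion and the function name are assumptions, not the patent's stated formula.

```python
def presearch_best_index(path_positions, r_prime, sign):
    """ASSUMED criterion: return the j that maximizes sign * r'(p(j)).

    path_positions[j] plays the role of p(j); sign is s_x (+1 or -1).
    """
    return max(range(len(path_positions)),
               key=lambda j: sign * r_prime[path_positions[j]])


r_prime = [0.1, -0.9, 0.4, 0.7, -0.2, 0.3]
print(presearch_best_index([0, 2, 4], r_prime, -1))  # 2
print(presearch_best_index([1, 3, 5], r_prime, +1))  # 1
```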
After the optimal index of each path has been determined, the final codebook search is performed under loop control. In one embodiment, each loop selects 3 fixed paths and 1 dynamic path from the path group to be searched; if 2 paths belong to the same track, only 1 of them can be selected.
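One way to organize this loop control for the 7-path example is sketched below, assuming 3 fixed paths plus 1 dynamic path per loop and at most one path per track. Under the further assumption that both modes of the selected subcodebook are searched, the inner-loop count comes to 512 per subframe, which agrees with the figure quoted in the DP-CELP example. The function and variable names are hypothetical.

```python
from itertools import product


def path_combinations(paths_by_track):
    """Yield (fixed_paths, dynamic_path) pairs.

    One path is taken from each track, so two paths of the same track are
    never combined; each chosen path in turn acts as the dynamic path while
    the others stay fixed.
    """
    for choice in product(*paths_by_track):
        for k in range(len(choice)):
            yield choice[:k] + choice[k + 1:], choice[k]


paths_by_track = [["0"], ["1a", "1b"], ["2a", "2b"], ["3a", "3b"]]
combos = list(path_combinations(paths_by_track))
positions_per_path = 8
modes = 2  # assumption: both modes of the selected subcodebook are searched
print(len(combos), modes * len(combos) * positions_per_path)  # 32 512
```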
In practice, the complexity of the algorithm grows rapidly with the number of dynamic paths while the coding quality does not improve significantly, so only a small number of dynamic paths should be selected from the path group to be searched.
During the search, each fixed path uses the optimal index obtained in the pre-search, and that index remains fixed throughout the search. For the 3 fixed paths, whose optimal indexes are I_a, I_b and I_c, these optimal indexes are used to compute the convolution r'(i) of the residual target vector x'_0 with the impulse response h(n) of the perceptually weighted synthesis filter, and the covariance matrix φ'(i, j) of h(n):

$$r'(i) = \sum_{n=i}^{N-1} x'_0(n)\, h(n-i), \qquad \phi'(i,j) = \sum_{n=j}^{N-1} h(n-i)\, h(n-j)$$

$$i = 0, 1, \ldots, N-1; \quad j = i, \ldots, N-1$$

where N is the subframe length, h(n) is the impulse response of the perceptually weighted synthesis filter, and i, j ∈ {I_a, I_b, I_c}.
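The two correlations can be computed as below. This plain-Python sketch assumes the usual ACELP definitions r'(i) = Σ_{n=i}^{N-1} x'_0(n) h(n-i) and φ'(i, j) = Σ_{n=j}^{N-1} h(n-i) h(n-j) for j ≥ i (with φ' symmetric); the function names are hypothetical, and no attempt is made at the optimizations a real codec would use.

```python
def backward_filtered_target(x0p, h):
    """r'(i) = sum_{n=i}^{N-1} x0'(n) * h(n - i), for i = 0..N-1."""
    N = len(x0p)
    return [sum(x0p[n] * h[n - i] for n in range(i, N)) for i in range(N)]


def impulse_correlation(h):
    """phi'(i, j) = sum_{n=j}^{N-1} h(n - i) * h(n - j) for j >= i, symmetric."""
    N = len(h)
    phi = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(i, N):
            phi[i][j] = phi[j][i] = sum(h[n - i] * h[n - j] for n in range(j, N))
    return phi


h = [1.0, 0.5, 0.0]
print(backward_filtered_target([1.0, 2.0, 3.0], h))  # [2.0, 3.5, 3.0]
print(impulse_correlation(h)[0])                      # [1.25, 0.5, 0.0]
```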
For the dynamic path, r'(i) and φ'(i, j) are computed at each of its candidate positions, and the algebraic codebook index i_m that maximizes the following e is found:

$$e = \frac{\left(\sum_{i=0}^{N_c-1} s_i\, r'(m_i)\right)^2}{\sum_{i=0}^{N_c-1}\sum_{j=0}^{N_c-1} s_i s_j\, \phi'(m_i, m_j)}$$

where r'(i), i = 0, 1, …, N_c - 1, is the convolution of the residual target vector x'_0 with the impulse response h(n) of the perceptually weighted synthesis filter; φ'(i, j), i, j = 0, 1, …, N_c - 1, is the covariance matrix of h(n); N_c is the number of non-zero pulses in the random excitation vector; m_i are the positions where non-zero pulses may occur; and s_i is the pulse amplitude, which can only be +1 or -1. If the resulting maximum e_max is greater than the maximum E_max found in the previous path-loop searches, E_max is replaced by e_max and {I_a, I_b, I_c, i_m} is recorded. After all possible combinations of fixed and dynamic paths have been searched, the last recorded path indexes {I_a, I_b, I_c, i_m} are converted into the track indexes c_0, c_1, c_2, c_3, which completes the fixed codebook search. During the conversion, if a track contains only 1 path, the path index is the track index; if a track contains 2 paths, the path index is converted into the track index by:

$$c_i = \begin{cases} 2I_i, & \text{if path } i \text{ is the a-path (even entries)} \\ 2I_i + 1, & \text{if path } i \text{ is the b-path (odd entries)} \end{cases}$$

where c_i is the index of track i and I_i is the index of path i.
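The loop described above can be sketched as follows: for each combination, the fixed paths sit at their pre-searched indexes, the dynamic path scans all its positions, and the best {fixed indexes, i_m} seen so far is recorded whenever e exceeds E_max. This is a minimal sketch under stated assumptions (φ' symmetric, criterion of the ACELP form shown above); all names are hypothetical. As a design choice, the fixed-path terms of the numerator and denominator are computed once per combination, so each dynamic position only adds its own terms.

```python
def combined_search(paths, best_index, combos, sign_of, r, phi):
    """Loop search over fixed/dynamic path combinations.

    paths:      {path_name: [positions]}
    best_index: {path_name: index found in the pre-search}
    combos:     iterable of (fixed_path_names, dynamic_path_name)
    sign_of:    {path_name: +1 or -1}
    r, phi:     r'(n) as a list and phi'(i, j) as a symmetric list of lists
    Returns (E_max, {path_name: path index}) for the best combination.
    """
    E_max, best = float("-inf"), None
    for fixed, dyn in combos:
        pos = [paths[p][best_index[p]] for p in fixed]
        sgn = [sign_of[p] for p in fixed]
        # Fixed-path contributions are computed once per combination.
        c_fix = sum(s * r[m] for m, s in zip(pos, sgn))
        en_fix = sum(si * sj * phi[mi][mj]
                     for mi, si in zip(pos, sgn) for mj, sj in zip(pos, sgn))
        sd = sign_of[dyn]
        for idx, m in enumerate(paths[dyn]):
            c = c_fix + sd * r[m]
            cross = sum(s * phi[mf][m] for mf, s in zip(pos, sgn))
            en = en_fix + phi[m][m] + 2 * sd * cross  # sd * sd == 1
            e = c * c / en
            if e > E_max:  # keep the best {fixed indexes, i_m} seen so far
                E_max = e
                best = {p: best_index[p] for p in fixed}
                best[dyn] = idx
    return E_max, best


paths = {"0": [0, 1], "1": [2, 3]}
best_index = {"0": 0, "1": 0}
combos = [(("0",), "1"), (("1",), "0")]
sign_of = {"0": 1, "1": 1}
r = [1.0, 0.0, 0.0, 3.0]
phi = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
print(combined_search(paths, best_index, combos, sign_of, r, phi))  # (8.0, {'0': 0, '1': 1})
```

The returned path indexes would then be mapped to the track indexes c_0 … c_3 as described above.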
After the fixed codebook search is completed, the four pulses found are added to obtain the random excitation vector, which is input to the perceptual weighting filter to obtain a perceptually weighted synthesized speech signal. The four pulse positions are those that minimize the weighted error between the residual target vector and the synthesized speech signal, i.e. that maximize the following e, giving the algebraic subcodebook index i_m for the subcodebook selected by the pitch period:

$$e = \frac{\left(\sum_{i=0}^{N_c-1} s_i\, r(m_i)\right)^2}{\sum_{i=0}^{N_c-1}\sum_{j=0}^{N_c-1} s_i s_j\, \phi(m_i, m_j)}$$

where r(i), i = 0, 1, …, N_c - 1, is the convolution of the residual target vector x'_0 with the impulse response h(n) of the perceptually weighted synthesis filter; the residual target vector x'_0 is obtained by subtracting the zero-input response of the perceptually weighted synthesis filter and the adaptive codebook contribution from the weighted input speech; φ(i, j), i, j = 0, 1, …, N_c - 1, is the covariance matrix of h(n); N_c is the number of non-zero pulses in the random excitation vector; m_i are the positions where non-zero pulses may occur; and s_i is the pulse amplitude, +1 or -1.
Fig. 3 shows a structure of a codebook searching apparatus in a speech coder according to the present invention, which searches for a pulse position in a codebook that minimizes a weighted error between a residual target vector and a synthesized speech signal by a codebook searching apparatus 12, wherein:
the track path decomposition module 122 decomposes the plurality of tracks included in each sub-codebook into a plurality of paths, the plurality of paths form a path group to be searched, each path includes a position where a plurality of non-zero pulses may appear, the optimal index search module 124 determines an optimal index of each path in the path group to be searched according to the track position of the non-zero pulses in the path, the path loop search module 126 selects a fixed path and at least one dynamic path from the path group to be searched, performs combined loop search on all the fixed paths and the dynamic paths, obtains an optimal algebraic sub-codebook index corresponding to the dynamic path, and converts the searched optimal index of the fixed path and the searched optimal algebraic sub-codebook index corresponding to the dynamic path into corresponding track indexes. The specific implementation of the track path decomposition, the judgment and search of the path optimal index, the track index conversion, and the like is as described above, and details are not repeated.
Adder 14 adds the pulses at the non-zero pulse positions found by codebook searching device 12 and outputs a random excitation vector. The perceptual weighting synthesis filter 16 performs perceptually weighted synthesis on the random excitation vector and outputs a perceptually weighted synthesized speech signal. Adder 18 subtracts the synthesized speech signal from the residual target vector and outputs a weighted error signal.
As an example, consider the invention applied to a 4 kb/s Distributed Pulse Code-excited Linear Prediction (DP-CELP) speech coding algorithm. The input is a linear Pulse Code Modulation (PCM) signal sampled at 8 kHz, the analysis frame length is 20 ms, and the algebraic codebook is divided into 4 tracks occupying 3, 4, 4 and 4 bits, i.e. track lengths of 8, 16, 16 and 16. The full algebraic codebook search uses multiple nested loops and requires 65536 loops per subframe, and the computational complexity of the algebraic codebook search is 85.2 MOPS (Million Operations Per Second). The invention avoids the nested loops of the full search and needs only 512 loops per subframe. Because the loop body is exactly the same as in the full search, the complexity of the algebraic codebook search with the invention is only 0.67 MOPS, about 1/128 of that of the full search.
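The 1/128 figure follows from the loop counts because, as stated, the loop body is unchanged. Assuming the cost scales linearly with the number of loops, the MOPS estimate scales the same way:

```python
full_loops, proposed_loops = 65536, 512
full_mops = 85.2

ratio = full_loops // proposed_loops                    # loop-count reduction factor
proposed_mops = full_mops * proposed_loops / full_loops  # linear-scaling assumption
print(ratio, round(proposed_mops, 2))  # 128 0.67
```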
The following average distortion measure D was used to examine the impact of the invention on coding quality:

$$D = \frac{1}{M}\sum_{i=1}^{M} e_i$$

where M is the number of subframes in the test sentences and e_i is given by:

$$e_i = \sum_{n=0}^{N-1}\left[x_0^{(i)}(n) - \hat{g}_a^{(i)}\, x_{u'}^{(i)}(n) - \hat{g}_c^{(i)}\, t_{j'}^{(i)}(n)\right]^2$$

where the superscript (i) denotes the parameters of the i-th subframe; N is the subframe length; x_0(n) is the perceptually weighted speech minus the zero-input response of the weighted synthesis filter; x_u'(n) and t_j'(n) are the zero-state responses of the perceptually weighted synthesis filter to the best vector output by the adaptive codebook and the best random codevector output by the algebraic codebook, respectively; ĝ_a^(i) is the quantized adaptive codebook gain of the i-th subframe; and ĝ_c^(i) is the quantized fixed codebook gain of the i-th subframe.
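Assuming D is the mean over subframes of e_i = Σ_n [x_0(n) - ĝ_a x_u'(n) - ĝ_c t_j'(n)]², the measure reduces to a few lines. A minimal sketch with hypothetical function names, taking the quantized gains and filtered vectors of each subframe as inputs:

```python
def subframe_error(x0, xu, t, ga_q, gc_q):
    """e_i = sum_n (x0(n) - ga_q * xu(n) - gc_q * t(n))**2 for one subframe."""
    return sum((a - ga_q * b - gc_q * c) ** 2 for a, b, c in zip(x0, xu, t))


def mean_distortion(errors):
    """D = (1/M) * sum of the per-subframe errors e_i."""
    return sum(errors) / len(errors)


e1 = subframe_error([1.0, 2.0], [1.0, 0.0], [0.0, 1.0], ga_q=0.5, gc_q=1.0)
print(e1, mean_distortion([e1, 0.75]))  # 1.25 1.0
```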
The average distortion statistics over a large number of test sentences are shown in the table below. The average distortion of the invention is only about 0.4% higher than that of the full search, which is negligible; in this respect, the invention causes no significant degradation of speech coding quality.
| Algorithm | Full algebraic codebook search | The invention |
|---|---|---|
| Mean distortion D | 115.6 | 116.1 |
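The relative change in mean distortion quoted above can be checked directly from the two table values:

```python
d_full, d_proposed = 115.6, 116.1
increase = 100 * (d_proposed - d_full) / d_full  # percent increase over full search
print(f"{increase:.2f}%")  # 0.43%
```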
To further test coding performance, the speech quality of the full search algorithm and of the invention was evaluated objectively with ITU-T P.862, Perceptual Evaluation of Speech Quality (PESQ). The Chinese test material consists of 16 sentences, 8 spoken by male speakers and 8 by female speakers. The results are shown in the table below: the Mean Opinion Score (MOS) of the invention is only about 0.005 lower than that of the full search algorithm, a difference that is not subjectively perceptible and is consistent with the distortion results in the table above.
Female speakers | Full search algorithm (MOS) | The invention (MOS) | Male speakers | Full search algorithm (MOS) | The invention (MOS)
F01 | 3.121 | 3.056 | M01 | 2.958 | 2.927
F02 | 3.166 | 3.220 | M02 | 2.903 | 3.022
F03 | 2.951 | 2.905 | M03 | 3.222 | 3.284
F04 | 3.353 | 3.356 | M04 | 2.948 | 3.048
F05 | 3.265 | 3.202 | M05 | 3.389 | 3.306
F06 | 3.055 | 3.056 | M06 | 3.290 | 3.287
F07 | 3.057 | 2.839 | M07 | 3.141 | 3.298
F08 | 2.866 | 2.874 | M08 | 3.227 | 3.146
Female average | 3.10425 | 3.0635 | Male average | 3.13475 | 3.16475
Overall mean (all 16 sentences) | 3.1195 | 3.114125
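The summary rows of the MOS table, and the roughly 0.005 gap between the two overall means, can be re-derived from the per-sentence scores. The sketch below simply recomputes them; the dictionary layout is an illustrative choice.

```python
from statistics import mean

# PESQ MOS scores from the table: sentence -> (full search, invention).
female = {
    "F01": (3.121, 3.056), "F02": (3.166, 3.220), "F03": (2.951, 2.905),
    "F04": (3.353, 3.356), "F05": (3.265, 3.202), "F06": (3.055, 3.056),
    "F07": (3.057, 2.839), "F08": (2.866, 2.874),
}
male = {
    "M01": (2.958, 2.927), "M02": (2.903, 3.022), "M03": (3.222, 3.284),
    "M04": (2.948, 3.048), "M05": (3.389, 3.306), "M06": (3.290, 3.287),
    "M07": (3.141, 3.298), "M08": (3.227, 3.146),
}


def column_means(scores):
    """Return (mean full-search MOS, mean invention MOS) for one speaker group."""
    return (mean(v[0] for v in scores.values()),
            mean(v[1] for v in scores.values()))


f_full, f_inv = column_means(female)
m_full, m_inv = column_means(male)
total_full = (f_full + m_full) / 2
total_inv = (f_inv + m_inv) / 2
print(round(f_full, 5), round(f_inv, 5), round(m_full, 5), round(m_inv, 5))
print(round(total_full, 6), round(total_inv, 6), round(total_full - total_inv, 6))
```

Because both groups contain eight sentences, averaging the two group means gives the same result as averaging all sixteen scores directly.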
The above test results show that the invention has almost no effect on the quality of the encoded speech, while its complexity is only 1/128 of that of the full search algorithm.
The above description is intended to be illustrative of the preferred embodiment of the present invention and should not be taken as limiting the invention, but rather, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention.

Claims (18)

1. A method for searching an algebraic codebook, wherein each algebraic codebook comprises a plurality of subcodebooks, each subcodebook comprises a plurality of tracks, and each track stores locations where one or more non-zero pulses may occur, the method comprising:
A. decomposing a plurality of tracks contained in each sub-codebook into a plurality of paths, wherein the plurality of paths form a path group to be searched, and each path contains a plurality of possible positions where non-zero pulses can appear;
B. determining the optimal index of each path in the path group to be searched according to the track position of the nonzero pulse in the path;
C. selecting a fixed path and at least one dynamic path from the path group to be searched, performing a combined cyclic search over all the fixed paths and dynamic paths to obtain the optimal algebraic subcodebook index corresponding to the dynamic path, and converting the optimal index of the fixed path and the optimal algebraic subcodebook index of the dynamic path obtained by the search into the corresponding track indices.
2. The algebraic codebook searching method of claim 1, wherein the optimal index of a path is the path index satisfying the following condition:
Figure A2006100620070002C1
where $I_x$ is the optimal index of path $x$; $p(j)$ is the track position corresponding to index $j$; $r'(n)$ is the convolution of the residual target vector $x'_0$ with the impulse response $h(n)$ of the perceptually weighted synthesis filter; and $s_x$ is the amplitude of the non-zero pulse of path $x$, either +1 or -1.
3. The algebraic codebook searching method of claim 1, wherein the optimal algebraic subcodebook index of the dynamic path is the algebraic codebook index $i_m$ that maximizes the following expression $e$ during the cyclic search over all combinations of fixed and dynamic paths:
Figure A2006100620070002C2
where $r'(i)$, $i = 0, 1, \ldots, N_c - 1$, is the convolution of the residual target vector $x'_0$ with the impulse response $h(n)$ of the perceptually weighted synthesis filter; $\phi(i, j)$, $i, j = 0, 1, \ldots, N_c - 1$, is the covariance matrix of $h(n)$; $N_c$ is the number of non-zero pulses in the random excitation vector; $m_i$ are the positions in the dynamic path at which non-zero pulses may occur; and $s_i$ is the pulse amplitude.
4. A search method of an algebraic codebook according to claim 1, 2 or 3, wherein in decomposing the tracks, one track is decomposed into one or two paths according to the track length.
5. The algebraic codebook searching method of claim 4, wherein, when a track is decomposed into two paths, the even-numbered positions of the track form one path and the odd-numbered positions form the other path.
6. The algebraic codebook searching method of claim 5, wherein, when the optimal index of the fixed path and the optimal algebraic subcodebook index of the dynamic path obtained by the search are converted into the corresponding track indices, if a track contains only one path, the path index is used directly as the track index;
when a track has 2 paths, the path index is converted into a track index by the following equation:
Figure A2006100620070003C1
where $c_i$ is the index of track $i$ and $I_i$ is the index of path $i$; path a consists of the even-numbered positions of the track and path b consists of the odd-numbered positions.
7. An apparatus for searching an algebraic codebook, wherein each algebraic codebook comprises a plurality of subcodebooks, each subcodebook comprises a plurality of tracks, and each track stores locations where one or more non-zero pulses may occur, the apparatus comprising:
the track path decomposition module is used for decomposing a plurality of tracks contained in each sub-codebook into a plurality of paths, the paths form a path group to be searched, and each path contains a plurality of possible positions of non-zero pulses;
the optimal index searching module is used for determining the optimal index of each path in the path group to be searched according to the track position of the nonzero pulse in the path; and
the path cyclic search module is used for selecting a fixed path and at least one dynamic path from the path group to be searched, performing a combined cyclic search over all the fixed paths and dynamic paths to obtain the optimal algebraic subcodebook index corresponding to the dynamic path, and converting the optimal index of the fixed path and the optimal algebraic subcodebook index of the dynamic path obtained by the search into the corresponding track indices.
8. The algebraic codebook searching apparatus of claim 7, wherein the optimal index of a path is the path index satisfying the following condition:
Figure A2006100620070004C1
where $I_x$ is the optimal index of path $x$; $p(j)$ is the track position corresponding to index $j$; $r'(n)$ is the convolution of the residual target vector $x'_0$ with the impulse response $h(n)$ of the perceptually weighted synthesis filter; and $s_x$ is the amplitude of the non-zero pulse of path $x$, either +1 or -1.
9. The algebraic codebook searching apparatus of claim 7, wherein the optimal algebraic subcodebook index of the dynamic path is the algebraic codebook index $i_m$ that maximizes the following expression $e$ during the cyclic search over all combinations of fixed and dynamic paths:
Figure A2006100620070004C2
where $r'(i)$, $i = 0, 1, \ldots, N_c - 1$, is the convolution of the residual target vector $x'_0$ with the impulse response $h(n)$ of the perceptually weighted synthesis filter; $\phi(i, j)$, $i, j = 0, 1, \ldots, N_c - 1$, is the covariance matrix of $h(n)$; $N_c$ is the number of non-zero pulses in the random excitation vector; $m_i$ are the positions in the dynamic path at which non-zero pulses may occur; and $s_i$ is the pulse amplitude.
10. The apparatus for searching an algebraic codebook according to claim 7, 8 or 9, wherein in decomposing the tracks, one track is decomposed into one or two paths according to the track length.
11. The algebraic codebook searching apparatus of claim 10, wherein, when a track is decomposed into two paths, the even-numbered positions of the track form one path and the odd-numbered positions form the other path.
12. The algebraic codebook searching apparatus of claim 11, wherein, when the optimal index of the fixed path and the optimal algebraic subcodebook index of the dynamic path obtained by the search are converted into the corresponding track indices, if a track contains only one path, the path index is used directly as the track index;
when a track has 2 paths, the path index is converted into a track index by the following equation:
Figure A2006100620070005C1
where $c_i$ is the index of track $i$ and $I_i$ is the index of path $i$; path a consists of the even-numbered positions of the track and path b consists of the odd-numbered positions.
13. A speech encoder comprising:
codebook searching means for searching a pulse position in a codebook at which a weighted error between a residual target vector and a synthesized speech signal is minimized;
a first adder for adding pulses corresponding to the positions of the nonzero pulses searched by the codebook searching device and outputting a random excitation vector;
a perceptual weighting synthesis filter for performing perceptual weighting synthesis processing on the random excitation vector and outputting a perceptually weighted synthesized speech signal; and
a second adder for subtracting the residual target vector from the synthesized speech signal and outputting a weighted error signal;
each algebraic codebook comprises a plurality of subcodebooks, each subcodebook comprises a plurality of tracks, and each track stores the position where one or more nonzero pulses possibly appear;
characterized in that the codebook searching means comprises:
the track path decomposition module is used for decomposing a plurality of tracks contained in each sub-codebook into a plurality of paths, the plurality of paths form a path group to be searched, and each path contains a plurality of possible positions where nonzero pulses can appear;
the optimal index searching module is used for determining the optimal index of each path in the path group to be searched according to the track position of the nonzero pulse in the path; and
the path cyclic search module is used for selecting a fixed path and at least one dynamic path from the path group to be searched, performing a combined cyclic search over all the fixed paths and dynamic paths to obtain the optimal algebraic subcodebook index corresponding to the dynamic path, and converting the optimal index of the fixed path and the optimal algebraic subcodebook index of the dynamic path obtained by the search into the corresponding track indices.
14. The speech coder of claim 13, wherein the optimal index for the path is a path index that satisfies the following condition:
where $I_x$ is the optimal index of path $x$; $p(j)$ is the track position corresponding to index $j$; $r'(n)$ is the convolution of the residual target vector $x'_0$ with the impulse response $h(n)$ of the perceptually weighted synthesis filter; and $s_x$ is the amplitude of the non-zero pulse of path $x$, either +1 or -1.
15. The speech coder of claim 13, wherein the optimal algebraic subcodebook index of the dynamic path is the algebraic codebook index $i_m$ that maximizes the following expression $e$ during the cyclic search over all combinations of fixed and dynamic paths:
Figure A2006100620070006C2
where $r'(i)$, $i = 0, 1, \ldots, N_c - 1$, is the convolution of the residual target vector $x'_0$ with the impulse response $h(n)$ of the perceptually weighted synthesis filter; $\phi(i, j)$, $i, j = 0, 1, \ldots, N_c - 1$, is the covariance matrix of $h(n)$; $N_c$ is the number of non-zero pulses in the random excitation vector; $m_i$ are the positions in the dynamic path at which non-zero pulses may occur; and $s_i$ is the pulse amplitude.
16. The speech coder of claim 13, 14 or 15, wherein in decomposing the tracks, a track is decomposed into one or two paths according to the track length.
17. The speech coder of claim 16, wherein, when a track is decomposed into two paths, the even-numbered positions of the track form one path and the odd-numbered positions form the other path.
18. The speech coder of claim 17, wherein, when the optimal index of the fixed path and the optimal algebraic subcodebook index of the dynamic path obtained by the search are converted into the corresponding track indices, if a track contains only one path, the path index is used directly as the track index;
when a track has 2 paths, the path index is converted into a track index by the following equation:
Figure A2006100620070006C3
where $c_i$ is the index of track $i$ and $I_i$ is the index of path $i$; path a consists of the even-numbered positions of the track and path b consists of the odd-numbered positions.
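The search of claims 1 to 6 (restated for the apparatus in claims 7 to 12 and for the speech coder in claims 13 to 18) can be illustrated with a minimal sketch. This is not the patent's reference implementation: because the claimed formulas appear only as images, it rests on several assumptions. First, the optimal index of a path is taken as the position with the largest $|r'(p(j))|$, with the pulse sign taken from the sign of $r'$. Second, $e$ is assumed to have the standard ACELP form $e = \left(\sum_i s_i\, r'(m_i)\right)^2 / \sum_i \sum_j s_i s_j\, \phi(m_i, m_j)$, with $\phi$ the covariance matrix of $h(n)$. Third, each path is assumed to carry one pulse. Fourth, a track longer than a hypothetical threshold `max_path_len` is split into an even-term path a and an odd-term path b, whose index $I$ maps back to track index $2I$ or $2I + 1$. All function names and example data are hypothetical.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def decompose_track(track: Sequence[int], max_path_len: int) -> List[List[int]]:
    """Claims 4-5 (assumed rule): keep a short track as one path, otherwise
    split it into path a (even terms) and path b (odd terms)."""
    if len(track) <= max_path_len:
        return [list(track)]
    return [list(track[0::2]), list(track[1::2])]


def optimal_path_index(r: Sequence[float], path: Sequence[int]) -> int:
    """Claim 2 (assumed form): index j whose position p(j) maximizes |r'(p(j))|."""
    return max(range(len(path)), key=lambda j: abs(r[path[j]]))


def covariance(h: Sequence[float], n: int) -> Matrix:
    """phi(i, j) = sum_{k=max(i,j)}^{n-1} h(k-i) h(k-j), with h zero-padded."""
    hp = list(h) + [0.0] * n
    return [[sum(hp[k - i] * hp[k - j] for k in range(max(i, j), n))
             for j in range(n)] for i in range(n)]


def criterion(pos: Sequence[int], r: Sequence[float], phi: Matrix) -> float:
    """Claim 3 (assumed ACELP form), with signs s_i = sign(r'(m_i))."""
    s = [1.0 if r[m] >= 0 else -1.0 for m in pos]
    corr = sum(si * r[m] for si, m in zip(s, pos))
    energy = sum(si * sj * phi[mi][mj]
                 for si, mi in zip(s, pos) for sj, mj in zip(s, pos))
    return corr * corr / energy if energy > 0 else 0.0


def joint_search(paths: Dict[str, List[int]], fixed: Sequence[str],
                 r: Sequence[float], phi: Matrix) -> Tuple[Dict[str, int], float]:
    """Claim 1, step C: hold each fixed path at its optimal index and try every
    combination of indices on the dynamic paths, keeping the largest e."""
    chosen = {name: optimal_path_index(r, paths[name]) for name in fixed}
    dynamic = [name for name in paths if name not in fixed]
    best_e, best = -1.0, {}
    for combo in product(*(range(len(paths[d])) for d in dynamic)):
        trial = {**chosen, **dict(zip(dynamic, combo))}
        e = criterion([paths[p][j] for p, j in trial.items()], r, phi)
        if e > best_e:
            best_e, best = e, trial
    return best, best_e


def to_track_index(path_index: int, part: str, n_paths: int) -> int:
    """Claim 6 (assumed mapping): c = I for a single path, else 2I (a) or 2I+1 (b)."""
    if n_paths == 1:
        return path_index
    return 2 * path_index if part == "a" else 2 * path_index + 1


# Hypothetical 8-sample subframe with two interleaved tracks of four positions.
r = [0.1, 0.8, -0.2, 0.0, -0.6, 0.3, 0.05, 0.4]  # stands in for r'(n)
phi = covariance([1.0, 0.5, 0.25], len(r))       # toy impulse response h(n)
tracks = {"T0": [0, 2, 4, 6], "T1": [1, 3, 5, 7]}
paths: Dict[str, List[int]] = {}
for name, track in tracks.items():
    for part, path in zip("ab", decompose_track(track, max_path_len=2)):
        paths[name + part] = path
best, e = joint_search(paths, fixed=["T1a"], r=r, phi=phi)
track_idx = {p: to_track_index(j, p[-1], 2) for p, j in best.items()}
print(best, round(e, 4), track_idx)
```

Only the dynamic paths are enumerated exhaustively, so the number of evaluations of $e$ is the product of the dynamic path lengths rather than of all path lengths; this is the source of the complexity saving, under the assumptions above.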
CNA2006100620071A 2006-08-04 2006-08-04 Search method, device and speech coder for algebraic codebook Pending CN101118748A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNA2006100620071A CN101118748A (en) 2006-08-04 2006-08-04 Search method, device and speech coder for algebraic codebook

Publications (1)

Publication Number Publication Date
CN101118748A true CN101118748A (en) 2008-02-06

Family

ID=39054826

Country Status (1)

Country Link
CN (1) CN101118748A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009006819A1 (en) * 2007-07-11 2009-01-15 Huawei Technologies Co., Ltd. Fixed codebook search method, searcher and computer readable medium
US8515743B2 (en) 2007-07-11 2013-08-20 Huawei Technologies Co., Ltd Method and apparatus for searching fixed codebook
CN103456309A (en) * 2012-05-31 2013-12-18 展讯通信(上海)有限公司 Voice coder and algebraic code list searching method and device thereof
CN103456309B (en) * 2012-05-31 2016-04-20 展讯通信(上海)有限公司 Speech coder and algebraically code table searching method thereof and device
CN110932739A (en) * 2019-12-20 2020-03-27 成都大学 System and method for reducing error interference of communication and radar excitation signals
CN110932739B (en) * 2019-12-20 2021-05-18 成都大学 System and method for reducing error interference of communication and radar excitation signals
CN116052700A (en) * 2022-07-29 2023-05-02 荣耀终端有限公司 Voice coding and decoding method, and related device and system
CN116052700B (en) * 2022-07-29 2023-09-29 荣耀终端有限公司 Voice coding and decoding method, and related device and system
WO2024021747A1 (en) * 2022-07-29 2024-02-01 荣耀终端有限公司 Sound coding method, sound decoding method, and related apparatuses and system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Open date: 20080206