TW200910329A - Stochastic codebook search algorithm with complexity scalability for speech coders - Google Patents

Stochastic codebook search algorithm with complexity scalability for speech coders

Info

Publication number
TW200910329A
Authority
TW
Taiwan
Prior art keywords
pulse
search
value
normalized similarity
speech
Prior art date
Application number
TW96132324A
Other languages
Chinese (zh)
Inventor
Fu-Kun Chen
Bo-Kai Su
Original Assignee
Univ Southern Taiwan Tech
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Univ Southern Taiwan Tech
Priority to TW96132324A
Publication of TW200910329A

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

This invention proposes a stochastic codebook search algorithm with complexity scalability for the algebraic code-excited linear prediction (ACELP) speech coder. To find the pulse positions, the proposed approach first evaluates the contribution of each pulse and sorts the pulses by contribution to decide the order in which they are searched. In addition, a breadth-control parameter lets the approach control the breadth of the search tree. Simulation results show that the proposed search scheme reduces the computational complexity while preserving the quality of the encoded speech.

Description

IX. Description of the Invention

[Technical Field]

The present invention provides a stochastic pulse search method with controllable complexity for speech coders. It is mainly used to search for the stochastic pulse positions in an algebraic code-excited linear prediction (ACELP) speech coder. In particular, it is a pulse search method whose search complexity can be controlled and which reduces the computational complexity of speech coding, which increases its practical value in speech coding.

[Prior Art]

Conjugate-structure linear prediction speech coders, such as G.723.1 and G.729 of the International Telecommunication Union (ITU-T) and the Enhanced Full Rate (EFR) codec of the European Telecommunications Standards Institute (ETSI), are widely used in network and wireless communications, and the coding process is the key to speech quality and performance.

Among low-bit-rate speech coders, conjugate-structure linear prediction coders are the most widely accepted in terms of speech quality and performance. Their main building blocks are the perceptual weighting filter, the linear prediction filter, the adaptive codebook, and the stochastic codebook. Stochastic codebook coding is an essential part of this architecture: whatever the linear prediction filter cannot predict and the adaptive codebook cannot estimate, as well as unvoiced consonants, must be compensated by the stochastic codebook. Among the many stochastic codebook structures, algebraic code-excited linear prediction (ACELP) is recognized for its high-quality stochastic codebook coding. ACELP has therefore been adopted by the international telecommunication organizations and is the mainstream of current speech compression coding.

In the existing conjugate-structure ACELP speech coder (see the fourth figure), the main coding flow comprises the perceptual weighting filter (1), linear prediction filter analysis coding (2), adaptive codebook search coding (3), stochastic codebook search coding (4), and the bitstream assembly buffer (5). Adaptive codebook search coding (3) and stochastic codebook search coding (4) together form the conjugate structure of this class of code-excited linear prediction coders.

Many stochastic codebook search structures have been studied. Weighing bit rate against speech quality, the best approach has been to divide the frame positions into tracks to build the stochastic codebook, search for the best position at which to place each pulse, and code the polarity (sign) of the pulse at that position. This approach, called algebraic code-excited linear prediction (ACELP), has been adopted by the international telecommunication standards.

The stochastic codebook of an ACELP coder divides the frame positions into tracks. Table 1, for example, shows the stochastic codebook used by ITU-T G.729; other standards and research papers build the codebook positions with a similar interleaved layout.

Track   Sign   Positions
m0      ±1     0, 5, 10, 15, 20, 25, 30, 35
m1      ±1     1, 6, 11, 16, 21, 26, 31, 36
m2      ±1     2, 7, 12, 17, 22, 27, 32, 37
m3      ±1     3, 8, 13, 18, 23, 28, 33, 38, 4, 9, 14, 19, 24, 29, 34, 39

Table 1. Stochastic codebook used by ITU-T G.729

The stochastic codebook search of such a coder works as follows. Its input is the residual signal that adaptive codebook search coding (3) cannot estimate, called the target signal of stochastic codebook search coding (4). Together with the result of adaptive codebook search coding (3) and the result of linear prediction filter analysis coding (2), the search finds the best pulse positions in the stochastic codebook, the polarity of each pulse, and the gain required by the coded result (the stochastic codebook gain). In general, the pulse positions can be found conveniently from the autocorrelation function of the linear prediction (LPC) filter and the correlation function between the target signal and the linear prediction filter.
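The interleaved track layout of Table 1 can be generated programmatically. The following is a minimal Python sketch; the function name `g729_tracks` is hypothetical and the 40-sample subframe length is read off the table. It also counts the position combinations a nested-loop search would have to visit, which coincides with the loop count of 8192 listed for Full Search in the comparison table later in the description.

```python
from math import prod


def g729_tracks():
    """Pulse position tracks as listed in Table 1 (40-sample subframe)."""
    # Tracks 0..2: every fifth sample starting at the track index (8 positions each).
    tracks = [list(range(t, 40, 5)) for t in range(3)]
    # Track 3: two interleaved sets, 3, 8, ..., 38 and 4, 9, ..., 39 (16 positions).
    tracks.append(list(range(3, 40, 5)) + list(range(4, 40, 5)))
    return tracks


tracks = g729_tracks()
# Number of position combinations when one pulse is placed on each track.
position_combinations = prod(len(t) for t in tracks)
print([len(t) for t in tracks], position_combinations)
```

Every sample index from 0 to 39 appears in exactly one track, which is what makes the layout "interleaved".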
The stochastic codebook search of the ACELP speech coder looks for the pulse positions, polarities, and gain $G$ that minimize the mean-square error between the synthesized speech and the target signal $r$. The criterion is:

$$E_\xi = \lVert r - G H v_\xi \rVert^2$$

Here $v_\xi$ is the algebraic code vector whose content is given by the stochastic codebook index $\xi$. The matrix $H$ is the lower-triangular Toeplitz convolution matrix built from $h[n]$, where $h[n]$ is the impulse response of the linear prediction filter, that is, of the vocal tract model. For a fixed code vector the error is minimized by the gain $G = r^T H v_\xi / (v_\xi^T H^T H v_\xi)$, which leaves $E_\xi = r^T r - Q_\xi$, so minimizing the error is equivalent to maximizing $Q_\xi$. Following the ITU-T G.729 standard, the optimum codevector is therefore the one that achieves the largest normalized similarity:

$$Q_\xi = \frac{(r^T H v_\xi)^2}{v_\xi^T H^T H v_\xi} = \frac{(d^T v_\xi)^2}{\varepsilon_\xi}, \qquad \varepsilon_\xi = v_\xi^T \Phi v_\xi$$

where $d = H^T r$ is the correlation function between the target signal $r[n]$ and the linear prediction filter $h[n]$, and $\Phi = H^T H$ is the covariance matrix of the impulse response, that is, the autocorrelation function of $h[n]$. The maximum of this criterion can be found with a nested-loop search, also called a full search, which yields the best combination of pulse positions. A nested-loop search steps each position loop through its candidate positions one at a time, so the total number of position combinations is very large.
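To make the criterion above concrete, here is a minimal pure-Python sketch that builds the correlation vector d = HᵀR and the matrix Φ = HᵀH from an impulse response and a target signal, then evaluates the normalized similarity of a sparse code vector. The function names and the toy signals are hypothetical, and the code vector is represented as a list of (position, sign) pairs purely for illustration.

```python
def correlations(h, r):
    """Return d = H^T r and Phi = H^T H, where H[i][j] = h[i - j] for i >= j.

    Requires len(h) >= len(r); H is the lower-triangular Toeplitz matrix of h.
    """
    n = len(r)
    d = [sum(h[i - j] * r[i] for i in range(j, n)) for j in range(n)]
    phi = [[sum(h[i - a] * h[i - b] for i in range(max(a, b), n))
            for b in range(n)] for a in range(n)]
    return d, phi


def normalized_similarity(pulses, d, phi):
    """Q = (d^T v)^2 / (v^T Phi v) for a code vector given as (position, sign) pairs."""
    c = sum(s * d[m] for m, s in pulses)
    e = sum(sa * sb * phi[ma][mb] for ma, sa in pulses for mb, sb in pulses)
    return c * c / e


# Toy data (illustrative only): 4-sample target and a short impulse response.
d, phi = correlations([1.0, 0.5, 0.0, 0.0], [1.0, 2.0, 3.0, 4.0])
print(normalized_similarity([(1, 1)], d, phi))
```

Because the code vector has only a few non-zero entries, both the numerator and the denominator reduce to short sums over the pulse positions, which is why d and Φ are precomputed once per subframe.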

Existing solutions that reduce the computation of stochastic codebook coding include the depth-first tree search (1997, US 5701392), the focused search (2004, US 0098254 A1), and the more recently proposed pulse replacement search (2004, US 0193410 A1).

Because the computational load of these conventional pulse search methods is still high, the inventors, drawing on many years of design, development, and practical experience in the field, studied and improved the existing search methods. The result is the present invention, a stochastic pulse search method with controllable complexity for speech coders, whose purpose is to reduce the complexity of the search.

[Summary of the Invention]

For the stochastic pulse position search in speech compression coding, the method of the present invention first computes the contribution of each pulse of the current vector to the normalized similarity and sorts the pulses accordingly to decide the order in which they are searched. It also adds a control parameter to the search; adjusting this parameter controls the breadth and depth of the search tree. Overall, the method makes the search complexity controllable while maintaining the quality of the coded speech. The steps are:

Step 1: The breadth-search record C_end holds the best breadth-search solution; set C_end = 0. Take the position with the largest d value in each track as the initial vector P, compute the normalized similarity Q_P of the initial vector and record it in Q, and set the value of the breadth-control parameter C_f.

Step 2: Remove each single pulse of the initial vector P in turn and compute the normalized similarity Q_i(P) of the remaining pulses, giving Q_0 to Q_n. Sort these values by magnitude. The largest value belongs to the removed pulse with the smallest contribution, which is replaced first. S records the index of the maximum after sorting; checking S during the search ensures that all pulses are searched.

Step 3: Keep the other pulse positions and perform a single-track search for the pulse indicated by S. Compare the best normalized similarity found with the previously recorded Q. If the replacement increases the normalized similarity, C_f decides whether a breadth search is performed. Judgment 1: if C_f = 0, update the initial vector P and Q with the new vector and its normalized similarity, reset C_end = 0, and return to step 2 to recompute the contribution of each pulse. Judgment 2: if C_f ≠ 0, record the new vector as the breadth-search result, update Q to its normalized similarity, set C_end = 1, decrement S to select another pulse, and return to step 3. If the replacement does not increase the normalized similarity, decrement S to select another pulse and return to step 3.

Step 4: When S = 0, that is, when no further pulse search result increases the normalized similarity Q, check the breadth-search record C_end. Judgment 1: if C_end ≠ 0, update the initial vector P with the recorded search result, set C_end = 0, and return to step 2. Judgment 2: if C_end = 0, end the search.
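The four steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the patented implementation: the source does not say how the magnitude of C_f is consumed, so the sketch only distinguishes C_f = 0 (accept each improvement immediately) from C_f ≠ 0 (record the improvement and keep scanning the remaining pulses); each pulse is assumed to take the sign of d at its position; and all names (`make_q`, `pulse_search`, `p_hat`, the toy signals) are hypothetical.

```python
def make_q(d, phi):
    """Return q(positions), the normalized similarity of a set of pulse positions.

    Assumption: each pulse takes the sign of d at its position.
    """
    sign = [1.0 if x >= 0 else -1.0 for x in d]

    def q(positions):
        c = sum(abs(d[m]) for m in positions)
        e = sum(sign[a] * sign[b] * phi[a][b] for a in positions for b in positions)
        return c * c / e if e > 0 else 0.0

    return q


def pulse_search(tracks, d, phi, c_f=0):
    """Contribution-ordered pulse replacement; c_f is treated as an on/off flag."""
    q = make_q(d, phi)
    p = [max(t, key=lambda m: abs(d[m])) for t in tracks]   # step 1: initial vector P
    best_q, c_end, p_hat = q(p), 0, None
    while True:
        # Step 2: similarity of P with pulse i removed; the largest value marks
        # the least-contributing pulse, which is searched first.
        order = sorted(range(len(p)), key=lambda i: q(p[:i] + p[i + 1:]), reverse=True)
        restart = False
        for i in order:                                      # S steps down this list
            # Step 3: single-track search for pulse i, other pulses kept in place.
            trials = [p[:i] + [m] + p[i + 1:] for m in tracks[i]]
            p_t = max(trials, key=q)
            if q(p_t) <= best_q:
                continue                                     # no gain: next pulse
            best_q = q(p_t)
            if c_f == 0:                                     # judgment 1: accept now
                p, c_end, restart = p_t, 0, True
                break
            p_hat, c_end = p_t, 1                            # judgment 2: record only
        if restart:
            continue                                         # back to step 2
        if c_end == 0:                                       # step 4, judgment 2
            return p, best_q
        p, c_end = p_hat, 0                                  # step 4, judgment 1


# Toy data (illustrative only): 8-sample subframe with two interleaved tracks.
h = [1.0, 0.6, 0.3, 0.1, 0.0, 0.0, 0.0, 0.0]
r = [0.2, -1.0, 0.5, 0.9, -0.3, 0.4, -0.8, 0.1]
n = len(r)
d = [sum(h[i - j] * r[i] for i in range(j, n)) for j in range(n)]
phi = [[sum(h[i - a] * h[i - b] for i in range(max(a, b), n)) for b in range(n)]
       for a in range(n)]
tracks = [[0, 2, 4, 6], [1, 3, 5, 7]]
for c_f in (0, 1):
    print(c_f, pulse_search(tracks, d, phi, c_f))
```

The loop terminates because every accepted or recorded replacement strictly increases the recorded similarity and the number of position combinations is finite.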
[Embodiment]

So that the technical means adopted by the invention and the effects they achieve are disclosed fully and clearly, a preferred embodiment is described in detail below with reference to the drawings and reference numerals. The search proceeds as follows:

Step 1: The breadth-search record C_end holds the best breadth-search solution; set C_end = 0. Take the position with the largest d value in each track as the initial vector P. Compute the normalized similarity Q_P of the initial vector, record it in Q, and set the value of the breadth-control parameter C_f.

Step 2: Remove each single pulse of the initial vector P in turn and compute the normalized similarity Q_i(P) of the pulses that remain (the removed pulse is excluded from the sums), giving Q_0 to Q_n. Sort these values by magnitude. The largest value belongs to the removed pulse with the smallest contribution, which is replaced first. S records the index of the maximum after sorting; checking S during the search ensures that all pulses are searched.

Step 3: Keep the other pulse positions and perform a single-track search for the pulse indicated by S. Compare the best normalized similarity found by the search with the previously recorded Q. If the replacement increases the normalized similarity, C_f decides whether a breadth search is performed:

Judgment 1: If C_f = 0, update the initial vector P and Q with the new vector obtained by the replacement and its normalized similarity; reset C_end = 0 and return to step 2 to recompute the contribution of each pulse to the normalized similarity.

Judgment 2: If C_f ≠ 0, record the new vector obtained by the replacement as the breadth-search result, update Q to the normalized similarity of that vector, and set C_end = 1; decrement S to select another pulse and return to step 3 to search that track.

If the replacement does not increase the normalized similarity Q, decrement S to select another pulse and return to step 3.

Step 4: When S = 0, that is, when no further pulse search result increases the normalized similarity Q, check the breadth-search record C_end:

Judgment 1: If C_end ≠ 0, update the initial vector P with the recorded search result, set C_end = 0, and return to step 2 to compute the normalized similarities of the pulses.

Judgment 2: If C_end = 0, end the search.

In practical use, the invention was compared with other methods on 20 speech files under the G.729 speech coding specification, and the average number of pulse replacement loops per coded subframe was recorded. The experiments show that ordinary listeners cannot subjectively distinguish the quality of the speech produced by the invention. The signal-to-noise ratio (SNR) and the segmental SNR relative to the original uncoded speech are therefore used as objective measures of speech quality, and performance is evaluated by the degradation ratio of speech quality and its deviation.
As Table 2 shows, with the control parameter set to C_f = 5 or C_f = 7 the speech quality matches that of the focused search adopted by G.729 and of the depth-first tree search, while the search complexity is lower than that of the depth-first tree search.

Method           SNR (%)    SNR Std (%)   SegSNR (%)   SegSNR Std (%)   No. of Loops
Full Search      -          -             -            -                8192
Focused Search   -0.3555    0.9563        0.4160       1.0584           1440
Depth First      -0.4719    0.7860        -0.0755      0.6223           320
C_f = 0          -1.0355    0.9621        -0.5137      0.8282           78.52
C_f = 1          -0.6684    0.8427        -0.2825      0.8363           103.29
C_f = 2          -0.4564    1.1675        -0.2306      0.8072           105.31
C_f = 3          -0.5066    0.9936        -0.0114      0.8764           112.03
C_f = 4          -0.3332    1.3557        -0.0001      0.7018           120.63
C_f = 5          -0.3666    1.2011        0.0531       0.8274           121.89
C_f = 6          -0.5520    0.7783        -0.1172      0.5826           124.03
C_f = 7          -0.5266    1.9065        0.0122       0.7794           125.79
C_f = 8          -0.6520    1.7000        -0.0033      0.6370           126.11
C_f = 9          -0.3332    0.9557        -0.0382      0.6995           126.53
C_f = 10         -0.2466    0.9668        -0.0879      0.6128           126.71
C_f = 11         -0.2309    0.9250        -0.0841      0.6059           126.78
C_f = 12         -0.2857    0.9415        -0.0935      0.5930           126.84
C_f = 13         -0.2865    0.9427        -0.0876      0.5963           126.83

Table 2. Speech quality comparison of the controllable-complexity stochastic pulse search

The results of Table 2 are plotted as curves of the degradation ratio and its deviation in the second figure (SNR) and the third figure (SegSNR). The curves show that changing the control parameter C_f does not produce a linear change in the quality of the coded speech, which is due to the interaction between the coding stages. Even so, the quality of the coded speech still shows an overall upward trend as C_f is changed.
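The objective measures named in the evaluation, SNR and segmental SNR, can be computed as in the following minimal Python sketch. The source does not give the exact formulas or frame length it used, so the 80-sample frame default, the skipping of silent or error-free frames, and the relative-change definition in `degradation_percent` are all assumptions made for illustration, as are the function names.

```python
import math


def snr_db(ref, test):
    """Overall SNR in dB between a reference signal and its coded version."""
    signal = sum(x * x for x in ref)
    noise = sum((x - y) ** 2 for x, y in zip(ref, test))
    return 10.0 * math.log10(signal / noise)


def segmental_snr_db(ref, test, frame=80):
    """Mean of per-frame SNRs in dB.

    Assumption: frames with zero signal energy or zero error are skipped.
    """
    values = []
    for start in range(0, len(ref) - frame + 1, frame):
        seg_ref = ref[start:start + frame]
        seg_test = test[start:start + frame]
        signal = sum(x * x for x in seg_ref)
        noise = sum((x - y) ** 2 for x, y in zip(seg_ref, seg_test))
        if signal > 0 and noise > 0:
            values.append(10.0 * math.log10(signal / noise))
    if not values:
        raise ValueError("no frame with measurable signal and noise")
    return sum(values) / len(values)


def degradation_percent(value, baseline):
    """Relative change of a quality measure against a baseline, in percent (assumed definition)."""
    return 100.0 * (value - baseline) / baseline


# Toy signals (illustrative only): a constant reference and a slightly attenuated copy.
ref = [1.0] * 160
coded = [0.9] * 160
print(snr_db(ref, coded), segmental_snr_db(ref, coded))
```

Segmental SNR averages in the log domain, so it weights quiet frames as heavily as loud ones, which is why it is usually reported alongside the overall SNR.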
From the above description of the composition and use of the search method, the invention differs from existing search methods as follows. It first computes the contribution of each pulse of the current vector to the normalized similarity and sorts the pulses accordingly to decide the order in which they are searched. It also adds a control parameter to the search, and adjusting this parameter controls the breadth of the search tree. With G.729 as the experimental platform, the coded speech quality matches that of the focused search and the depth-first tree search with lower search complexity, and the search complexity can be adjusted, which makes the method more practical overall.

The embodiments disclosed above serve to explain the technical means of the invention and do not limit its composition. Changes of numerical values and simple substitutions of equivalent structures therefore remain within the scope of the invention.

In summary, the embodiment achieves the intended effect, and the specific method disclosed has neither appeared in similar products nor been published before the application. It fully meets the provisions and requirements of the Patent Act, and this application for an invention patent is filed accordingly, with the respectful request that it be examined and granted.

[Brief Description of the Drawings]

First figure: schematic diagram of the search flow of the invention.
Second figure: comparison curves of SNR speech quality degradation for the controllable-complexity stochastic pulse search under different control parameters.
Third figure: comparison curves of SegSNR speech quality degradation for the controllable-complexity stochastic pulse search under different control parameters.
Fourth figure: schematic diagram of the coding flow of the existing conjugate-structure ACELP speech coder.

[Description of Main Reference Numerals]

(1) Perceptual weighting filter
(2) Linear prediction filter analysis coding
(3) Adaptive codebook search coding
(4) Stochastic codebook search coding
(5) Bitstream assembly buffer

Claims (1)

X. Scope of the Patent Application:

1. A stochastic pulse search method with controllable complexity for a speech coder, whose main search steps are as follows:

Step 1: the breadth-search record C_end holds the best breadth-search solution; set C_end = 0; take the position with the largest d value in each track as the initial vector P; compute the normalized similarity Q_P of the initial vector and record it in Q; set the value of the breadth-control parameter C_f;

Step 2: remove each single pulse of the initial vector P in turn and compute the normalized similarity Q_i(P) of the remaining pulses, giving Q_0 to Q_n; sort these values by magnitude, the largest value belonging to the removed pulse with the smallest contribution, which is replaced first; S records the index of the maximum after sorting; checking S during the search ensures that all pulses are searched;

Step 3: keep the other pulse positions and perform a single-track search for the pulse indicated by S; compare the best normalized similarity found by the search with the previously recorded Q; if the replacement increases the normalized similarity, C_f further decides whether a breadth search is performed:

Judgment 1: if C_f = 0, update the initial vector P and Q with the new vector obtained by the replacement and its normalized similarity; reset C_end = 0 and return to step 2 to compute the contribution of each pulse to the normalized similarity;

Judgment 2: if C_f ≠ 0, record the new vector obtained by the replacement as the breadth-search result, update Q to the normalized similarity of that vector, and set C_end = 1; decrement S to select another pulse and return to step 3 to search that track;

if the replacement does not increase the normalized similarity Q, decrement S to select another pulse and return to step 3;

Step 4: when S = 0, that is, when no pulse search result increases the normalized similarity Q, check the breadth-search record C_end:

Judgment 1: if C_end ≠ 0, update the initial vector P with the recorded search result, set C_end = 0, and return to step 2 to compute the normalized similarities of the pulses;

Judgment 2: if C_end = 0, end the pulse search.

2. The stochastic pulse search method with controllable complexity for a speech coder according to claim 1, wherein the normalized similarity Q is computed as $Q_\xi = (d^T v_\xi)^2 / (v_\xi^T \Phi v_\xi)$.

3. The stochastic pulse search method with controllable complexity for a speech coder according to claim 1, wherein the normalized similarity Q_i(P) obtained after removing each single pulse from the initial vector P is computed from the remaining pulses only, the sums over the pulse index skipping the removed pulse, and n is the number of pulses required by the speech coder.

4. The stochastic pulse search method with controllable complexity for a speech coder according to claim 1, wherein the value of the control parameter C_f can be set dynamically during the pulse search.

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW96132324A TW200910329A (en) 2007-08-30 2007-08-30 Stochastic codebook search algorithm with complexity scalability for speech coders

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW96132324A TW200910329A (en) 2007-08-30 2007-08-30 Stochastic codebook search algorithm with complexity scalability for speech coders

Publications (1)

Publication Number Publication Date
TW200910329A true TW200910329A (en) 2009-03-01

Family

ID=44724346

Family Applications (1)

Application Number Title Priority Date Filing Date
TW96132324A TW200910329A (en) 2007-08-30 2007-08-30 Stochastic codebook search algorithm with complexity scalability for speech coders

Country Status (1)

Country Link
TW (1) TW200910329A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI508059B (en) * 2013-02-08 2015-11-11 Asustek Comp Inc Method and apparatus for enhancing reverberated speech

Similar Documents

Publication Publication Date Title
KR101406113B1 (en) Method and device for coding transition frames in speech signals
KR100795727B1 (en) A method and apparatus that searches a fixed codebook in speech coder based on CELP
Wan et al. Google's Next-Generation Real-Time Unit-Selection Synthesizer Using Sequence-to-Sequence LSTM-Based Autoencoders.
CN1158648C (en) Speech variable bit-rate celp coding method and equipment
JP6470857B2 (en) Unvoiced / voiced judgment for speech processing
JP2010539528A (en) Method and apparatus for fast search of algebraic codebook in speech and audio coding
Sainath et al. An Efficient Streaming Non-Recurrent On-Device End-to-End Model with Improvements to Rare-Word Modeling.
CN104021796B (en) Speech enhan-cement treating method and apparatus
CN103383846B (en) Improve the voice coding method of speech packet loss repairing quality
US9524720B2 (en) Systems and methods of blind bandwidth extension
WO2009059513A1 (en) A coding method, an encoder and a computer readable medium
RU2646357C2 (en) Principle for coding audio signal and decoding audio signal using information for generating speech spectrum
KR100556831B1 (en) Fixed Codebook Searching Method by Global Pulse Replacement
CN104517612B (en) Variable bitrate coding device and decoder and its coding and decoding methods based on AMR-NB voice signals
CN104584123B (en) Coding/decoding method and decoding apparatus
TW200910329A (en) Stochastic codebook search algorithm with complexity scalability for speech coders
Jayant et al. Speech coding with time-varying bit allocations to excitation and LPC parameters
JP5388849B2 (en) Speech coding apparatus and speech coding method
Ding et al. A Hybrid Structure Speech coding scheme based on MELPe and LPCNet
Falahati et al. Dynamic tree pruning method for fast ACELP search
JP6001451B2 (en) Encoding apparatus and encoding method
Albin et al. Objective study of the performance degradation in emotion recognition through the AMR-WB+ codec.
Wang Low bit-rate vector excitation coding of phonetically classified speech
Chomphan Analysis of Fundamental Frequency Contour of Coded Speech Based on Multi-Pulse Based Code Excited Linear Prediction Algorithm
JPH11184500A (en) Voice encoding system and voice decoding system