TW453099B

TW453099B - Multi-functional digital telephone

Info

Publication number: TW453099B
Application number: TW88123231A
Authority: TW
Inventors: Wen-Yuan Chen; Chi-Ren Jung
Original assignee: Chen Wen Yuan; Jung Chi Ren
Priority date: 1999-12-29
Filing date: 1999-12-29
Publication date: 2001-09-01

Abstract

A digital telephone provides message recording (answering machine), in-call recording, voice dialing, pitch adjustment, and playback speed adjustment. The functions share program code and parameters with one another. For example, the parameters required for speech recognition are obtained by transforming the speech compression parameters, and playback speed adjustment and pitch adjustment share the same program. A method for processing digital speech signals is also provided; it samples and splices digitized speech data so that the digital telephone can perform pitch adjustment and playback speed adjustment.

Description

Field of the invention

The present invention relates to a device and method for processing speech data, and in particular to a digital telephone that can compress and restore the data volume of a speech signal, shorten or lengthen its duration, and store, transmit, and recognize it.

Background of the invention

Rapid progress in digital electronics has made data storage and manipulation more flexible. Because analog speech can be digitized, and digitized speech is easier to manipulate, access control and processing of speech data have become more convenient. In a conventional digital telephone, an analog/digital converter converts the analog speech signal into a pulse code modulated (PCM) digital signal. Once digitized, the speech data can be stored in a digital storage device (such as random access memory) for specific processing, for example speech compression or speech recognition.

As the price of digital signal processors falls, digital telephones are becoming common and new functions are being added to them one by one. In telephone applications, speech recognition can replace key presses so that numbers are dialed by voice; speech compression can be used to record call content while saving memory; playback speed adjustment can speed up playback to save listening time, or slow it down so that a message recorded by the answering machine is easier to understand; and pitch adjustment can change the speaker's pitch to deter harassing calls.

In digital audio processing, the simplest way to change pitch is to speed up playback (raising the pitch) or slow it down (lowering the pitch), just as when the rotation speed of a turntable is increased or decreased. This, however, changes the duration of the original sound. The first problem to solve is therefore how to raise or lower the pitch while keeping the duration unchanged.

Objects and summary of the invention

At present, the techniques that implement the various speech functions are independent systems, so the digital signal processor performs many redundant computations, which increases data storage and lowers processor efficiency. If the functions use common program code and parameters as far as possible, both the computation time of the digital signal processor and the memory needed to store parameters are reduced.

The digital telephone disclosed in the present invention integrates speech compression, speech recognition, playback speed adjustment, and pitch adjustment so that program code and parameters can be shared among the functions. For example, the parameters obtained from speech compression can, after conversion, be used by the speech recognition system, and playback speed adjustment and pitch adjustment can share the same program (differing only slightly in the processing before output). This reduces both the computation load of the digital signal processor and the memory required.

Accordingly, one object of the present invention is to provide a digital telephone that integrates various speech functions.

Another object of the present invention is to provide a speech signal processing method in which program code performing different functions uses common parameters as far as possible, saving digital signal processor computation time and parameter memory.

Another object of the present invention is to provide a speech signal processing method in which playback speed adjustment and pitch adjustment use the "variable frame segmentation method" to sample and splice digital speech data appropriately, so that the pitch can be raised or lowered while the duration stays the same, and playback can be sped up or slowed down while the recorded content remains clearly intelligible.

Brief description of the drawings

Fig. 1 is a system block diagram of the present invention.
Fig. 2 is a block diagram of a conventional CELP speech coding system.
Fig. 3 illustrates the principle of lowering the pitch (m < 1) with the variable frame segmentation method.
Fig. 4 illustrates the principle of speeding up playback (k > 1) with the variable frame segmentation method.
Fig. 5 is a block diagram of the speech recognition system of the present invention.

Reference numerals

100  analog speech signal
102  analog/digital converter
104  speech compression
106  digital storage device
108  pitch adjustment device
110  digital/analog converter
112  telephone output
114  speech recognition system
116  voice dialing
118  speech decompression system
120  playback speed adjustment device
122  digital/analog converter
124  speaker
202  original digitized speech signal S
204  synthesized speech signal M
205  error signal E
206  adder
208  linear prediction filter
210  perceptual weighting filter
212  weighted speech error signal P
214  line spectrum pairs
216  error minimization circuit
218  stochastic codebook index
220  adaptive codebook index
222  stochastic codebook
224  adaptive codebook
226  stochastic codebook gain
228  adaptive codebook gain
230  first multiplier
232  second multiplier
234  stochastic excitation signal
236  adaptive excitation signal
238  adder
240  subframe delay
242  optimized subframe excitation signal
30   playback length
32   (N/m)-th sample point

34   best splice point
36   number of discarded samples X
38   ((2N - X)/m)-th sample point
40   (k x N)-th sample point
42   best splice point
44   k(2N - X)-th sample point
500  LSPs parameters
502  LSP parameters of the reference vocabulary
504  mel-scale pseudo-cepstrum parameters
506  mel-scale pseudo-cepstrum parameters
508  pattern matching
510  decision rule
512  recognition result

Detailed description of the invention

The digital telephone disclosed in the present invention has the following functions: 1. answering machine; 2. in-call (online) recording; 3. playback speed adjustment; 4. pitch adjustment; 5. voice dialing. Each function operates as follows.

1. Answering machine. When the number of rings exceeds the set number, the speech decompression system is activated to play the greeting previously recorded in the digital storage device. The speech compression system is then activated to compress the caller's voice in real time and store it in the digital storage device.

2. Online recording. During a call, pressing the online recording key activates the speech compression system, which compresses the audio of the call and stores the speech data in the digital storage device.

3. Playback speed adjustment. When playing back speech data stored in the digital storage device, the user can speed up playback to save time or slow it down to understand the recording more clearly, while the voice characteristics of the person who left the message (that is, the pitch) are preserved.

4. Pitch adjustment. Pitch adjustment changes the pitch of the person answering, making it difficult for the other party to determine that person's identity, while the speaking rate stays the same as before the pitch change.

5. Voice-controlled dialing. The names of frequently called parties are entered by voice in advance, compressed, and stored in the digital storage device; the corresponding telephone numbers are entered by keypad and stored with them. To call a particular party, the user only has to say the name. The speech recognition system finds the best match in the digital storage device and plays back the recognized name by voice (it may also be shown in another way, for example on a liquid crystal display). If the user confirms that it is correct, the corresponding telephone number is sent to the dialing system for dialing.

Refer to Fig. 1, the system block diagram of the present invention. When the analog speech signal 100 is input, it is first converted to a digital signal by the analog/digital converter 102. The digital speech signal is then either compressed by speech compression 104 and stored in the digital storage device 106, or passed directly to the pitch adjustment device 108; in the latter case the processed digital signal is converted to an analog signal by the digital/analog converter 110 and output through the telephone output 112. The parameters obtained from speech compression can be supplied to the speech recognition system 114 to perform voice dialing 116, which is then output through the telephone output 112. The compressed speech data stored in the digital storage device 106, after processing by the speech decompression system 118, can be further processed by the playback speed adjustment device 120; the processed digital signal is converted to an analog signal by the digital/analog converter 122 and output through the speaker 124.

The above describes how the functions of the disclosed digital telephone operate. The individual systems that implement each function are described in more detail below.

(1) Speech compression

Linear predictive coding (LPC) is a known speech coding method suited to low-bit-rate speech coding. The present invention uses Code Excited Linear Prediction (CELP, 4.8 kbps) to compress the speech data. References include A. M. Kondoz, "Digital Speech: Coding for Low Bit Rate Communications Systems," Wiley, and Joseph P. Campbell, Vanoy C. Welch, and Thomas E. Tremain, "CELP Documentation," U.S. Government Department of Defense, Dec. 1989.

Refer to Fig. 2, a block diagram of a system using CELP speech coding. In general, a CELP system contains a stochastic codebook and an adaptive codebook. The original digitized speech signal S 202 is input to the adder 206 frame by frame. Each frame contains N speech samples and is further divided into several subframes. At the same time, the linear prediction filter 208 performs linear prediction analysis on each frame of the original digitized speech signal S 202 to produce a set of linear prediction coefficients, which are then converted to line spectrum pair (LSP) coefficients 214.

The synthesized speech signal M 204, obtained by passing the initial excitation signal through the linear prediction filter 208, is subtracted from the original digitized speech signal S 202 at the adder 206 to give the error signal E 205. The error signal E 205 is processed by the perceptual weighting filter 210 to give the weighted speech error signal P 212. The weighted speech error signal P 212 is fed to the error minimization circuit 216, which contains a synthesis filter, a distance calculator, and a codebook searcher. This circuit selects the stochastic codebook index Index_s 218, the stochastic codebook gain Gs 226, the adaptive codebook index Index_a 220, and the adaptive codebook gain Ga 228 that minimize the weighted speech error signal P 212.

The error minimization circuit 216 feeds the stochastic codebook index Index_s 218 and the adaptive codebook index Index_a 220 to the stochastic codebook 222 and the adaptive codebook 224 respectively, and feeds the stochastic codebook gain Gs 226 and the adaptive codebook gain Ga 228 to the first multiplier 230 and the second multiplier 232 respectively. The stochastic codebook 222 outputs its excitation as the stochastic excitation signal es 234 according to Index_s 218, and the adaptive codebook 224 outputs its excitation as the adaptive excitation signal ea 236 according to Index_a 220. The stochastic codebook 222 is not updated, whereas the adaptive codebook 224 is updated periodically with the optimized subframe excitation signal eo 242, which is formed from the output of the adder 238 and passed through the subframe delay 240.

After speech compression, the speech data of each frame therefore yields the following parameters: the stochastic codebook index Index_s, the stochastic codebook gain Gs, the adaptive codebook index Index_a, the adaptive codebook gain Ga, and the line spectrum pairs LSPs. All of these parameters are stored in the digital storage device.

(2) Speech decompression

The speech decompression system reads the stochastic codebook index (Index_s), the stochastic codebook gain (Gs), the adaptive codebook index (Index_a), the adaptive codebook gain (Ga), and the line spectrum pairs (LSPs) from the digital storage device and performs speech synthesis. The synthesized speech signal is output to the playback speed adjustment system for playback.
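The decoder side of sections (1) and (2) can be illustrated with a small sketch. It is not the patent's implementation: it assumes that Index_a is a delay in samples into the past excitation, that the LSPs have already been converted to linear prediction coefficients (that conversion is omitted), and that the names SubframeParams, decode_excitation, and synthesize are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SubframeParams:
    """Stored parameters for one subframe (names are illustrative)."""
    index_s: int    # stochastic codebook index (Index_s)
    gain_s: float   # stochastic codebook gain (Gs)
    index_a: int    # adaptive codebook index, assumed here to be a delay in samples
    gain_a: float   # adaptive codebook gain (Ga)


def decode_excitation(params, stochastic_codebook, history, subframe_len):
    """Rebuild the subframe excitation eo = Gs*es + Ga*ea.

    history holds past excitation samples (most recent last); it plays the
    role of the adaptive codebook and is returned updated with eo.
    """
    if not 1 <= params.index_a <= len(history):
        raise ValueError("adaptive delay must lie within the excitation history")
    es = stochastic_codebook[params.index_s][:subframe_len]
    buf = list(history)
    ea = []
    for _ in range(subframe_len):
        # Read the sample index_a positions back; samples generated in this
        # subframe are reused when the delay is shorter than the subframe.
        sample = buf[len(buf) - params.index_a]
        ea.append(sample)
        buf.append(sample)
    eo = [params.gain_s * s + params.gain_a * a for s, a in zip(es, ea)]
    new_history = (list(history) + eo)[-len(history):]
    return eo, new_history


def synthesize(excitation, lpc, state):
    """All-pole synthesis 1/A(z) with A(z) = 1 + sum_k lpc[k-1] * z^-k.

    state holds the last len(lpc) output samples (most recent last);
    lpc must contain at least one coefficient.
    """
    past = list(state)
    out = []
    for e in excitation:
        s = e - sum(a * past[-k] for k, a in enumerate(lpc, start=1))
        out.append(s)
        past.append(s)
    return out, past[-len(lpc):]
```

Only the adaptive history changes between subframes in this sketch; the stochastic codebook is read-only, which matches the description that codebook 222 is not updated while codebook 224 is.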

經濟郃智慧財產局員工消費合作社印製 4 53 09 9 A7 ___B7_ 五、發明說明（）放音速度調整系統，以便進行聲音之撥放。 (3 )音調調整系統陳思平於其碩士論文：「MPEG解碼、音高調整及次頻編碼之音訊處理演算法研究」中提及的不定音框分割法，可應用於實現將數位語音資料適當地取樣與銜接，並在維持相同聲音時間長度的條件下，達到音調升降的效果。參閱第3圊，其為本發明利用不定4框分割法，將音調調降之運作原理的圖示。利用不定音框分割法實現音調升降的步驟如下：' 一、設定調整音調的比例值 m (ra>l代表升音：m<l 代表降音）。二、假設類比轉數位的取樣頻率為f，則設定數位轉類比的取樣頻率為（m xf)» 三、由原始聲音中取出一個音框的樣本數N，在類比轉數位的取樣頻率為f時的放音時間為t。四、進行音框樣本的音調升降處理。五、升降頻後的音框放音長度變為t /m 30，而後選取下一個音框，其係自原始聲音中的第N / m個樣本點 3 2開始的N個樣本。六、找出最佳銜接點 3 4 (即波形最匹配處）將前後兩音框銜接在一起。七、將新音框的樣本進行同步驟四中的音調升降處理。 11 本紙張尺度適用令固國家標準（CNS)A4規格（210 X 297公釐） ------------- ^ --------t---------V . - 一 -\ (請先閱讀背面之沒意事項再填寫本頁) 經-部智慧財產局員工消费合作社印没 4 53 09 g A7 ___B7_ 五、發明說明（）八、若銜接點位於第（（N / m) + X )資料點，則銜接後兩音框的總長度為（2 N - X )，其中X 3 6為因衡接而捨去的樣本數，則下一個音框為由原始聲音中的第 (2 N - X ) / m個樣本點3 8開始的N個樣本。九、依此類推，重複步驟三到八的步驟，直到所有音框皆處理完畢為止。 (4)放音速度調整在保有說話者的聲音特性條件下，想要加速聽取錄音内容以節省時間，或減慢放音的速度丄便能更清楚聽僅錄音内容，這些功能同樣可利用不定音框分割法實現。第4圖為本發明利用不定音框分割法，將放音速度調快之運作原理的圖示，利用不定音框分割法實現放音速度調整的步驟如下：一、設定數位轉類比的取樣頻率等於原來錄音相同的取樣頻率。二、設定調整放音速度的比例值k (k>l代表加速；k < 1代表減速）。三、由原始聲音中取出一個音框的樣本數N。四、進行音框樣本的放音速度調整處理。五、下一個音框為由原始聲音中的第k X N個樣本點4 0 開始的N個樣本。六、找出最佳銜接點 4 2 (即波形最匹配處）*將前後兩音框銜接在一起。七、將新音框的樣本進行同步驟四中的放音速度調整 12 本纸張尺度適闬中S 0家洁準（CNS)Al規格（210 X 297公发) ^--------訂--------» - - - (請先閱讀背面之注意事項再填寫本頁) 453 09 9 A7 -------------- 五、發明說明（）處理。八、假設銜接後的樣本數為（2Ν_χ)，其中χ為因銜接而捨去的樣本數’則下一個音框為由原始聲音中的第 k(2N-X)個樣本點44開始的Ν個樣本。九、依此類推，重複步驟三到八的步驟，直到所有音框皆處理完畢為止。 (5)语音辨遇系統第5圈為本發明之語音辨認系統方塊圊β進行語音辨認所需之參數，可由語音壓縮參數轉換而得。首先，語音辨認系統之輸入資料為LSPs參數500，為得到更好的辨過效果’ LSPs參數需轉換成mei_scaie pseudo-cepstrum參數，其轉換方法係參考Seung H〇 Ch〇i，H〇nh Kook Kim, Hwang Soo Lee and R. M. Gray, "Speech recognition method using quantised LSP parameters in CELP-type coders, " Electronics Letters 22ndPrinted by the Economic and Intellectual Property Bureau's Consumer Cooperatives 4 53 09 9 A7 ___B7_ V. Description of the Invention () Playback speed adjustment system for sound playback. 
(3) Pitch adjustment system Chen Siping in his master's thesis: "Study on Audio Processing Algorithms for MPEG Decoding, Pitch Adjustment, and Sub-Frequency Encoding" can be applied to realize the appropriate conversion of digital speech data Ground sampling and connection, and achieve the effect of tone up and down while maintaining the same sound duration. Refer to Section 3, which is an illustration of the operation principle of pitch reduction using the indefinite 4-frame division method of the present invention. The steps to achieve pitch up and down using the indeterminate frame division method are as follows: '1. Set the ratio of the adjustment pitch m (ra > l represents ascending: m < l represents descending). 2. Suppose the sampling frequency of analog to digital is f, then set the sampling frequency of digital to analog to (m xf) »3. Take the number of samples N of a sound box from the original sound, and the sampling frequency of digital to analog The playback time is t. 4. Perform the tone raising and lowering processing of the sound frame sample. 5. The length of the sound box after the frequency rise and fall becomes t / m 30, and then the next sound box is selected, which is the N samples starting from the N / mth sample point 3 2 in the original sound. 6. Find the best connection point 3 4 (that is, where the waveforms most closely match) connect the front and back frames together. 7. Perform the tone raising and lowering processing on the sample of the new sound frame as in step 4. 11 This paper size is applicable to Ling Gu National Standard (CNS) A4 (210 X 297 mm) ------------- ^ -------- t ------ --- V.-I- \ (Please read the unintentional matter on the back before filling out this page) The Ministry of Intellectual Property Bureau's Employee Cooperative Cooperative printed 4 53 09 g A7 ___B7_ 5. Description of the invention () 8. 
If connected The point is located at the ((N / m) + X) data point, then the total length of the two sound frames after the connection is (2 N-X), where X 3 6 is the number of samples dropped due to the balance, then the next The sound frame is N samples starting from (2 N-X) / m sample points 38 in the original sound. Nine, and so on, repeat steps 3 to 8 until all the frames have been processed. (4) Playback speed adjustment Under the condition that the speaker's voice characteristics are maintained, if you want to speed up listening to the recording to save time, or slow down the playback speed, you can hear the recording only more clearly. These functions can also be used indefinite Sound frame segmentation method is implemented. FIG. 4 is an illustration of the operating principle of using the indefinite frame division method to speed up the playback speed. The steps for adjusting the playback speed by using the indefinite frame division method are as follows: 1. Set the sampling frequency of digital to analog Equal to the same sampling frequency as the original recording. 2. Set the ratio k for adjusting the playback speed (k > l represents acceleration; k < 1 represents deceleration). 3. Take the number N of samples of a sound box from the original sound. 4. Adjust the playback speed of the sound frame samples. 5. The next frame is N samples starting from the k X Nth sample point 4 0 in the original sound. 6. Find the best connection point 4 2 (that is, where the waveforms most closely match) * Connect the front and back frames together. 7. Adjust the playback speed of the sample of the new sound frame in the same way as in step 4. 12 The paper size is moderate S 0 Jia Jie Zhun (CNS) Al specifications (210 X 297 public) ^ ------ --Order -------- »---(Please read the notes on the back before filling this page) 453 09 9 A7 -------------- V. Description of the invention () deal with. 8. 
Assume that the number of samples after the connection is (2N_χ), where χ is the number of samples discarded due to the connection. Then the next frame is the Ν starting from the k (2N-X) th sample point 44 in the original sound. Samples. Nine, and so on, repeat steps 3 to 8 until all the frames have been processed. (5) Speech recognition system The fifth circle is the parameters required by the speech recognition system block 圊 β of the present invention for speech recognition, which can be obtained by converting the speech compression parameters. First, the input data of the speech recognition system is LSPs parameter 500. In order to obtain better recognition results, the LSPs parameters need to be converted into mei_scaie pseudo-cepstrum parameters. For the conversion method, refer to Seung H. Choi, Hon. Kook Kim , Hwang Soo Lee and RM Gray, " Speech recognition method using quantised LSP parameters in CELP-type coders, " Electronics Letters 22nd

Januaryl 998’ Vol.34’ No.2， pp.1 56 —1 57 以及11,!(·Januaryl 998 ’Vol.34’ No.2, pp.1 56 —1 57 and 11,! (·

Kim, K. C. Kim and H. S. Lee, "Enhanced distance measure for LSP-based speech recognition," Electronics Letters, 5th August 1993, Vol. 29, No. 16, pp. 1463-1465. During training, the LSP parameters 502 of the reference vocabulary are stored in the digital storage device. During recognition, the LSP parameters of the word to be recognized are converted into mel-scale pseudo-cepstrum parameters 504, and the LSP parameters 502 of the reference vocabulary are likewise converted into mel-scale pseudo-cepstrum parameters 506. Pattern matching 508 is then performed against each reference word in turn to find the most similar reference word, which is processed by the decision rule 510 to yield the recognition result 512. The recognition result activates the decompression system, which decompresses the corresponding voice data held in the digital storage device and plays it back as audio; once the user confirms that it is correct, the stored corresponding telephone number is dialed.

The above are merely preferred embodiments of the present invention and are not intended to limit the scope of its patent claims; all other equivalent changes or modifications made without departing from the spirit disclosed by the present invention shall fall within the scope of the claims set out below.
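As a rough illustration of the recognition flow described above (pattern matching 508 followed by decision rule 510), the sketch below assumes that every word has already been reduced to a single vector of mel-scale pseudo-cepstrum parameters; the conversion from LSP parameters is not reproduced. The Euclidean distance, the rejection threshold, and every name in the code are assumptions made for illustration and are not taken from the patent.

```python
import math


def match_word(query, references, reject_threshold=1.0):
    """Return the label of the closest reference vector, or None.

    query            -- feature vector of the word to be recognized
    references       -- dict mapping a label to its stored feature vector
    reject_threshold -- hypothetical decision rule: reject distant matches
    """
    best_label, best_dist = None, math.inf
    # Pattern matching: compare the query against each reference in turn.
    for label, ref in references.items():
        dist = math.sqrt(sum((q - r) ** 2 for q, r in zip(query, ref)))
        if dist < best_dist:
            best_label, best_dist = label, dist
    # Decision rule (assumed): accept only if the best match is close enough.
    if best_dist > reject_threshold:
        return None
    return best_label


# Hypothetical phone book: the recognized label selects the number to dial
# once the user has confirmed the played-back name.
references = {"home": [0.0, 0.0], "office": [1.0, 1.0]}
numbers = {"home": "555-0100", "office": "555-0199"}
label = match_word([0.1, 0.0], references)
number_to_dial = numbers.get(label)
```

A practical recognizer would compare sequences of frames rather than single vectors; the single-vector form is used here only to keep the matching and decision steps visible.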

Claims

1. A digital telephone device having answering-machine, on-line recording, voice dialing, tone adjustment, and playback speed adjustment functions, the digital telephone device comprising:
a digital storage device for storing digitized voice data;
a tone adjustment device for processing the digitized voice data to raise or lower the playback pitch while maintaining the playback duration of the unprocessed data; and
a playback speed adjustment device for processing the digitized voice data to speed up or slow down playback while maintaining the pitch of the unprocessed data.

2. The digital telephone device of claim 1, wherein the digital storage device is a random access memory (RAM).

3. The digital telephone device of claim 1, wherein the method by which the tone adjustment device processes the digitized voice data comprises at least the following steps:
(a) setting the ratio m by which the tone is adjusted;
(b) setting the sampling frequency for converting the digital signal into an analog signal to (m x f), where f is the sampling frequency for converting the analog signal into a digital signal;
(c) taking a first frame of N samples from the original sound, whose playback time is t when the sampling frequency for converting the analog signal into a digital signal is f;
(d) performing tone raising or lowering processing on the frame samples, so that the playback length of the frame after the frequency change becomes t/m;
(e) setting a second frame consisting of N samples starting from the (N/m)-th sample point in the original sound;
(f) finding the best connection point and joining the first frame to the second frame;
(g) performing on the samples of the third frame the tone raising or lowering processing of step (d);
(h) if the connection point is located at the ((N/m)+X)-th data point, the total length of the two joined frames is (2N-X), where X is the number of samples discarded by the joining, and the fourth frame consists of N samples starting from the ((2N-X)/m)-th sample point in the original sound; and
(i) repeating steps (c) to (h) until all the digital voice data has been processed.

4. The digital telephone device of claim 1, wherein the tone adjustment device divides the digital voice data into a plurality of frames, performs tone raising or lowering processing on the frame samples, and then joins the plurality of frames.

5. The digital telephone device of claim 1, wherein the method by which the playback speed adjustment device processes the digitized voice data comprises at least the following steps:
(a) setting the sampling frequency for converting the digital signal into an analog signal equal to the sampling frequency of the original recording;
(b) setting the ratio k by which the playback speed is adjusted;
(c) taking a first frame of N samples from the original sound and performing playback speed adjustment processing on the frame samples;
(d) setting a second frame consisting of N samples starting from the (k x N)-th sample point in the original sound;
(e) finding the best connection point and joining the first frame to the second frame;
(f) performing on the samples of the third frame the playback speed adjustment processing of step (c);
(g) if the number of samples after the joining is (2N-X), where X is the number of samples discarded by the joining, the fourth frame consists of N samples starting from the k(2N-X)-th sample point in the original sound; and
(h) repeating steps (c) to (g) until all the digital voice data has been processed.

6. The digital telephone device as claimed above, wherein the playback speed adjustment device divides the digital voice data into a plurality of frames, performs playback speed processing on the frame samples, and then joins the plurality of frames.

7. The digital telephone device as claimed above, wherein the device further comprises:
a voice compression device for compressing the digitized voice data;
a voice decompression device for decompressing the digitized voice data; and
a speech recognition device for converting the line spectrum pair (LSP) parameters of the reference vocabulary and of the word to be recognized into mel-scale pseudo-cepstrum parameters, performing pattern matching one by one, and obtaining the recognition result after processing by a decision rule.

8. The digital telephone device of claim 7, wherein the voice compression device compresses the digital voice data using the Code Excited Linear Prediction (CELP) method.

9. The digital telephone device of claim 7, wherein the line spectrum pair (LSP) parameters required by the speech recognition device are provided by the voice compression device.

10. A digital voice signal processing method for raising or lowering the playback pitch, the method comprising the following steps:
(a) setting the ratio m by which the tone is adjusted;
(b) setting the sampling frequency for converting the digital signal into an analog signal to (m x f), where f is the sampling frequency for converting the analog signal into a digital signal;
(c) taking a first frame of N samples from the original sound, whose playback time is t when the sampling frequency for converting the analog signal into a digital signal is f;
(d) performing tone raising or lowering processing on the frame samples, so that the playback length of the frame after the frequency change becomes t/m;
(e) setting a second frame consisting of N samples starting from the (N/m)-th sample point in the original sound;
(f) finding the best connection point and joining the first frame to the second frame;
(g) performing on the samples of the third frame the tone raising or lowering processing of step (d);
(h) if the connection point is located at the ((N/m)+X)-th data point, the total length of the two joined frames is (2N-X), where X is the number of samples discarded by the joining, and the fourth frame consists of N samples starting from the ((2N-X)/m)-th sample point in the original sound; and
(i) repeating steps (c) to (h) until all the digital voice data has been processed.

11. A digital voice signal processing method for speeding up or slowing down playback, the method comprising the following steps:
(a) setting the sampling frequency for converting the digital signal into an analog signal equal to the sampling frequency of the original recording;
(b) setting the ratio k by which the playback speed is adjusted;
(c) taking a first frame of N samples from the original sound and performing playback speed adjustment processing on the frame samples;
(d) setting a second frame consisting of N samples starting from the (k x N)-th sample point in the original sound;
(e) finding the best connection point and joining the first frame to the second frame;
(f) performing on the samples of the third frame the playback speed adjustment processing of step (c);
(g) if the number of samples after the joining is (2N-X), where X is the number of samples discarded by the joining, the fourth frame consists of N samples starting from the k(2N-X)-th sample point in the original sound; and
(h) repeating steps (c) to (g) until all the digital voice data has been processed.
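The frame schedule recited in claims 10 and 11 can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the claims do not say how the best connection point is chosen, so the mean-absolute-difference search in `best_overlap`, the frame length of 200 samples, the overlap search range, and all function names are hypothetical. A single `splice_frames` routine serves both functions, which differ only in the frame-start ratio (k for speed, 1/m for pitch) and in the sampling frequency used for playback.

```python
import math


def best_overlap(out, frame, min_overlap, max_overlap):
    """Pick X, the number of leading frame samples discarded at the join.

    Assumed criterion: the X for which the last X output samples differ
    least (mean absolute difference) from the first X samples of the frame.
    """
    best_x, best_err = min_overlap, math.inf
    for x in range(min_overlap, max_overlap + 1):
        err = sum(abs(a - b) for a, b in zip(out[-x:], frame[:x])) / x
        if err < best_err:
            best_x, best_err = x, err
    return best_x


def splice_frames(samples, ratio, frame_len=200, min_overlap=8, max_overlap=40):
    """Build the output by joining N-sample frames taken from `samples`.

    Each new frame starts at source index ratio * (current output length):
    ratio * N for the second frame, ratio * (2N - X) for the next, and so on.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    if not 1 <= min_overlap <= max_overlap < frame_len:
        raise ValueError("need 1 <= min_overlap <= max_overlap < frame_len")
    out = list(samples[:frame_len])           # first frame of N samples
    while True:
        start = int(round(ratio * len(out)))  # where the next frame begins
        frame = samples[start:start + frame_len]
        if len(frame) < frame_len:            # source data exhausted
            break
        x = best_overlap(out, frame, min_overlap, max_overlap)
        out.extend(frame[x:])                 # X samples are dropped at the join
    return out


def change_speed(samples, k, **kw):
    """Claim 11 sketch: play the result at the original sampling frequency."""
    return splice_frames(samples, k, **kw)


def change_pitch(samples, m, f, **kw):
    """Claim 10 sketch: frames start at (output length) / m; play at m * f."""
    return splice_frames(samples, 1.0 / m, **kw), m * f


f = 8000
tone = [math.sin(2 * math.pi * 200 * n / f) for n in range(f)]  # 1 s test tone
fast = change_speed(tone, 2.0)             # roughly half as many samples
raised, rate = change_pitch(tone, 2.0, f)  # roughly twice as many, played at 2*f
```

With k = 2 the output holds about half the samples and is played at f, so it lasts about half as long at the original pitch. With m = 2 the output holds about twice the samples and is played at 2f, so the duration is nearly unchanged while the pitch rises.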