200941456

IX. Description of the Invention:

[Technical Field of the Invention]

The present invention relates to a method for eliminating environmental noise in a voice signal, and more particularly to a technique that adjusts a noise intensity factor according to different environments in order to eliminate the environmental noise.

[Prior Art]

Noise is sound that interferes with and hinders people's normal work, study, and rest.
During transmission, a signal is always disturbed to some degree by unwanted extra energy, that is, noise. Noise interference usually distorts the signal, and its source may lie outside or inside the system.

Among external sources, man-made noise has the greatest effect on speech recognition. Sources of man-made noise include the ignition systems of automobiles, motorcycles, and aircraft, electric motors, switching equipment, high-voltage cables, and fluorescent lamps that use arc discharge; the interference from such noise ranges from 0 to 600 MHz.

Internal noise has many sources, one of which is so-called thermal noise (also called white noise). It arises from the random motion of electrons inside resistive elements, and its intensity is proportional to the absolute ambient temperature of the resistor. For a speech recognition system, the sound produced by the internal components of the host computer while running, such as the fan, is a major factor degrading recognition accuracy.

The conventional approach estimates the signal-to-noise ratio (SNR) of the voice signal containing environmental noise in order to adjust the noise intensity factor, and holds that factor constant within a given frame. This setting assumes that noise affects the entire spectrum of the voice signal uniformly. In practice, however, most real-world noise is colored noise, whose effect on the spectrum of the voice signal is not uniform: some frequency bands of the voice signal are affected far more than others, which greatly reduces recognition accuracy.

In view of these problems of the prior art, the inventor, drawing on years of research, development, and practical experience, proposes a method for eliminating environmental noise in a voice signal as a means of remedying the above shortcomings.
[Summary of the Invention]

In view of the above, an object of the present invention is to provide a method for eliminating environmental noise in a voice signal, so as to improve the adaptability of a speech recognition system to different environmental noise.

According to this object, a method for eliminating environmental noise in a voice signal is provided, the voice signal comprising an environmental noise and a pure voice signal. The method comprises the following steps:

i) setting a frame in the voice signal;
ii) calculating an upper-limit frequency value and a signal-to-noise ratio (SNR) within the frame;
iii) determining a first adjustment parameter and a second adjustment parameter for the frame according to the upper-limit frequency value and the SNR, respectively;
iv) operating on a preset noise intensity factor with the first adjustment parameter and the second adjustment parameter to produce a corrected noise intensity factor; and
v) performing spectral subtraction with the corrected noise intensity factor to eliminate the environmental noise from the voice signal.

[Embodiments]

Please refer to FIG. 1, which is a flow chart of the steps of the method for eliminating environmental noise in a voice signal according to the present invention. The voice signal comprises an environmental noise and a pure voice signal. In step S10, a frame is first set in the voice signal. In step S11, the upper-limit frequency value and the SNR within the frame are detected. In step S12, a first adjustment parameter and a second adjustment parameter for the frame are determined according to the upper-limit frequency value and the SNR, respectively.

The spectrum of ordinary natural noise (cars, streets, human voices, and so on) tends to fall off toward high frequencies while its energy is spread across the entire spectral range, whereas most of the energy of a speech signal is concentrated in the low-frequency band. A relatively large first adjustment parameter can therefore be used at low frequencies to remove noise, so that the speech power spectrum stands out better. For example, a threshold can be set: when the upper-limit frequency of the frame is less than 1 kHz, the first adjustment parameter is set to 1.5, and when it is greater than 1 kHz, the first adjustment parameter is set to 0.5.

In addition, a low SNR for the frame indicates that noise components dominate, so a larger second adjustment parameter is used to obtain a larger corrected noise intensity factor. Conversely, a high SNR indicates that speech components dominate, so a relatively small second adjustment parameter can be used to obtain a smaller corrected noise intensity factor. For example, when the SNR is less than 2 dB, the second adjustment parameter is set to 1.6; when the SNR is at least 2 dB and less than 8 dB, it is set to 1; when the SNR is at least 8 dB and less than 13 dB, it is set to 0.5; and when the SNR is at least 13 dB, it is set to 0.3. The correspondence between upper-limit frequency ranges and the first adjustment parameter, and between SNR ranges and the second adjustment parameter, can be recorded in a lookup table for use when step S12 is executed.

After the first and second adjustment parameters are determined, step S13 uses them to weight a preset noise intensity factor, producing a corrected noise intensity factor. Finally, step S14 performs spectral subtraction with the corrected noise intensity factor to eliminate the environmental noise from the voice signal.

Here, the spectral subtraction uses an energy-spectrum formula to eliminate the environmental noise. Since spectral subtraction is well known to those skilled in the art, it is not described in detail here.
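As a minimal sketch, the lookup of step S12 and the weighting of step S13 could be written in Python as follows. The function names are hypothetical. The product form M' = M × α × β is taken from the relation stated with the energy-spectrum formula in this description, and treating a frame at exactly 1 kHz as the low-frequency case is an assumption, since the prose only covers "less than" and "greater than" 1 kHz.

```python
# Sketch of steps S12 and S13: look up the two adjustment parameters for a
# frame and use them to scale the preset noise intensity factor M.
# All function names here are hypothetical illustrations.


def first_adjustment(upper_freq_hz: float) -> float:
    """Return alpha: larger when the frame's upper-limit frequency is low.

    Assumption: a frame at exactly 1 kHz is treated as low-frequency.
    """
    return 1.5 if upper_freq_hz <= 1000.0 else 0.5


def second_adjustment(snr_db: float) -> float:
    """Return beta: larger when the frame's SNR is low (more noise)."""
    if snr_db < 2.0:
        return 1.6
    if snr_db < 8.0:
        return 1.0
    if snr_db < 13.0:
        return 0.5
    return 0.3


def corrected_noise_factor(preset_m: float, upper_freq_hz: float, snr_db: float) -> float:
    """Return M' = M * alpha * beta for one frame."""
    return preset_m * first_adjustment(upper_freq_hz) * second_adjustment(snr_db)
```

A noisy low-frequency frame (for instance 800 Hz upper limit, 1 dB SNR) therefore gets a corrected factor 2.4 times the preset one, while a clean high-frequency frame gets 0.15 times the preset one.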
The following only outlines how the corrected noise intensity factor enters the energy-spectrum formula.

If $N_i(f) \times M' > O_i(f)$, then update

$$N_i(f) = N_i(f) \times L + (1 - L) \times O_i(f), \qquad S_i(f) = 0.$$

If $N_i(f) \times M' \le O_i(f)$, then $N_i(f)$ remains unchanged and

$$S_i(f) = O_i(f) - N_i(f) \times M'.$$

In both cases

$$M' = M \times \alpha \times \beta.$$

Here $O_i(f)$ is the spectral amplitude of the voice signal in the i-th frame, $S_i(f)$ is the spectral amplitude of the pure voice signal in the i-th frame, and $N_i(f)$ is the spectral amplitude of the environmental noise in the i-th frame. $L$ is a smoothing factor with a value between 0 and 1, $M$ is the preset noise intensity factor, and $M'$ is the corrected noise intensity factor. $\alpha$ is the first adjustment parameter, set according to the upper-limit frequency value of the i-th frame, and $\beta$ is the second adjustment parameter, set according to the SNR of the i-th frame.

From these relations, when the product of the noise spectral amplitude and the corrected noise intensity factor exceeds the spectral amplitude of the voice signal, the spectral amplitude of the pure voice signal is set to zero. In that case the noise spectral amplitude is updated: the spectral amplitude of the voice signal multiplied by (1 − L) is added to the noise spectral amplitude multiplied by the smoothing factor L.

On the other hand, when that product does not exceed the spectral amplitude of the voice signal, the spectral amplitude of the pure voice signal equals the spectral amplitude of the voice signal minus the product of the corrected noise intensity factor and the noise spectral amplitude, and the noise spectral amplitude is not updated.

The method for eliminating environmental noise according to the present invention is applied to a speech recognition system on an embedded platform, which may be a personal digital assistant, a mobile phone, a laptop computer, an electronic dictionary, or the like. Note that the frame size and frame position are adjusted according to changes in the environment or the occasion of use and are not fixed, so they are not described in detail here.

The technical feature of the present invention is that, according to the characteristics of different environmental background noise, the first adjustment parameter and the second adjustment parameter are used to adjust the preset noise intensity factor of an energy-spectrum formula to produce a corrected noise intensity factor, which is then substituted back into the energy-spectrum formula to eliminate the environmental noise. The first adjustment parameter is set according to the upper-limit frequency value of a frame, and the second adjustment parameter is set according to the SNR of that frame. This strengthens the adaptability of speech recognition to different environmental noise and further raises the recognition rate.

In one embodiment, the set values of α and β can be expressed by the following two formulas:

(Formula 1)

$$\alpha = \begin{cases} 1.5, & F_i \le 1\ \text{kHz} \\ 0.5, & F_i > 1\ \text{kHz} \end{cases}$$

(Formula 2)

$$\beta = \begin{cases} 1.6, & SNR_i < 2\ \text{dB} \\ 1.0, & 2\ \text{dB} \le SNR_i < 8\ \text{dB} \\ 0.5, & 8\ \text{dB} \le SNR_i < 13\ \text{dB} \\ 0.3, & SNR_i \ge 13\ \text{dB} \end{cases}$$

where $F_i$ denotes the upper-limit frequency value of the i-th frame and $SNR_i$ denotes the SNR of the i-th frame.
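The update rule stated above in terms of O_i(f), N_i(f), and S_i(f) can be illustrated with a short Python sketch that processes one frame, bin by bin. This is an illustration under stated assumptions rather than the patented implementation: the function and argument names are hypothetical, the spectra are plain lists of amplitudes, and M' is assumed to have been computed beforehand as M × α × β.

```python
# Sketch of the per-frame spectral subtraction step.
# Inputs are plain lists holding one spectral amplitude per frequency bin.
# The function name and argument names are hypothetical.


def subtract_frame(observed, noise, m_corrected, smoothing):
    """Return (speech, updated_noise) for one frame.

    observed    -- O_i(f), spectral amplitude of the noisy voice signal
    noise       -- N_i(f), current estimate of the noise amplitude
    m_corrected -- M', the corrected noise intensity factor
    smoothing   -- L, smoothing factor between 0 and 1
    """
    speech = []
    updated_noise = []
    for o, n in zip(observed, noise):
        if n * m_corrected > o:
            # Scaled noise exceeds the observation: zero the speech estimate
            # and refresh the noise estimate with smoothing.
            speech.append(0.0)
            updated_noise.append(n * smoothing + (1.0 - smoothing) * o)
        else:
            # Otherwise subtract the scaled noise and keep the noise estimate.
            speech.append(o - n * m_corrected)
            updated_noise.append(n)
    return speech, updated_noise
```

Note that the noise estimate is only refreshed in bins judged to contain no speech, which is what lets it track slowly changing background noise without being pulled upward by the speech itself.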
From the set values in Formula 1 and Formula 2, a relatively large noise intensity factor, for example α = 1.5, is used when removing environmental noise in the low-frequency band, so that the power spectrum of the speech stands out. Likewise, a frame with a low SNR contains more noise, so a relatively large noise intensity factor is used, for example β = 1.6; a frame with a high SNR contains more speech, so a relatively small noise intensity factor is used, for example β = 0.3.

Please refer to FIGS. 2A to 2J, which are spectrum-time diagrams of various environmental noises. FIG. 2A shows noise produced by a host computer in operation; FIG. 2B, noise produced by a moving car; FIG. 2C, noise recorded on a street; FIG. 2D, noise recorded in a restaurant; FIG. 2E, noise recorded at an exhibition hall; FIG. 2F, noise recorded in a factory; FIG. 2G, human-voice noise; FIG. 2H, noise recorded on an aircraft; FIG. 2I, noise recorded in a subway; and FIG. 2J, noise recorded on a train. In all ten figures the horizontal and vertical axes represent time and frequency, respectively, and lighter gray indicates stronger energy, so the stability and frequency distribution of each noise are easy to see. For example, in computer host, car, street, subway, and train noise, the average spectrum has its highest energy at low frequencies, and the energy decreases gradually as frequency increases.
The spectra of human-voice, factory, and restaurant noise are broadly similar to those of the five noises above, but the gap between their low-frequency and high-frequency energy is less pronounced. The spectrum of aircraft noise, besides having high energy at low frequencies, has a distinct peak in the 2500 Hz to 3000 Hz band, while the spectrum of exhibition-hall noise begins to fall off with increasing frequency only from about 100 Hz.

The ten kinds of noise above were added to the pure voice signals recorded for this case at SNRs of 5 dB, 10 dB, 15 dB, and 20 dB. The parameters were set as follows (fixed-point values; the α and β entries are those of Formula 1 and Formula 2 scaled by 10):

| SNRi | L | M |
|---|---|---|
| 5 dB | 96 | 20 |
| 10 dB | 96 | 15 |
| 15 dB | 96 | 11 |
| 20 dB | 96 | 8 |

α = 15 for Fi ≤ 1 kHz; α = 5 for Fi > 1 kHz.

β = 16 for SNRi < 2 dB; β = 10 for 2 dB ≤ SNRi < 8 dB; β = 5 for 8 dB ≤ SNRi < 13 dB; β = 3 for SNRi ≥ 13 dB.

The results obtained with the method of the present invention are compared with those of the conventional method in Tables 1 to 10 below. Each cell lists the two values reported for that SNR. The relative-improvement figures are consistent with (present invention − conventional) / conventional; for example, in Table 1 at 5 dB, (80.3 − 70.7) / 70.7 ≈ 13.6%.

Table 1: computer host noise added

| SNR | Conventional | Present invention | Relative improvement |
|---|---|---|---|
| clean | 95.7% / 99.3% | 95.7% / 99.3% | 0% / 0% |
| 20 dB | 95.7% / 99.0% | 95.3% / 99.3% | -0.4% / 0.3% |
| 15 dB | 91.0% / 98.7% | 93.7% / 99.3% | 3.0% / 0.6% |
| 10 dB | 86.3% / 96.0% | 90.0% / 98.3% | 4.3% / 2.4% |
| 5 dB | 70.7% / 90.0% | 80.3% / 94.3% | 13.6% / 4.7% |

Table 2: car noise added

| SNR | Conventional | Present invention | Relative improvement |
|---|---|---|---|
| clean | 95.7% / 99.3% | 95.7% / 99.3% | 0% / 0% |
| 20 dB | 93.7% / 99.3% | 94.3% / 99.3% | 0.6% / 0% |
| 15 dB | 91.3% / 98.0% | 93.3% / 99.0% | 2.2% / 1.0% |
| 10 dB | 85.0% / 95.0% | 89.0% / 97.7% | 4.7% / 2.8% |
| 5 dB | 61.7% / 81.7% | 80.3% / 94.0% | 30.1% / 15.1% |

Table 3: human-voice noise added

| SNR | Conventional | Present invention | Relative improvement |
|---|---|---|---|
| clean | 95.7% / 99.3% | 95.7% / 99.3% | 0% / 0% |
| 20 dB | 93.0% / 98.3% | 92.7% / 99.3% | -0.3% / 1.0% |
| 15 dB | 82.3% / 94.0% | 87.3% / 97.3% | 6.1% / 3.5% |
| 10 dB | 57.0% / 80.7% | 73.0% / 91.0% | 28.1% / 12.8% |
| 5 dB | 24.7% / 48.3% | 39.7% / 66.7% | 60.7% / 38.1% |

Table 4: street noise added

| SNR | Conventional | Present invention | Relative improvement |
|---|---|---|---|
| clean | 95.7% / 99.3% | 95.7% / 99.3% | 0% / 0% |
| 20 dB | 96.0% / 99.7% | 96.0% / 99.7% | 0% / 0% |
| 15 dB | 94.0% / 99.0% | 95.3% / 99.3% | 1.4% / 0.3% |
| 10 dB | 89.0% / 98.0% | 92.3% / 99.3% | 3.7% / 1.3% |
| 5 dB | 75.7% / 92.7% | 83.7% / 95.7% | 10.6% / 3.2% |

Table 5: aircraft noise added

| SNR | Conventional | Present invention | Relative improvement |
|---|---|---|---|
| clean | 95.7% / 99.3% | 95.7% / 99.3% | 0% / 0% |
| 20 dB | 94.3% / 99.3% | 95.0% / 99.3% | 0.7% / 0% |
| 15 dB | 89.7% / 97.3% | 93.3% / 99.3% | 4.0% / 2.1% |
| 10 dB | 79.0% / 94.3% | 84.3% / 97.7% | 6.7% / 3.6% |
| 5 dB | 52.7% / 75.0% | 62.7% / 86.3% | 19.0% / 15.1% |

Table 6: subway noise added

| SNR | Conventional | Present invention | Relative improvement |
|---|---|---|---|
| clean | 95.7% / 99.3% | 95.7% / 99.3% | 0% / 0% |
| 20 dB | 92.3% / 97.7% | 92.7% / 97.7% | 0.4% / 0% |
| 15 dB | 87.7% / 95.7% | 90.0% / 96.3% | 2.6% / 0.6% |
| 10 dB | 85.7% / 94.7% | 87.3% / 94.3% | 1.9% / -0.4% |
| 5 dB | 77.3% / 89.3% | 83.7% / 92.0% | 8.3% / 3.0% |

Table 7: train noise added

| SNR | Conventional | Present invention | Relative improvement |
|---|---|---|---|
| clean | 95.7% / 99.3% | 95.7% / 99.3% | 0% / 0% |
| 20 dB | 94.3% / 99.3% | 94.0% / 99.3% | -0.3% / 0% |
| 15 dB | 92.0% / 98.3% | 92.3% / 99.0% | 0.3% / 0.7% |
| 10 dB | 91.7% / 98.7% | 92.0% / 98.0% | 0.3% / -0.7% |
| 5 dB | 80.7% / 94.7% | 87.3% / 97.7% | 8.2% / 3.2% |

Table 8: factory noise added

| SNR | Conventional | Present invention | Relative improvement |
|---|---|---|---|
| clean | 95.7% / 99.3% | 95.7% / 99.3% | 0% / 0% |
| 20 dB | 93.3% / 99.3% | 94.0% / 99.3% | 0.8% / 0% |
| 15 dB | 87.3% / 98.3% | 93.7% / 98.7% | 7.3% / 0.4% |
| 10 dB | 74.3% / 91.0% | 81.7% / 97.3% | 10.0% / 6.9% |
| 5 dB | 44.7% / 68.3% | 58.0% / 79.7% | 29.8% / 16.7% |

Table 9: restaurant noise added

| SNR | Conventional | Present invention | Relative improvement |
|---|---|---|---|
| clean | 95.7% / 99.3% | 95.7% / 99.3% | 0% / 0% |
| 20 dB | 91.7% / 99.0% | 93.7% / 98.7% | 2.2% / -0.3% |
| 15 dB | 78.3% / 95.0% | 86.3% / 98.0% | 10.2% / 3.2% |
| 10 dB | 59.3% / 81.0% | 70.7% / 88.3% | 19.2% / 9.0% |
| 5 dB | 28.7% / 53.0% | 42.3% / 68.0% | 47.4% / 28.3% |

Table 10: exhibition-hall noise added

| SNR | Conventional | Present invention | Relative improvement |
|---|---|---|---|
| clean | 95.7% / 99.3% | 95.7% / 99.3% | 0% / 0% |
| 20 dB | 88.7% / 98.7% | 88.7% / 98.7% | 0% / 0% |
| 15 dB | 79.3% / 93.3% | 82.7% / 95.3% | 4.3% / 2.1% |
| 10 dB | 63.3% / 85.7% | 73.3% / 89.3% | 15.8% / 4.2% |
| 5 dB | 33.7% / 62.0% | 48.0% / 76.3% | 42.4% / 23.1% |

From the tables above, the relative improvement of the present invention over the conventional method is as follows:

| SNR | Relative improvement |
|---|---|
| 20 dB | 1.0% / 0.1% |
| 15 dB | 4.1% / 1.5% |
| 10 dB | 9.5% / 4.2% |
| 5 dB | 27.0% / 15.1% |

The method first detects the upper-limit frequency value and the SNR of a given frame, adjusts the preset noise intensity factor according to the detection result, and substitutes the corrected noise intensity factor back into the energy-spectrum formula to eliminate the environmental noise in the voice signal. The results above confirm that the proposed method outperforms the conventional method under ordinary natural environmental noise, and that it resists background noise particularly effectively at low SNR.

The method for eliminating environmental noise is applied to a speech recognition system on an embedded platform to provide functions such as voice navigation; the embedded platform may be a personal digital assistant, a mobile phone, a laptop computer, an electronic dictionary, or the like. When environmental noise is added, the length of the pure voice signal is used as the reference: a segment of environmental noise of the same length is taken in sequence and added to the pure voice signal. If the selected environmental noise is shorter than the pure voice signal, it is looped from the beginning until it matches that length. Note that the frame sample size and frame position are adjusted according to changes in the environment or the occasion of use and are not fixed, so they are not described in detail here.
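The noise-mixing procedure described above, in which the environmental noise is looped from its start when it is shorter than the pure voice signal, could be sketched in Python as follows. The description does not say how the noise is scaled to reach 5 dB, 10 dB, 15 dB, or 20 dB, so the power-ratio definition of SNR used here, along with the function name `mix_noise`, is an assumption for illustration only.

```python
import math

# Sketch of adding looped environmental noise to a clean recording.
# Signals are plain lists of samples; mix_noise is a hypothetical name.


def mix_noise(speech, noise, target_snr_db):
    """Return speech plus looped noise, scaled to the requested SNR."""
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")

    # Use the speech length as the reference; wrap around to the start of
    # the noise whenever the noise recording runs out.
    looped = [noise[i % len(noise)] for i in range(len(speech))]

    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in looped) / len(looped)
    if noise_power == 0.0:
        return list(speech)

    # Assumption: SNR (dB) = 10 * log10(speech_power / scaled_noise_power).
    ratio = 10.0 ** (target_snr_db / 10.0)
    gain = math.sqrt(speech_power / (noise_power * ratio))

    return [s + gain * n for s, n in zip(speech, looped)]
```

The output always has the same length as the speech input, so the same noise recording can be reused for utterances of any duration.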
The above is illustrative only and not restrictive. Any equivalent modification or alteration that does not depart from the spirit and scope of the present invention shall be included in the scope of the appended claims.

[Brief Description of the Drawings]

FIG. 1 is a flow chart of the steps of the method for eliminating environmental noise in a voice signal according to the present invention; and

FIGS. 2A to 2J are spectrum-time diagrams of various environmental noises.

[Description of Main Reference Symbols]

S10-S14: steps of the flow.