TWI242729B - Speech database establishment and recognition method and system thereof

Info

Publication number: TWI242729B
Application number: TW93101136A
Authority: TW
Inventors: Li-Lu Chen
Original assignee: Micro Star Int Co Ltd
Priority date: 2004-01-16
Filing date: 2004-01-16
Publication date: 2005-11-01
Also published as: TW200525384A

Abstract

The invention provides a method and system for establishing a speech database and performing speech recognition. In the method, a word segmentation module divides a speech signal, input by a user through an input unit, into at least one parent speech module according to a rule predefined by the user, and a storage module stores the parent speech module in a database. The storage module also stores the arrangement sequence of the parent speech modules corresponding to the input speech signal. When the user later inputs a speech signal through the input unit, a speech recognition module divides it into at least one parent speech module to be recognized according to the same user-defined rule, and searches the database for arrangement sequence data matching those modules. If a match exists, the arrangement sequence data is retrieved; if not, the possible combinations matching the arrangement sequence are listed. Through this word segmentation mechanism, the method and system provide a compact speech recognition database structure and recognition that adapts to the characteristics of the individual user.

Description

[Technical Field of the Invention]

The invention relates to a method and system for establishing a speech database and performing speech recognition, and more particularly to a method and system that improve the efficiency of speech training and recognition through word segmentation.

[Prior Art]

With the rapid development of the electronic information industry, powerful and inexpensive consumer electronics have appeared in great number. Take the computer, the most common of them: as software and hardware have grown steadily more capable, the computer is no longer limited to running programs or processing data, but also serves as a medium for video and audio. In short, the computer has moved from the company and the laboratory into the domain of household appliances.

The trend runs the other way as well: everyday electrical products increasingly emphasize computerization. Through embedded systems, products such as televisions, refrigerators, and washing machines have gradually acquired the functions of small computers, so that the user can set and operate different functions through a simple human-machine interface. Beyond one-way setting and operation, the user can even communicate with the appliance and contact the outside world through it, for example by e-mail. Formerly simple household appliances have thus been computerized and are developing toward information appliances.

Whether computers become appliances or appliances become computers, the user must communicate with the machine through a human-machine interface. For input, the most common units are keyboards, buttons, mice, and similar devices. Although these input units let the user enter the instructions or data required for setting and operation, they still have drawbacks.

First, the size of an input unit is usually a difficulty for designs that aim to be light, thin, and compact. Second, the user is not necessarily familiar with computers, or may find it difficult to communicate with a computer through traditional input methods. All of these are obstacles to turning computers into appliances or computerizing appliances.

One option for solving this problem is to use speech as the input medium in place of traditional text entry or on-screen selection. Speech input needs only a sound input unit such as a microphone, which greatly reduces the volume and space occupied. Moreover, the user only has to issue spoken commands, as if talking to a person, to communicate with the machine, which is also convenient for people unfamiliar with computer operation. To communicate with products by speech, however, the user first needs an efficient recognition system with a rich speech database.

Republic of China (Taiwan) Patent Publication No. 308666 discloses an "Intelligent Mandarin speech learning system and method". According to that disclosure, the machine first detects the speech signal of a learning example sentence input by the user and computes its characteristic parameters; a recognition device recognizes the input sentence and compares the recognition result with the learning example sentence to obtain a match rate; and a training device uses the speech of the learning example sentences to train and update the speech model. After training, the speech model covers nearly all of the user's own speech characteristics, so that when the user formally uses the system it can effectively recognize the user's speech input.

Speech learning and recognition systems and methods of this kind represent current speech recognition technology, but they have considerable drawbacks.

In such systems, the user must first read example sentences aloud at close to a predetermined standard speed and volume, so that the system can build the user's speech characteristics and reduce the chance of recognition errors; the user must also form the habit of speaking in a clear, steady reading voice. This way of building and recognizing speech characteristics requires the user to accommodate the machine's recognition habits. It is not user-friendly, and less quick users must try repeatedly to obtain good recognition results. In addition, if the user changes, the user characteristics must be re-adjusted, or the recognition rate drops.

Another conventional approach uses the Hidden Markov Model (HMM) as the basis for speech recognition. Its drawback is that the number and content of the models are set in advance: the user first sets the number and content of the models, then inputs speech data matching those models to complete them.

A further technique, Dynamic Time Warping (DTW), uses the complete speech data previously input by the user as the comparison reference and has no concept of modules. In other words, the amount and content of the data the user has input determine the amount and content of speech that can be recognized. Once a certain level of recognition is required, a very large database must be built; the same is true of the HMM technique described above.

In view of the above, how to provide a more efficient method and system for establishing a speech database and performing recognition has become a problem to be solved urgently.

[Summary of the Invention]

To overcome the drawbacks of the conventional techniques described above, the main object of the invention is to provide a method and system for establishing a speech database and performing speech recognition in which a word segmentation mechanism increases the number of samples in the database and thereby increases the probability of successful speech training and recognition.

Another object of the invention is to provide such a method and system in which, through the word segmentation mechanism, the user does not have to repeat the pronunciation speed, frequency, and/or intonation of learning example sentences from the start, saving the time otherwise needed to build personal speech characteristics before speech recognition can be used.

A further object is to provide such a method and system in which, through a word combination mechanism, a limited set of speech data can be arranged and combined into complex word combinations, greatly reducing the amount of data held in the database.

Yet another object is to provide such a method and system in which, through the word segmentation mechanism, a reasonably close recognition result can still be obtained even when the user's pronunciation does not meet the standard.

To achieve these and other objects, the speech database establishment and recognition system of the invention includes:

a word segmentation module, which divides a speech signal input by the user through an input unit into at least one parent speech module (母語音模組) according to a criterion preset by the user, and stores the parent speech module in a database;

a storage module, which stores in the database the at least one parent speech module produced by the word segmentation module, together with the arrangement order of the parent speech modules corresponding to the input signal; and

a speech recognition module, which, when the user inputs a speech signal through the input unit, divides that signal into at least one parent speech module to be recognized according to the criterion preset by the user, and searches the database for arrangement order data matching the parent speech modules to be recognized. If such data exists, the arrangement order data is retrieved; if not, the possible combinations matching the arrangement order of the parent speech modules are listed.

The method of performing speech training and recognition with this system is as follows. First, the word segmentation module divides the speech signal input by the user through an input unit into at least one parent speech module according to the criterion preset by the user, and the parent speech module is stored in a database through a storage module. Second, the storage module stores in the database the arrangement order of the parent speech modules corresponding to the speech signal input by the user. Next, when the user inputs a speech signal through the input unit, the speech recognition module divides it into at least one parent speech module to be recognized according to the criterion preset by the user. Finally, the speech recognition module searches the database for arrangement order data matching the parent speech modules to be recognized; if such data exists, it is retrieved, and if not, the possible combinations matching the arrangement order are listed.

Compared with conventional speech training and recognition techniques, the method and system of the invention increase the number of samples in the database, and thus the probability of successful speech training and recognition, while also saving the time needed to build personal speech characteristics. In addition, even if the user's pronunciation does not meet the standard, a reasonably close recognition result can still be obtained, increasing the chance of successful recognition.

[Embodiments]

The following specific embodiments illustrate how the invention is practiced; those skilled in the art can readily understand other advantages and effects of the invention from the content disclosed in this specification. The invention can also be implemented or applied through other specific embodiments, and the details in this specification can be modified and changed on the basis of different viewpoints and applications without departing from the spirit of the invention.

Referring to FIG. 1, in this embodiment the speech database establishment and recognition system of the invention is applied to a personal computer 1, so that the user can communicate with the personal computer 1, for example to operate and/or configure it, through the method and system of the invention. Note that the actual software and hardware architecture of the system and of the personal computer 1 is more complex; to highlight the technical features of the invention, only the parts related to those features are shown and discussed. The method and system can also be applied to any of a workstation, a notebook computer, an LCD computer, a tablet computer, a handheld computer, a personal digital assistant, or a mobile phone.

The system includes at least an input unit 10, a word segmentation module 12, a database 14, a storage module 16, and a speech recognition module 18.

The input unit 10 is a sound-collecting unit through which the user inputs speech signals into the system; in this embodiment it is a microphone.

The word segmentation module 12 divides the speech signal input by the user through the input unit 10 into at least one parent speech module according to the criterion preset by the user. In this embodiment, the word segmentation module 12 further includes an analog-to-digital conversion unit (not shown) that converts the analog speech signal input by the user into a digital signal.

Thus, when the user builds the speech database and inputs the analog speech signal of a set of words such as "fat" through the input unit 10, the word segmentation module 12 first converts it into a digital signal for processing. Once the conversion is complete, the word segmentation module 12 segments the word "fat" according to the speech segmentation criterion set by the user.

In this embodiment, the word segmentation module analyzes how the speech signal is distributed over the frequency spectrum. When the user inputs spoken speech through the input unit 10, a time-domain to frequency-domain operation (Fourier transform) yields the spectral data of the speech signal. This raw data includes at least the relationship among frequency, energy, and time. For time points near a given time point t (..., t-2, t-1, t+1, t+2, ...), the energy at each frequency is obtained, and the mean and correlation coefficient of these values are computed to measure how much the time points differ from one another.

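
As an illustration of the time-domain to frequency-domain step, the following minimal sketch computes the energy at each frequency for each frame with a plain discrete Fourier transform. The frame length, the non-overlapping framing, and the function names `frame_energies` and `spectrogram` are assumptions made for this example; the patent does not specify them.

```python
import cmath
import math


def frame_energies(frame):
    """Return the energy |X[k]|^2 of each DFT bin k = 0..N/2 for one frame."""
    n = len(frame)
    energies = []
    for k in range(n // 2 + 1):
        # Direct DFT: X[k] = sum_m x[m] * exp(-2*pi*i*k*m/N)
        xk = sum(x * cmath.exp(-2j * math.pi * k * m / n)
                 for m, x in enumerate(frame))
        energies.append(abs(xk) ** 2)
    return energies


def spectrogram(samples, frame_len=8):
    """Split samples into non-overlapping frames (an assumed framing) and
    return one row of per-frequency energies per frame (time x frequency)."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    return [frame_energies(f) for f in frames]


# Hypothetical input: one frame of a tone at bin 2, then one frame at bin 1.
tone2 = [math.sin(2 * math.pi * 2 * m / 8) for m in range(8)]
tone1 = [math.sin(2 * math.pi * 1 * m / 8) for m in range(8)]
spec = spectrogram(tone2 + tone1)
peaks = [row.index(max(row)) for row in spec]
print(peaks)  # [2, 1]
```

Each row of `spec` holds the frequency and energy relationship for one time point, which is the kind of two-dimensional data the segmentation step works on. A real system would likely use an FFT library and overlapping windows instead of this direct DFT.
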
In addition, the principle of edge detection in two-dimensional images is applied to the two-dimensional "frequency" versus "time" data to find the boundary between two dissimilar speech segments. A variable threshold, which changes with the speech data and the environment, is then used to identify time points at which the energy change across frequencies, relative to another time point, is significant and exceeds the threshold; these points serve as the basis for segmenting the word. The span between two adjacent dividing lines is a run of similar frames and forms one parent speech module. In other words, after a set of word data is input, the segmentation technique described above yields at least one parent speech module. In this embodiment, the word input by the user is divided into three parts: "f", "a", and "t".
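
The boundary search described above can be sketched roughly as follows. This is a simplified, one-dimensional stand-in for the edge detection the patent mentions: it compares the averaged spectrum just before and just after each time point using the correlation coefficient, and marks a boundary where the dissimilarity exceeds a data-dependent threshold. The threshold rule (mean plus `k` standard deviations), the `context` window, and all function names are assumptions for illustration only.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (1.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 1.0


def average_frame(frames):
    """Average the energy at each frequency over several frames."""
    return [mean(column) for column in zip(*frames)]


def find_boundaries(spec, context=2, k=1.0):
    """Return frame indices t where the spectrum changes sharply.

    spec is a time x frequency list of energies. The threshold adapts to the
    data (assumed rule: mean + k * standard deviation of all scores).
    """
    scores = {}
    for t in range(context, len(spec) - context + 1):
        before = average_frame(spec[t - context:t])
        after = average_frame(spec[t:t + context])
        scores[t] = 1.0 - pearson(before, after)  # 0 = same shape, up to 2 = opposite
    if not scores:
        return []
    values = list(scores.values())
    threshold = mean(values) + k * pstdev(values)
    return [t for t, s in scores.items() if s > threshold]


def split_segments(spec, boundaries):
    """Turn boundary indices into (start, end) frame ranges, one per module."""
    edges = [0] + list(boundaries) + [len(spec)]
    return list(zip(edges, edges[1:]))


# Hypothetical toy spectrogram: three spectral shapes standing in for "f", "a", "t".
toy = [[9, 1, 1, 1]] * 4 + [[1, 9, 1, 1]] * 4 + [[1, 1, 1, 9]] * 4
boundaries = find_boundaries(toy)
segments = split_segments(toy, boundaries)
print(boundaries)  # [4, 8]
print(segments)    # [(0, 4), (4, 8), (8, 12)]
```

On real speech, neighbouring frames can both exceed the threshold, so a practical version would probably keep only local maxima of the score; that refinement is omitted here.
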

In this embodiment, let the three parts "f", "a", and "t" be the parent speech modules "A", "B", and "C" respectively. The module sequence "ABC" composed of these parent speech modules then represents "fat".

The storage module 16 stores in the database 14 the at least one parent speech module produced by the word segmentation module 12, together with the arrangement order of the parent speech modules corresponding to the input signal. Since, in this embodiment, the word input through the input unit 10 is divided into "f", "a", and "t", the storage module 16 stores the three parent speech modules "A", "B", and "C", as well as the sequence "ABC", in the database 14.

While building the database 14, the user can also input through the input unit 10 a "fat" whose sound-to-sound sequential cue is longer (the sound drawn out between f and a) and a "fat" whose sequential cue is shorter (f, a, and t all clipped). Suppose the sequence for the drawn-out "fat" is "DC" and the sequence for the clipped "fat" is "E". The user can then treat the words corresponding to the sequences "ABC", "DC", and "E" all as "fat".

The speech recognition module 18, when the user inputs a speech signal through the input unit 10, divides the signal into at least one parent speech module to be recognized according to the criterion preset by the user, and searches the database 14 for arrangement order data matching the parent speech modules to be recognized. If such data exists, it is retrieved; if not, the possible combinations matching the arrangement order of the parent speech modules are listed.
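
One way to picture what the storage module keeps in the database is a pair of tables: one holding each parent speech module, and one mapping each arrangement order to a word. The sketch below is only an illustration of that idea; the class name `SpeechDatabase`, its methods, and the placeholder feature strings are hypothetical and do not come from the patent.

```python
class SpeechDatabase:
    """Toy stand-in for database 14: parent modules plus their arrangement orders."""

    def __init__(self):
        self.modules = {}    # module label -> stored feature data (opaque here)
        self.sequences = {}  # tuple of module labels -> word

    def store(self, word, labelled_segments):
        """Store each parent module once, and the order in which they form `word`."""
        order = []
        for label, features in labelled_segments:
            self.modules.setdefault(label, features)
            order.append(label)
        self.sequences[tuple(order)] = word

    def lookup(self, order):
        """Return the word for an exact arrangement order, or None."""
        return self.sequences.get(tuple(order))


db = SpeechDatabase()
db.store("fat", [("A", "f"), ("B", "a"), ("C", "t")])    # standard "fat"
db.store("fat", [("D", "fa, drawn out"), ("C", "t")])     # drawn-out variant
db.store("fat", [("E", "fat, clipped")])                  # clipped variant

print(sorted(db.modules))      # ['A', 'B', 'C', 'D', 'E']
print(db.lookup("ABC"))        # fat
print(db.lookup(["D", "C"]))   # fat
```

Because module "C" is shared by the sequences "ABC" and "DC", it is stored only once. This reuse is what lets a limited set of modules cover many arrangement orders.
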

In this embodiment, the speech recognition module 18 segments the speech signal in the same way as the word segmentation module 12. Using the segmentation technique described above, it divides the speech signal to be recognized, input by the user through the input unit 10, into at least one parent speech module to be recognized. If the user inputs the word "fat", the speech recognition module 18 divides it into the three parts "f", "a", and "t", that is, the three parent speech modules to be recognized "A", "B", and "C", which form the sequence to be recognized "ABC". Dynamic time warping is then used to search the database 14 for stored word data matching the sequence "ABC". If such data is found, the word input by the user through the input unit 10 is recognized as "fat". If no matching arrangement order of parent speech modules exists, the possible combinations matching those parent speech modules are retrieved from the database 14 so that the user can confirm which word was input. The user can then create arrangement order data based on the possibilities listed.

Note that if the "fat" input by the user through the input unit 10 is the "fat" with the sound drawn out between f and a, or the "fat" with clipped f, a, and t, the sequence identified by the speech recognition module 18 will be "DC" or "E" respectively. Because the user, while building the database 14, already set the words corresponding to the sequences "DC" and "E" to "fat", the speech recognition module 18 can still recognize the word "fat" even when the speech input through the input unit 10 is not the standard "fat".
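
The patent names dynamic time warping for the matching step but does not say which features are compared. As a hedged sketch, the code below implements the textbook DTW distance on sequences of scalar features and picks the closest stored template. The scalar features, the `max_distance` cutoff, and the function names are assumptions for illustration.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    rows, cols = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # a advances, b repeats
                                    cost[i][j - 1],      # b advances, a repeats
                                    cost[i - 1][j - 1])  # both advance
    return cost[rows][cols]


def best_match(query, templates, max_distance=1.0):
    """Return the label of the closest template, or None if none is close enough."""
    if not templates:
        return None
    label, dist = min(((name, dtw_distance(query, t)) for name, t in templates.items()),
                      key=lambda pair: pair[1])
    return label if dist <= max_distance else None


# Hypothetical one-number-per-frame features for two stored words.
templates = {"fat": [1, 5, 9], "fact": [1, 5, 3, 9]}
slow_fat = [1, 1, 5, 5, 5, 9]  # same shape as "fat", spoken more slowly

print(dtw_distance(slow_fat, templates["fat"]))  # 0.0
print(best_match(slow_fat, templates))           # fat
print(best_match([20, 20, 20], templates))       # None
```

DTW tolerates differences in speaking rate, which is why the slowed-down input still aligns with the "fat" template at zero cost. When nothing is close enough, returning `None` corresponds to the case where the system falls back to listing possible combinations.
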

On the other hand, suppose the user has also created another sequence, "ABFC", for the word "fact". If the user inputs "fact" through the input unit 10 but, because of non-standard pronunciation, the parent speech module for the "c" sound is not reliably identified, the speech recognition module 18 can use a weighted value mechanism, such as the relative recognition probability, to decide whether the non-standard speech corresponds to the sequence "ABC" or "ABFC". If the recognition probability of "ABC" is higher, the speech recognition module 18 recognizes the input as the word "fat" corresponding to the sequence "ABC".

Referring to FIG. 2, the steps of the speech database establishment and recognition method of the invention are as follows.

In step S201, the word segmentation module 12 divides the speech signal input by the user through the input unit 10 into at least one parent speech module according to the criterion preset by the user. In this embodiment, when the user builds the speech database and inputs the analog speech signal of the word "fat" through the input unit 10, the word segmentation module 12 converts it into a digital signal for processing and divides it into the three parts "f", "a", and "t". After segmentation, the different parent speech modules are stored in the database 14. The method then proceeds to step S202.

In step S202, the storage module 16 stores in the database 14 the at least one parent speech module produced by the word segmentation module 12, together with the arrangement order of the parent speech modules corresponding to the input signal. In this embodiment, the storage module 16 stores in the database 14 the order in which the three parent speech modules "f", "a", and "t" are arranged to form "fat". The method then proceeds to step S203.
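
The weighted value decision between the sequences "ABC" and "ABFC" described earlier in this embodiment can be sketched as follows. The patent only says that a weight such as recognition probability is used; the scoring rule here (mean per-module confidence, with unseen modules counted as zero) and the names `sequence_score` and `pick_sequence` are assumptions for illustration.

```python
def sequence_score(sequence, confidences):
    """Mean confidence of the modules in a candidate sequence (missing -> 0.0)."""
    return sum(confidences.get(m, 0.0) for m in sequence) / len(sequence)


def pick_sequence(candidates, confidences):
    """candidates maps module sequence -> word; return the best (sequence, word)."""
    best = max(candidates, key=lambda seq: sequence_score(seq, confidences))
    return best, candidates[best]


candidates = {"ABC": "fat", "ABFC": "fact"}

# Hypothetical confidences: the "c" sound (module F) was barely detected.
weak_c = {"A": 0.9, "B": 0.8, "F": 0.2, "C": 0.85}
# Hypothetical confidences: the "c" sound was clearly detected.
clear_c = {"A": 0.9, "B": 0.8, "F": 0.9, "C": 0.85}

print(pick_sequence(candidates, weak_c))   # ('ABC', 'fat')
print(pick_sequence(candidates, clear_c))  # ('ABFC', 'fact')
```

With a weak "c", the sequence "ABC" scores 0.85 against about 0.69 for "ABFC", so the input is resolved to "fat", as in the example in the text.
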

In step S203, when the user inputs a speech signal through the input unit 10, the speech recognition module 18 divides the signal into at least one parent speech module to be recognized according to the criterion preset by the user. In this embodiment, the speech recognition module 18 segments the speech signal in the same way as the word segmentation module 12, dividing the speech signal to be recognized into at least one parent speech module to be recognized. If the user inputs the word "fat", the speech recognition module 18 divides it into the three parent speech modules to be recognized "f", "a", and "t". The method then proceeds to step S204.

In step S204, the speech recognition module 18 searches the database 14 for arrangement order data matching the parent speech modules to be recognized. In this embodiment, the speech recognition module 18 uses dynamic time warping to search the database 14 for stored word data matching the arrangement order of "fat". If such data is found, the method proceeds to step S205; if not, it proceeds to step S206.

In step S205, the speech recognition module 18 recognizes the word input by the user through the input unit 10 as "fat".

In step S206, the speech recognition module 18 retrieves from the database 14 the possible combinations matching those parent speech modules, so that the user can confirm which word was input.
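
The decision in steps S204 to S206 can be summarized in a few lines of code. This sketch assumes the input has already been segmented and labelled (steps S201 to S203) and uses exact lookup in place of the DTW search. How the "possible combinations" are chosen is not detailed in the text, so ranking stored sequences by the number of shared modules is an assumption, as is the function name `recognize`.

```python
def recognize(order, sequences):
    """Exact match on the arrangement order (S205), else list candidates (S206)."""
    key = tuple(order)
    if key in sequences:                        # S204: a matching order exists
        return "match", sequences[key]          # S205: retrieve the word
    observed = set(key)
    # S206: keep stored sequences sharing at least one module with the input,
    # ranked by how many modules they share (assumed ranking rule).
    candidates = [(seq, word) for seq, word in sequences.items()
                  if observed & set(seq)]
    candidates.sort(key=lambda item: len(observed & set(item[0])), reverse=True)
    return "candidates", candidates


# Arrangement orders from the embodiment's "fat" / "fact" example.
sequences = {
    ("A", "B", "C"): "fat",
    ("D", "C"): "fat",
    ("E",): "fat",
    ("A", "B", "F", "C"): "fact",
}

print(recognize(["A", "B", "C"], sequences))  # ('match', 'fat')
status, options = recognize(["A", "F", "C"], sequences)
print(status, [word for _, word in options])  # candidates ['fact', 'fat', 'fat']
```

In the fallback case the user would confirm one of the listed words, and the confirmed order could then be stored as a new arrangement order for that word.
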

In summary, the speech database establishment and recognition method and system of the invention increase the number of samples in the database without expanding the number of stored speech samples indefinitely, improve the efficiency of speech training and recognition, and save the time needed to build personal speech characteristics. The method and system can also be combined with text-to-speech (TTS) to form an interactive dialogue system.

The embodiments above merely illustrate the principles and effects of the invention and are not intended to limit it. Anyone skilled in the art can modify and vary the embodiments without departing from the spirit and scope of the invention. The scope of protection of the invention is therefore as set out in the claims below.

[Brief description of the drawings] FIG. 1 is a block diagram showing the system architecture of the speech database establishment and recognition system of the present invention; and FIG. 2 is a flowchart showing the steps performed when the speech database establishment and recognition method of the present invention is executed.

Reference numerals: 1 personal computer; 10 input unit; 12 word segmentation module; 14 database; 16 storage module; 18 speech recognition module.


Claims

1. A speech database establishment and recognition method, applicable to a data processing device to provide the data processing device with a speech recognition function, the method comprising: (1) causing a word segmentation module to divide a speech signal, input by a user through an input unit, into at least one vowel speech module according to a reference preset by the user, and storing the vowel speech module in a database through a storage module; (2) causing the storage module to store, in the database, the arrangement order of the vowel speech modules corresponding to the speech signal input by the user; (3) causing a speech recognition module, when the user inputs a speech signal through the input unit, to divide the speech signal into at least one vowel speech module to be recognized according to the reference preset by the user; (4) causing the speech recognition module to search the database for arrangement order data matching the vowel speech modules to be recognized, and proceeding to step (5) if such data exists or to step (6) if it does not; (5) causing the speech recognition module to retrieve the matching arrangement order data; and (6) causing the speech recognition module to list possible combinations that match the arrangement order of the vowel speech modules.

2. The method of claim 1, further comprising, before the word segmentation module divides the speech signal, causing the word segmentation module to convert the received analog speech signal into a digital signal format.

3. The method of claim 1, wherein the word segmentation module analyzes the distribution of the speech signal over the frequency spectrum.

4. The method of claim 3, wherein the distribution over the frequency spectrum comprises two-dimensional data of "frequency" and "time", and the edge detection principle for two-dimensional images is used to obtain the boundary between two dissimilar speech segments.

5. The method of claim 4, wherein the boundary of a speech segment is determined by a variable threshold that changes with the speech data and the environment, so as to identify an energy change in frequency between one time point and another that clearly exceeds the threshold, which serves as the basis for segmenting words.

6. The method of claim 1, wherein the word segmentation module performs the division based on one of the speed, energy and frequency of the speech data.

7. The method of claim 1, further comprising setting different arrangement orders of vowel speech modules to correspond to the same word or phrase.

8. The method of claim 1, wherein the speech recognition module uses dynamic time warping (DTW) to compare the input against the vowel speech modules and the specific vowel speech module arrangement orders in the database, so as to obtain the result closest to the speech content input by the user.

9. The method of claim 1, wherein the speech recognition module uses a predetermined weighting value as the criterion for judging whether the database contains arrangement order data matching the vowel speech modules to be recognized.

10. The method of claim 1, wherein the data processing device may be a personal-computer-compatible or embedded data processing system.

11. The method of claim 10, wherein the personal-computer-compatible data processing system may be one of a workstation, a personal computer, a notebook computer, an LCD computer, a tablet computer, a palmtop computer, a personal digital assistant and a mobile phone.

12. A speech database establishment and recognition system, applicable to a data processing device to provide the data processing device with a speech recognition function, the system comprising: a word segmentation module for dividing a speech signal, input by a user through an input unit, into at least one vowel speech module according to a reference preset by the user, and storing the vowel speech module in a database; a storage module for storing, in the database, the at least one vowel speech module divided by the word segmentation module and the arrangement order of the vowel speech modules corresponding to the input signal; and a speech recognition module for, when the user inputs a speech signal through the input unit, dividing the speech signal into at least one vowel speech module to be recognized according to the reference preset by the user, and searching the database for arrangement order data matching the vowel speech modules to be recognized, retrieving the arrangement order data if it exists and listing possible combinations that match the arrangement order of the vowel speech modules if it does not.

13. The system of claim 12, wherein the word segmentation module includes an analog-to-digital conversion unit for converting an analog speech signal input by the user into a digital signal.

14. The system of claim 12, wherein the word segmentation module analyzes the distribution of the speech signal over the frequency spectrum.

15. The system of claim 14, wherein the distribution over the frequency spectrum comprises two-dimensional data of "frequency" and "time", and the edge detection principle for two-dimensional images is used to obtain the boundary between two dissimilar speech segments.

16. The system of claim 15, wherein the boundary of a speech segment is determined by a variable threshold that changes with the speech data and the environment, so as to identify an energy change in frequency between one time point and another that clearly exceeds the threshold, which serves as the basis for segmenting words.

17. The system of claim 12, wherein the word segmentation module performs the division based on one of the speed, energy and frequency of the speech data.

18. The system of claim 12, wherein the speech recognition module uses dynamic time warping (DTW) to compare the input against the vowel speech modules and the specific vowel speech module arrangement orders in the database, so as to obtain the result closest to the speech content input by the user.

19. The system of claim 12, wherein the speech recognition module uses a predetermined weighting value as the criterion for judging whether the database contains arrangement order data matching the vowel speech modules to be recognized.

20. The system of claim 12, wherein the data processing device may be a personal-computer-compatible or embedded data processing system.

21. The system of claim 20, wherein the personal-computer-compatible data processing system may be one of a workstation, a personal computer, a notebook computer, an LCD computer, a tablet computer, a palmtop computer, a personal digital assistant and a mobile phone.

22. The system of claim 12, wherein the database is a relational database.
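The spectral segmentation recited in claims 4, 5, 15 and 16 can be illustrated with a short Python sketch. It is only one possible reading, under stated assumptions: the spectrogram is taken as already computed (a list of time frames, each a list of band energies), the "edge detection" is reduced to a first difference along the time axis, and the variable threshold is assumed to be the mean plus `k` standard deviations of that difference. The function name `segment_boundaries` and the parameter `k` are hypothetical and do not come from the patent.

```python
from statistics import mean, pstdev

def segment_boundaries(spectrogram, k=1.5):
    """Return the frame indices at which a new speech segment starts.

    spectrogram: list of time frames, each a list of band energies.
    k: assumed sensitivity factor for the variable threshold.
    """
    # Edge strength along the time axis: total absolute energy change
    # over all frequency bands between consecutive frames.
    flux = [sum(abs(c - p) for c, p in zip(cur, prev))
            for prev, cur in zip(spectrogram, spectrogram[1:])]
    if not flux:
        return []
    # Variable threshold derived from the utterance itself, so it shifts
    # with the speech data and the recording environment.
    threshold = mean(flux) + k * pstdev(flux)
    return [i + 1 for i, f in enumerate(flux) if f > threshold]

# Three steady segments of three frames each, in different bands.
spec = [[1.0, 0.0, 0.0]] * 3 + [[0.0, 1.0, 0.0]] * 3 + [[0.0, 0.0, 1.0]] * 3
print(segment_boundaries(spec))  # [3, 6]
```

Deriving the threshold from the signal's own statistics means a louder speaker or a noisier room raises the bar automatically. A fixed constant would need retuning for each user, which is the situation the variable threshold in the claims appears intended to avoid.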
