TWI242729B - Speech database establishment and recognition method and system thereof - Google Patents

Speech database establishment and recognition method and system thereof

Info

Publication number
TWI242729B
Authority
TW
Taiwan
Prior art keywords
module
voice
speech
patent application
database
Prior art date
Application number
TW93101136A
Other languages
Chinese (zh)
Other versions
TW200525384A (en)
Inventor
Li-Lu Chen
Original Assignee
Micro Star Int Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Micro Star Int Co Ltd
Priority to TW93101136A
Publication of TW200525384A
Application granted
Publication of TWI242729B

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and system for establishing a speech database and recognizing speech. In the method, a word segmentation module divides a speech signal that a user inputs through an input unit into at least one vowel speech module according to a rule predefined by the user, and a storage module stores each vowel speech module in a database together with the arrangement sequence of vowel speech modules corresponding to the input speech signal. When the user later inputs a speech signal through the input unit, a speech recognition module divides it into at least one vowel speech module to be recognized according to the same predefined rule and searches the database for arrangement sequence data matching those modules. If a match exists, the arrangement sequence data is retrieved; if not, the possible combinations that match the vowel speech module arrangement sequence are listed. Through this word segmentation mechanism, the method and system provide a concise speech recognition database structure and speech recognition that adapts to the characteristics of the individual user.

Description

[Technical Field of the Invention]

A method and system for establishing and recognizing a speech database; more specifically, a method and system that improve the efficiency of speech training and recognition through word segmentation.

[Prior Art]

With the rapid development of the electronic information industry, powerful and inexpensive consumer electronic products keep appearing. Take the computer, the most common of them: as software and hardware have grown steadily more capable, the work a computer handles is no longer limited to running programs or processing data, and it now also serves as a medium for delivering video and audio. In short, the computer has moved from the company and the laboratory into the domain of household appliances.

The trend is not limited to computers. The electrical appliances of everyday life are also increasingly computerized. Through embedded systems, appliances such as televisions, refrigerators, and washing machines have gradually acquired the functions of a small computer. In other words, the user can set and operate various function options through a simple human-machine interface.
Furthermore, beyond one-way setting operations, the user can communicate with the appliance and even contact the outside world through it, for example by e-mail. Formerly simple household appliances are thus being computerized as well, and are developing toward information appliances.

As described above, whether computers become appliances or appliances become computers, the user must communicate with the machine through a human-machine interface. Taking input units as an example, the most common are keyboard buttons, mice, and similar devices. Although such input units let the user enter the commands or data needed for setting and operation, they remain inconvenient. For one thing, the size of the input unit is usually a difficulty for designs that aim to be light, thin, and compact. For another, the user is not necessarily familiar with computers and may find it hard to communicate with a computer through conventional input methods. All of these are obstacles to turning computers into appliances or computerizing appliances.

To solve this problem, speech input can replace conventional text or image selection. It requires only a sound input unit such as a microphone, which clearly reduces the size of the product and the space it occupies. Moreover, the user communicates with the machine simply by speaking commands as if talking to a person, which is convenient for users unfamiliar with computer operation. Using speech as an input medium, however, first requires a data-rich speech database and an efficient recognition system.

Republic of China (Taiwan) Patent Publication No. 308666 discloses an "intelligent Mandarin speech learning system and method". Its technical features are as follows: a machine first detects the characteristic parameters of the speech signal of a learning example sentence input by the user; a recognition device then recognizes the speech of the input example sentence and computes the rate at which the recognition result matches the example sentence; and a training device uses the user's speech for the example sentences to train the user's speech model and update the data in it. After training on a set of example sentences, the user's speech model covers nearly all of the user's own speech characteristics, so that in actual use the system can effectively recognize the user's input signal from the speech characteristics in the model.

The speech learning and recognition system and method described above are the techniques commonly used in today's speech recognition systems. They nevertheless have considerable shortcomings.

Namely, the user must first read example sentences aloud at close to a predetermined standard speed and volume, so as to establish the user's speech features and reduce the chance of recognition errors, while also forming the habit of inputting speech in a clear, steady reading style. This way of establishing and recognizing speech features requires the user to accommodate the machine's recognition habits. It is not user-friendly, and less nimble users must try repeatedly to obtain a good recognition result. In addition, if anything about the user changes, the user features must be re-adjusted, or the recognition rate will fall.

Another conventional approach uses the Hidden Markov Model (HMM) as the basis for judging speech recognition. Its drawback is that the number and content of its models are set in advance: after the user sets the number and content of the models, speech data conforming to those models is input to complete the models. A further technique performs recognition with Dynamic Time Warping (DTW), using the complete speech data that the user entered beforehand as the comparison reference; it has no concept of modules. In other words, the number and content of the data entered by the user determine the number and content of the utterances that can be recognized. Once a certain level of recognition is required, a very large database must be built, and the same situation arises in the HMM speech recognition technique described above.

In summary, how to provide a more efficient method and system for establishing and recognizing a speech database has become a problem in urgent need of a solution.

[Summary of the Invention]

To overcome the shortcomings of the conventional techniques described above, the main object of the present invention is to provide a speech database establishment and recognition method and system that, through a word segmentation mechanism, increase the number of samples in the database and thereby increase the probability of successful speech training and recognition.

Another object of the present invention is to provide a speech database establishment and recognition method and system with which, through the word segmentation mechanism, the user need not repeatedly learn the pronunciation speed, frequency, and/or intonation of example sentences from scratch, which saves the time spent establishing personal speech features before speech recognition is used.

A further object of the present invention is to provide a speech database establishment and recognition method and system that, through a word combination mechanism, arrange and combine a limited range of speech data into complex word combinations, which saves a large amount of database data.

Yet another object of the present invention is to provide a speech database establishment and recognition method and system with which, through the word segmentation mechanism, a fairly close recognition result is obtained even if the user's pronunciation does not meet the standard.
To achieve the above and other objects, the speech database establishment and recognition system of the present invention includes: a word segmentation module, which divides a speech signal input by the user through an input unit into at least one mother speech module (the "vowel speech module" of the abstract) according to a criterion preset by the user, and stores the mother speech module in a database; a storage module, which stores in the database the at least one mother speech module produced by the word segmentation module together with the arrangement order of mother speech modules corresponding to the input signal; and a speech recognition module, which, when the user inputs a speech signal through the input unit, divides that signal into at least one mother speech module to be recognized according to the criterion preset by the user and searches the database for arrangement order data matching those modules. If such data exists, the arrangement order data is retrieved; if not, the possible combinations matching the mother speech module arrangement order are listed.

With this system, speech training and recognition proceed as follows. First, the word segmentation module divides the speech signal input by the user through an input unit into at least one mother speech module according to the criterion preset by the user, and the mother speech module is stored in a database through a storage module. Second, the storage module stores in the database the arrangement order of mother speech modules corresponding to the speech signal input by the user. Next, when the user inputs a speech signal through the input unit, the speech recognition module divides it into at least one mother speech module to be recognized according to the criterion preset by the user. Finally, the speech recognition module searches the database for arrangement order data matching the mother speech modules to be recognized; if such data exists, it is retrieved; if not, the possible combinations matching the mother speech module arrangement order are listed.
Compared with conventional speech training and recognition techniques, the method and system of the present invention not only increase the number of samples in the database, and thus the probability of successful speech training and recognition, but also save the time needed to establish personal speech features. Moreover, even if the user's pronunciation does not meet the standard, a fairly close recognition result is still obtained, which raises the probability of successful recognition.

[Embodiments]

The following describes the implementation of the present invention through specific embodiments. Those skilled in the art can readily understand other advantages and effects of the invention from the content disclosed in this specification. The invention can also be implemented or applied through other, different embodiments, and the details in this specification can be modified and changed on the basis of different viewpoints and applications without departing from the spirit of the invention.

Referring to Fig. 1, in this embodiment the speech database establishment and recognition system of the present invention is applied to a personal computer 1, so that the user can communicate with the computer 1, for example to operate and/or configure it, through the method and system of the invention. It should be noted that the actual software and hardware architecture of the system and of the personal computer 1 is more complex; to highlight the technical features of the invention, only the parts related to those features are shown and discussed. The method and system can also be applied to any one of a workstation, notebook computer, LCD computer, tablet computer, handheld computer, personal digital assistant, mobile phone, and the like. The speech database establishment and recognition system of the invention includes at least: an input unit 10, a word segmentation module 12, a database 14, a storage module 16, and a speech recognition module 18.

The input unit 10 is a unit with a sound-collecting function through which the user inputs speech signals into the system. In this embodiment, it is a microphone.

The word segmentation module 12 divides the speech signal input by the user through the input unit 10 into at least one mother speech module according to a criterion preset by the user. In this embodiment, the word segmentation module 12 further includes an analog-to-digital conversion unit (not shown) that converts the analog speech signal input by the user into a digital signal. Thus, when the user, while building the speech database, inputs through the input unit 10 the analog speech signal of a phrase such as "The weather is fine today", the word segmentation module 12 immediately converts it into a digital signal for processing. Once the conversion is complete, the word segmentation module 12 segments the phrase, "fat" in the running example, according to the speech segmentation criterion set by the user.

In this embodiment, the word segmentation module analyzes how the speech signal is distributed over the frequency spectrum. In brief, when the user inputs spoken sound through the input unit 10, a time-domain to frequency-domain operation (a Fourier transform) yields the spectral data of the speech signal. This raw data includes at least the relationship among frequency, energy, and time. At the time points around a given time point t (..., t-2, t-1, t+1, t+2, ...), the energy at each frequency is obtained, and the mean and correlation coefficient of these values are computed to determine how much the time points differ from one another.
In addition, within the two-dimensional "frequency" by "time" data, the edge-detection principle used for two-dimensional images locates the boundary between two dissimilar speech segments. A variable threshold is then applied, one that changes with the speech data and the environment, to identify the time points at which the change in energy across frequencies relative to another time point is significant and exceeds the threshold; these serve as the basis for segmenting the words. The span between one dividing line and the next is then a mother speech module of similar sound. In other words, after a set of word data is input, the computation and processing of the word segmentation technique described above yield at least one mother speech module.

Following the above, in this embodiment the phrase input by the user is divided into three parts: "f", "a", and "t".
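The segmentation idea described above (per-frame spectral energy, comparison of neighboring time points by correlation, and a data-dependent threshold) can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the patented implementation: the function names, the non-overlapping framing, the use of one minus the correlation coefficient as the dissimilarity, and the mean-plus-one-standard-deviation threshold are all hypothetical choices, and a one-dimensional difference along the time axis stands in for the two-dimensional edge detection mentioned in the text.

```python
import cmath
import math


def frame_energies(samples, frame_len):
    """Energy per frequency bin for each non-overlapping frame (naive DFT)."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        bins = []
        for k in range(frame_len // 2):
            coeff = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n, x in enumerate(frame))
            bins.append(abs(coeff) ** 2)
        frames.append(bins)
    return frames


def correlation(a, b):
    """Pearson correlation coefficient of two equal-length energy vectors."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def find_boundaries(frames, scale=1.0):
    """Frame indices whose spectrum differs sharply from the previous frame."""
    dissim = [1.0 - correlation(frames[i - 1], frames[i])
              for i in range(1, len(frames))]
    if not dissim:
        return []
    mean = sum(dissim) / len(dissim)
    std = math.sqrt(sum((d - mean) ** 2 for d in dissim) / len(dissim))
    threshold = mean + scale * std  # assumed rule: threshold follows the data
    return [i + 1 for i, d in enumerate(dissim) if d > threshold]


def tone(k, frame_len=16, n_samples=64):
    """Synthetic test tone centered on DFT bin k."""
    return [math.sin(2 * math.pi * k * n / frame_len) for n in range(n_samples)]


# Two synthetic "sounds": four frames at bin 2 followed by four frames at bin 5.
frames = frame_energies(tone(2) + tone(5), 16)
print(find_boundaries(frames))  # prints [4], the frame where the tone changes
```

The spans between consecutive boundaries would then play the role of the segments that the text calls mother speech modules. A practical system would use overlapping windowed frames and a fast Fourier transform rather than this naive transform.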

In this embodiment, let the three parts "f", "a", and "t" be the mother speech modules "A", "B", and "C", respectively. That is, the module sequence "ABC" composed of these mother speech modules represents "fat".

The storage module 16 stores in the database 14 the at least one mother speech module produced by the word segmentation module 12, together with the arrangement order of mother speech modules corresponding to the input signal. Since, in this embodiment, the phrase input through the input unit 10 is divided into the three parts "f", "a", and "t", the storage module 16 stores the three mother speech modules "A", "B", and "C" and the module sequence "ABC" in the database 14.

In addition, while the database 14 is being built, the user can also input through the input unit 10 a "fat" in which the sequential cue between sounds is long (the sound drawn out between f and a) and a "fat" in which the sequential cues are short (f, a, and t all clipped). Suppose the module sequence corresponding to the drawn-out "fat" is "DC", and the one corresponding to the clipped "fat" is "E".
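The storage step can be pictured with a small in-memory structure. The sketch below is an assumption-laden illustration, not the actual layout of the database 14: the class name `SpeechDatabase`, its two dictionaries, and the placeholder strings used as segment data are all hypothetical, and reading "DC" as a drawn-out module "D" followed by the shared module "C" is an inference from the example rather than something the text states.

```python
from dataclasses import dataclass, field


@dataclass
class SpeechDatabase:
    """Toy stand-in for database 14: modules plus labeled module sequences."""
    modules: dict = field(default_factory=dict)    # module label -> segment data
    sequences: dict = field(default_factory=dict)  # module sequence -> phrase

    def store(self, segments, phrase):
        """Store each (label, data) segment and the sequence they form."""
        for label, data in segments:
            self.modules.setdefault(label, data)  # reuse modules already stored
        sequence = "".join(label for label, _ in segments)
        self.sequences[sequence] = phrase
        return sequence


db = SpeechDatabase()
db.store([("A", "f"), ("B", "a"), ("C", "t")], "fat")    # standard "fat"
db.store([("D", "fa, drawn out"), ("C", "t")], "fat")    # drawn-out variant
db.store([("E", "fat, clipped")], "fat")                 # clipped variant
print(db.sequences)
```

Because module "C" is stored once and shared by two sequences, a few modules can cover several pronunciations of the same phrase, which is the data saving the summary of the invention refers to.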
The user can then treat the phrases corresponding to the "ABC", "DC", and "E" module sequences all as "fat".

The speech recognition module 18, when the user inputs a speech signal through the input unit 10, divides the signal into at least one mother speech module to be recognized according to the criterion preset by the user, and searches the database 14 for arrangement order data matching those modules. If such data exists, it is retrieved; if not, the possible combinations matching the mother speech module arrangement order are listed. In this embodiment, the speech recognition module 18 segments the speech signal in the same way as the word segmentation module 12: using the segmentation technique described above, it divides the speech signal to be recognized into at least one mother speech module to be recognized.

If the user now inputs the phrase "fat", the speech recognition module 18 divides it into three mother speech modules to be recognized, "f", "a", and "t", that is, the to-be-recognized sequence "ABC" composed of modules "A", "B", and "C". Then, using dynamic time warping, it searches the database 14 for stored word data matching the sequence "ABC". If such data is found, the phrase input by the user through the input unit 10 is recognized as "fat". If no matching arrangement order exists, the possible combinations consistent with those mother speech modules are retrieved from the database 14 so that the user can further confirm what word data was input. The user can then create arrangement order data from the listed possibilities.
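The text names dynamic time warping as the matching technique but does not say how it is computed or which features are compared. For orientation only, the textbook DTW distance between two numeric sequences can be sketched as below; the function name and the absolute-difference local cost are assumptions, and a real recognizer would compare feature vectors rather than single numbers.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    # cost[i][j] is the best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # a advances alone
                                    cost[i][j - 1],      # b advances alone
                                    cost[i - 1][j - 1])  # both advance
    return cost[len(a)][len(b)]


print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # prints 0.0: a stretched copy matches
print(dtw_distance([1, 2, 3], [2, 3, 4]))     # prints 2.0
```

The first call shows why DTW suits speech: a sequence that is merely stretched in time aligns with its original at zero cost.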
Note that if the "fat" input by the user through the input unit 10 is the one drawn out between f and a, or the one with f, a, and t clipped, the module sequence identified by the speech recognition module 18 will be "DC" or "E", respectively. As described above, while building the database 14 the user has already set the phrases corresponding to the "DC" and "E" sequences to "fat". Therefore, even if what the user inputs through the input unit 10 is not the standard "fat" speech data, the speech recognition module 18 still recognizes the phrase "fat".

On the other hand, suppose the user has created another module sequence, "ABFC", corresponding to the phrase "fact". When the user inputs "fact" through the input unit 10 but, because of nonstandard pronunciation, the mother speech module for the "c" sound is not reliably recognized, the speech recognition module 18 can use a weighted-value mechanism, such as the level of recognition probability, to decide whether the nonstandard speech corresponds to the sequence "ABC" or "ABFC". If "ABC" has the higher recognition probability, the speech recognition module 18 recognizes the input speech as the phrase "fat" corresponding to the "ABC" sequence.

Referring to Fig. 2, the steps of the speech database establishment and recognition method of the present invention are as follows.

In step S201, the word segmentation module 12 divides the speech signal input by the user through the input unit 10 into at least one mother speech module according to the criterion preset by the user.
As described above, in this embodiment, when the user, while building the speech database, inputs the analog speech signal of the phrase "fat" through the input unit 10, the word segmentation module 12 immediately converts it into a digital signal for processing and divides it into three parts: "f", "a", and "t". Once segmentation is complete, the resulting mother speech modules are stored in the database 14. Step S202 follows.

In step S202, the storage module 16 stores in the database 14 the at least one mother speech module produced by the word segmentation module 12, together with the arrangement order of mother speech modules corresponding to the input signal. In this embodiment, the storage module 16 stores in the database 14 the order in which the three mother speech modules "f", "a", and "t" are arranged to form "fat". Step S203 follows.

In step S203, when the user inputs a speech signal through the input unit 10, the speech recognition module 18 divides it into at least one mother speech module to be recognized according to the criterion preset by the user. As described above, in this embodiment the speech recognition module 18 segments the speech signal in the same way as the word segmentation module 12, dividing the speech signal to be recognized into at least one mother speech module to be recognized. If the user inputs the phrase "fat", the speech recognition module 18 divides it into three mother speech modules to be recognized: "f", "a", and "t". Step S204 follows.

In step S204, the speech recognition module 18 searches the database 14 for arrangement order data matching the mother speech modules to be recognized.
According to the foregoing description, in this embodiment, the speech recognition module 18 uses dynamic time correction technology to search whether there is word data in the database 14 that allows the "fat" arrangement order, and if so, enter Go to step S 2 0 5; if not, go to step S 2 0 6. In step S205, the speech recognition module 18 is made to recognize that the word entered by the user through the input unit 10 is "f a t". In step S206, the speech recognition module 18 is made to retrieve possible combinations that match the mother speech modules from the database 14 for further confirmation by the user of the entered word data. Why.
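The flow of steps S201 to S206 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names SpeechDatabase, enroll, nearest_label, and recognize are hypothetical, each mother speech module is assumed to be a short list of numeric features, and the fallback of step S206 is assumed to list every stored word that shares at least one module with the input. Only the use of dynamic time warping for the comparison comes from the description.

```python
from dataclasses import dataclass, field


def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


@dataclass
class SpeechDatabase:
    """Hypothetical stand-in for database 14 (not the patent's data layout)."""
    modules: dict = field(default_factory=dict)    # label -> feature template
    sequences: dict = field(default_factory=dict)  # tuple of labels -> word

    def enroll(self, word, labelled_segments):
        """Steps S201-S202: store each module, then the order that forms the word."""
        for label, features in labelled_segments:
            self.modules[label] = features
        self.sequences[tuple(label for label, _ in labelled_segments)] = word

    def nearest_label(self, features):
        """Label of the stored module with the smallest DTW distance."""
        return min(self.modules, key=lambda lab: dtw_distance(features, self.modules[lab]))

    def recognize(self, segments):
        """Steps S203-S206: return (word, []) on a match, else (None, candidates)."""
        labels = tuple(self.nearest_label(seg) for seg in segments)
        if labels in self.sequences:                      # S204 -> S205
            return self.sequences[labels], []
        # S206 (assumed rule): words sharing at least one module with the input
        candidates = [w for seq, w in self.sequences.items() if set(labels) & set(seq)]
        return None, candidates


db = SpeechDatabase()
db.enroll("fat", [("f", [0.9, 0.8, 0.9]), ("a", [0.1, 0.2, 0.1, 0.2]), ("t", [0.5, 0.6])])
print(db.recognize([[0.85, 0.9], [0.15, 0.1, 0.2], [0.55, 0.5, 0.6]]))  # ('fat', [])
print(db.recognize([[0.1, 0.2], [0.5, 0.6]]))                           # (None, ['fat'])
```

The toy feature values are made up for illustration. Because DTW tolerates sequences of different lengths, the input segments do not need the same number of frames as the stored templates.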

In summary, the speech database establishment and recognition method and system of the present invention improve the efficiency of speech training and the recognition success rate, while allowing the number of samples in the database to grow without letting the number of speech samples expand without bound. They also shorten the time needed to establish an individual user's speech characteristics. In addition, the method and system can be combined with text-to-speech (TTS) to form an interactive dialogue system.

The above embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Anyone skilled in the art may modify and vary the above embodiments without departing from the spirit and scope of the present invention. Therefore, the scope of protection of the present invention shall be as set forth in the claims below.

Brief Description of the Drawings

FIG. 1 is a block diagram showing the system architecture of the speech database establishment and recognition system of the present invention; and

FIG. 2 is a flowchart showing the steps performed when the speech database establishment and recognition method of the present invention is executed.

1 Personal computer
10 Input unit
12 Word segmentation module
14 Database
16 Storage module
18 Speech recognition module


Claims (1)

1. A speech database establishment and recognition method, applicable to a data processing device to provide the data processing device with a speech recognition function, comprising:
(1) causing a word segmentation module to divide a speech signal input by a user through an input unit into at least one mother speech module according to a rule preset by the user, and storing the mother speech module in a database through a storage module;
(2) causing the storage module to store in the database the arrangement order of the mother speech modules corresponding to the speech signal input by the user;
(3) causing a speech recognition module, when the user inputs a speech signal through the input unit, to divide the speech signal into at least one mother speech module to be recognized according to the rule preset by the user;
(4) causing the speech recognition module to search the database for arrangement order data matching the mother speech modules to be recognized, and proceeding to step (5) if such data exists, or to step (6) if it does not;
(5) causing the speech recognition module to retrieve the matching arrangement order data; and
(6) causing the speech recognition module to list the possible combinations that match the arrangement order of the mother speech modules.

2. The method of claim 1, further comprising, before the word segmentation module divides the speech signal, causing the word segmentation module to convert the received analog speech signal into a digital signal format.

3. The method of claim 1, wherein the word segmentation module analyzes the distribution of the speech signal over the frequency spectrum.

4. The method of claim 3, wherein the spectral distribution comprises two-dimensional "frequency" and "time" data, and the edge detection principle of two-dimensional images is used to obtain the boundary between two dissimilar speech segments.

5. The method of claim 4, wherein the boundary between speech segments is defined by a variable threshold that changes with the speech data and the environment, so as to identify where the energy change over frequency between one time point and another is significant and exceeds the threshold, which serves as the basis for segmenting words.

6. The method of claim 1, wherein the word segmentation module uses one of the speed, energy, and frequency of the speech data as the basis for segmentation.

7. The method of claim 1, further comprising setting different mother speech module arrangement orders to correspond to the same group of words.

8. The method of claim 1, wherein the speech recognition module uses dynamic time warping (DTW) to compare the input against the mother speech modules and the specific mother speech module arrangement orders in the database, so as to obtain the result closest to the speech content input by the user.

9. The method of claim 1, wherein the speech recognition module uses a preset weighting value as the criterion for judging whether the database contains arrangement order data matching the mother speech modules to be recognized.

10. The method of claim 1, wherein the data processing device may be a data processing system that is either personal-computer-compatible or embedded.

11. The method of claim 10, wherein the personal-computer-compatible data processing system may be one of a workstation, a personal computer, a notebook computer, an LCD computer, a tablet computer, a palmtop computer, a personal digital assistant, and a mobile phone.

12. A speech database establishment and recognition system, applicable to a data processing device to provide the data processing device with a speech recognition function, comprising:
a word segmentation module for dividing a speech signal input by a user through an input unit into at least one mother speech module according to a rule preset by the user, and storing the mother speech module in a database;
a storage module for storing in the database the at least one mother speech module produced by the word segmentation module and the arrangement order of the mother speech modules corresponding to the input signal; and
a speech recognition module for dividing, when the user inputs a speech signal through the input unit, the speech signal into at least one mother speech module to be recognized according to the rule preset by the user, and for searching the database for arrangement order data matching the mother speech modules to be recognized, retrieving the arrangement order data if it exists, and listing the possible combinations that match the arrangement order of the mother speech modules if it does not.

13. The system of claim 12, wherein the word segmentation module further comprises an analog-to-digital conversion unit for converting the analog speech signal input by the user into a digital signal.
14. The system of claim 12, wherein the word segmentation module analyzes the distribution of the speech signal over the frequency spectrum.

15. The system of claim 14, wherein the spectral distribution comprises two-dimensional "frequency" and "time" data, and the edge detection principle of two-dimensional images is used to obtain the boundary between two dissimilar speech segments.

16. The system of claim 15, wherein the boundary between speech segments is defined by a variable threshold that changes with the speech data and the environment, so as to identify where the energy change over frequency between one time point and another is significant and exceeds the threshold, which serves as the basis for segmenting words.

17. The system of claim 12, wherein the word segmentation module uses one of the speed, energy, and frequency of the speech data as the basis for segmentation.

18. The system of claim 12, wherein the speech recognition module uses dynamic time warping (DTW) to compare the input against the mother speech modules and the specific mother speech module arrangement orders in the database, so as to obtain the result closest to the speech content input by the user.

19. The system of claim 12, wherein the speech recognition module uses a preset weighting value as the criterion for judging whether the database contains arrangement order data matching the mother speech modules to be recognized.

20. The system of claim 12, wherein the data processing device may be a data processing system that is either personal-computer-compatible or embedded.

21. The system of claim 20, wherein the personal-computer-compatible data processing system may be one of a workstation, a personal computer, a notebook computer, an LCD computer, a tablet computer, a palmtop computer, a personal digital assistant, and a mobile phone.

22. The system of claim 12, wherein the database is a relational database.
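The claims on spectral segmentation describe cutting the signal where the energy change between time points exceeds a variable threshold. A minimal Python sketch of that idea follows, under stated assumptions: the function names frame_changes, find_boundaries, and split_segments are hypothetical, the spectrogram is assumed to be a list of time frames with one energy value per frequency bin, and the threshold is passed in by the caller because the claims say it varies with the speech data and environment. Comparing only adjacent time frames is a simplified stand-in for two-dimensional edge detection, not the method the patent specifies.

```python
def frame_changes(spectrogram):
    """Summed absolute energy change between each pair of adjacent time frames."""
    return [
        sum(abs(cur - prev) for cur, prev in zip(frame, previous))
        for previous, frame in zip(spectrogram, spectrogram[1:])
    ]


def find_boundaries(spectrogram, threshold):
    """Indices of frames that start a new segment (change exceeds the threshold)."""
    changes = frame_changes(spectrogram)
    return [i + 1 for i, change in enumerate(changes) if change > threshold]


def split_segments(spectrogram, threshold):
    """Cut the spectrogram into consecutive runs of similar frames."""
    cuts = [0] + find_boundaries(spectrogram, threshold) + [len(spectrogram)]
    return [spectrogram[start:end] for start, end in zip(cuts, cuts[1:])]


# Toy spectrogram: 6 time frames x 3 frequency bins (made-up values).
frames = [
    [0.1, 0.1, 0.9], [0.1, 0.2, 0.9],   # first sound: energy in the high bin
    [0.8, 0.7, 0.1], [0.9, 0.7, 0.1],   # second sound: energy in the low bins
    [0.2, 0.1, 0.1], [0.2, 0.1, 0.2],   # third sound: low energy overall
]
print(find_boundaries(frames, threshold=1.0))                 # [2, 4]
print([len(s) for s in split_segments(frames, threshold=1.0)])  # [2, 2, 2]
```

Raising the threshold merges neighbouring segments and lowering it splits them further, which is one way to read the claims' statement that the threshold changes with the data and the environment.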
TW93101136A 2004-01-16 2004-01-16 Speech database establishment and recognition method and system thereof TWI242729B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW93101136A TWI242729B (en) 2004-01-16 2004-01-16 Speech database establishment and recognition method and system thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW93101136A TWI242729B (en) 2004-01-16 2004-01-16 Speech database establishment and recognition method and system thereof

Publications (2)

Publication Number Publication Date
TW200525384A TW200525384A (en) 2005-08-01
TWI242729B true TWI242729B (en) 2005-11-01

Family

ID=37022579

Family Applications (1)

Application Number Title Priority Date Filing Date
TW93101136A TWI242729B (en) 2004-01-16 2004-01-16 Speech database establishment and recognition method and system thereof

Country Status (1)

Country Link
TW (1) TWI242729B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8694314B2 (en) 2006-09-14 2014-04-08 Yamaha Corporation Voice authentication apparatus
TWI587281B (en) * 2014-11-07 2017-06-11 Papago Inc Voice control system and its method

Also Published As

Publication number Publication date
TW200525384A (en) 2005-08-01

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees