TW200921641A - Speech recognition system

Speech recognition system

Publication number
TW200921641A
Authority
TW
Taiwan
Prior art keywords
speech recognition
voice
recognition system
speech
user
Application number
TW96142713A
Other languages
Chinese (zh)
Other versions
TWI351021B (en)
Inventor
Jui-Chang Wang
Original Assignee
Jui-Chang Wang
Wang Chung Ping
Application filed by Jui-Chang Wang, Wang Chung Ping
Priority to TW096142713A
Publication of TW200921641A
Application granted
Publication of TWI351021B

Landscapes

  • User Interface Of Digital Computer (AREA)

Abstract

A speech recognition system comprises at least a speech recognition engine and a display device that contains a signal status interface and a textual interface. The signal status interface uses a waveform display to show a recording status, a speech processing status, or a speech-recognition-complete status. The textual interface shows the words of the speech recognition result. A set of commands is linked to each waveform unit on the signal status interface, and another set to each word unit on the textual interface, so that users can correct recognition errors or adjust the speech recognition system.

Description

IX. Description of the Invention

[Technical Field]

The present invention relates to a speech recognition system, and in particular to one that uses waveforms to display a recording state, a speech processing state, or a speech-recognition-complete state, and that provides at least one feedback adjustment option on the waveform and text display interfaces, from which the user can select commands to correct speech recognition errors or to adjust the speech recognition system. The invention is applicable to speech recognition systems of electronic devices with a graphical display interface, such as desktop computers, notebook computers, home multimedia systems, televisions, DVD players, audio-visual systems, mobile phones, and personal digital assistants.

[Prior Art]

For many of today's electronic devices, the development of speech recognition technology has opened a more convenient path for users. Whether the device is a desktop computer, a notebook computer, a mobile phone, or a personal digital assistant, input has to rely on the coordination of sight and limbs. For example, a user operating a computer must use a keyboard, a mouse, or another attached control device to enter commands. Touch screens are sometimes used to simplify input, but because the screen area is limited and the fingers are still needed, they do not offer maximum convenience. For most people these problems are only an inconvenience, but for users with physical disabilities, certain muscular diseases, or loss of sight they make such devices difficult to operate. Speech recognition technology can address these problems.
In applications of speech recognition, the user only needs an audio input device, such as a microphone, to feed speech into the speech recognition system, which then outputs the text corresponding to the speech or, going further, carries out commands directly through speech recognition and understanding.

When a speech recognition system is used, as described above, the user must input and record speech through an audio input device before the recognition process starts. During recording and recognition, several factors affect the final result, such as the quality of the audio input device, the recording environment, and the distance to the audio input device. Monitoring of the recording and recognition processes is therefore needed. In this regard, some prior art shows the recording state and the recognition state in different colors, and some uses a changing image to show the recording or recognition state. However, such status displays do not reflect the quality of the recording or of the recognition result, nor whether the recording succeeded.

In addition, some prior art provides functions for adjusting the speech recognition result, but these usually apply to the recognition result as a whole, so it is often not possible to adjust only a particular part of the result and feed that back so that the system better meets the needs of the individual user. In other words, if a user pronounces certain words with a particular accent and the recognition results for individual characters or words cannot be fed back and adjusted, the system cannot provide recognition suited to that user, which weakens its practical usefulness.

In view of these shortcomings of the prior art, the inventor sought to make speech recognition technology more convenient and more widely usable, and to allow the speech recognition system to be fed back and adjusted according to the user's recognition results so that it better meets the needs of individual users. The present invention is the result.
[Summary of the Invention]

One object of the present invention is to provide a speech recognition system that uses waveforms to represent a recording state, a speech-recognition-in-progress state, or a speech-recognition-complete state, so that the user can monitor recording quality, speech processing speed, and the quality of the recognition result directly from the speech signal waveform.

Another object of the present invention is to provide a speech recognition system with an error-correction feedback adjustment mechanism, so that the user can effectively correct speech recognition errors or feed adjustments back to the speech recognition system.

To achieve these objects, the present invention provides a speech recognition system comprising at least a speech recognition engine and a display device, the display device being provided with a signal indication interface and a text output interface. The signal indication interface uses waveforms to represent the speech signal input by the user and displays the recording state, the speech-recognition-in-progress state, or the speech-recognition-complete state. The text output interface displays the text result of speech recognition, and the text result includes at least one word unit. Each word unit and its corresponding waveform unit are each linked to a feedback adjustment option group. Each option group contains at least one feedback adjustment option that the user can select to correct speech recognition errors or to feed adjustments back to the speech recognition system.

The waveforms preferably use different colors to represent the recording state, the speech-recognition-in-progress state, and the speech-recognition-complete state.

After recognition of the input speech signal is complete, the recognition result is output on the text output interface, and each word unit is preferably aligned with the corresponding waveform unit on the signal indication interface. The word unit and the corresponding waveform unit preferably use the same color to represent the speech recognition quality of that word.
Speech recognition quality is divided into three levels: good quality, poor quality, and very poor quality that requires strict review and correction. Each level is marked with a different color.

The waveform unit on the signal indication interface is linked to a feedback adjustment option group. The option group contains at least one option from which the user can select a command to effectively correct speech recognition errors or to feed adjustments back to the speech recognition system.

The word unit on the text output interface is likewise linked to a feedback adjustment option group. The option group contains at least one option from which the user can select a command to effectively correct speech recognition errors or to feed adjustments back to the speech recognition system.

For a fuller understanding of the features and effects of the present invention, embodiments are described in detail below with reference to the drawings.

[Embodiments]

Fig. 1 is a schematic diagram of the speech recognition system of the present invention. As shown, the speech recognition system includes at least a speech recognition engine 10 and a display device 20, and the display device 20 is provided with a signal indication interface 30 and a text output interface 40. The signal indication interface 30 uses a waveform 32 to represent the speech signal input by the user and displays the recording state and the speech recognition state. The text output interface 40 displays the text 42 of the speech recognition result, and the text result includes at least one word unit. As shown, the waveform 32 on the signal indication interface 30 displays the speech signal input by the user, and the text 42 on the text output interface 40 is the result obtained by recognizing that speech signal.
In addition, the display device 20 of the speech recognition system may be the display screen of a desktop computer, notebook computer, home multimedia system, television, DVD player, audio-visual system, mobile phone, or personal digital assistant; a display screen that can be connected to such a device and to which the image signal is sent; or a display screen on a remote control.

Fig. 2 is a schematic diagram of the first embodiment of the present invention, showing the user's recording state. As shown, during recording the user inputs speech into the speech recognition system through an audio input device (not shown in the figure, for example a microphone), and the input speech signal is displayed as waveform 32 on the signal indication interface 30. Using a waveform has two advantages. First, by watching the waveform change during recording, the user can tell whether the speech signal has actually been input. The signal may fail to be input for various reasons, for example because the audio input device has not been switched on or because the connection between the audio input device and the electronic device running the speech recognition system is poor. By watching the waveform, the user can react immediately and avoid wasting time. Second, from the shape of the waveform the user can roughly judge the input quality of the speech signal in real time and make appropriate adjustments. For example, environmental noise, the sensitivity of the audio input device, and even the way the user handles the audio input device may all affect the quality of the input speech signal.
If factors that may degrade the input quality can be identified and removed at the recording stage, so that a better speech signal is input, the subsequent recognition process benefits considerably.

As described above, the signal indication interface 30 uses the waveform 32 to display the recording state and the speech recognition state, where the speech recognition state further includes a speech-recognition-in-progress state and a speech-recognition-complete state. The waveforms representing the recording state, the in-progress state, and the complete state are shown in different colors, so that the user can tell at a glance the current processing state, the speech recognition quality, and even the speed of speech recognition.

While the speech signal input by the user is being recognized, the part of the signal waveform 32 on the signal indication interface 30 that has already been processed changes to a different color, which marks the progress of recognition. In other words, at the start the input speech signal is shown in the color of the recording state. Once the recognition procedure begins, the processed part of the signal is shown in the color of the in-progress state. When the whole input speech signal has been recognized, it is shown in the colors of the complete state: some word units in the color for good recognition quality, some in the color for poor recognition quality, and some in the color for very poor recognition quality.

As shown in Fig. 3, the solid-line waveform 321 is the part for which speech recognition has been completed, and the dashed-line waveform 322 is the part for which it has not. When speech recognition is complete, all waveforms change to the new colors, indicating that processing has finished. This is described in more detail below.
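The progress coloring described above can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the description says that the three states use different colors but does not name them, so the palette below is hypothetical, as are the sample-count bookkeeping and all class and function names. The per-word quality colors applied after completion are not modeled here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Status(Enum):
    RECORDING = "recording"
    IN_PROGRESS = "recognition in progress"
    COMPLETE = "recognition complete"


# Hypothetical palette: the source only requires that the states differ in color.
STATUS_COLORS = {
    Status.RECORDING: "gray",
    Status.IN_PROGRESS: "blue",
    Status.COMPLETE: "black",
}


@dataclass
class Segment:
    start: int      # first sample index (inclusive)
    end: int        # last sample index (exclusive)
    status: Status

    @property
    def color(self) -> str:
        return STATUS_COLORS[self.status]


def color_segments(total_samples: int, processed_samples: int,
                   finished: bool) -> List[Segment]:
    """Split the waveform into the processed part and the not-yet-processed part."""
    if finished:
        # Simplification: one color for the whole completed waveform.
        return [Segment(0, total_samples, Status.COMPLETE)]
    processed = max(0, min(processed_samples, total_samples))
    segments = []
    if processed > 0:
        segments.append(Segment(0, processed, Status.IN_PROGRESS))
    if processed < total_samples:
        segments.append(Segment(processed, total_samples, Status.RECORDING))
    return segments
```

Calling `color_segments(1000, 400, False)` yields an in-progress segment for samples 0 to 400 followed by a recording-state segment for the remainder, which corresponds to the solid and dashed parts of the waveform in Fig. 3.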
After the user has input the speech signal and the signal has been recognized, the best candidate word units 420 of the recognition result are displayed one by one on the text output interface 40. As shown in Fig. 4, the speech signal input by the user is displayed as waveform 32, which can be further divided into at least one waveform unit 320. Each waveform unit 320 corresponds to one word unit 420 in the recognition result, and the two are positioned so that they are aligned one above the other. In this embodiment, each waveform unit 320 corresponds to one word unit 420 of the recognition result. In Fig. 4 the user has input the speech signal "How is the weather today" (今日天氣如何), and the displayed text result is "How is the weather today". The input speech signal corresponding to a waveform unit 320 may be, for example, the word unit 420 "today" (今日) in the recognition result. The two are aligned vertically and use the same color to represent the recognition quality.

If the speech recognition system is a speech understanding system, the signal indication interface 30 remains as shown in Fig. 4, while the text output interface 40 outputs the result of speech understanding. The text output interface 40 may still include the text of the speech recognition result, or may hide it until the user chooses to display it.

Referring to Fig. 4, after speech recognition is complete, the signal and the text are displayed as aligned segments in units of word units, such as "today" (今日), "weather" (天氣), and "how" (如何). Each word unit is shown in one color that represents the quality of its speech recognition result. In this embodiment, each word unit is shown in green, yellow, or red: green indicates that the text has good speech recognition quality; yellow warns that the text has poor speech recognition quality; and red indicates that the text has very poor speech recognition quality and that the result should be reviewed and corrected. In this way the user can see at a glance how good the recognition quality of each word unit is and can carry out appropriate error correction and feedback adjustment.

In addition, each waveform unit 320 is linked to a feedback adjustment option group. The option group contains at least one option that lets the user listen to the recording again, correct the recording, correct speech recognition errors, or feed adjustments back to the speech recognition system. As shown in Fig. 5, each waveform unit 320 is linked to a first feedback adjustment option group 50, which includes at least one feedback adjustment option 52. In this embodiment, the first feedback adjustment option group 50 includes the feedback adjustment options 52 "Play", "Re-record", "Include in training", "Switch to handwriting input", and "Switch to keyboard input". After speech recognition is complete, the user moves a mouse cursor 22 displayed on the display device 20 onto the waveform unit 320 to be adjusted. The first feedback adjustment option group 50 linked to that waveform unit 320 is then shown on the display device 20, either automatically or after a click with the mouse or a stylus. The user can then select the desired feedback adjustment option 52 to correct the speech recognition result or to feed adjustments back to the speech recognition system.
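The relationship between aligned units, quality colors, and the first feedback adjustment option group can be sketched in Python as follows. The green, yellow, and red colors and the option labels come from the description. Everything else is an assumption made for illustration: the source does not say how quality is measured, so the per-word confidence score, its thresholds, the time-based alignment, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Colors stated in the description: green = good, yellow = poor, red = very poor.
QUALITY_COLORS = {"good": "green", "poor": "yellow", "very_poor": "red"}

# Labels of the first feedback adjustment option group 50, as listed in the description.
WAVEFORM_OPTIONS = [
    "Play",
    "Re-record",
    "Include in training",
    "Switch to handwriting input",
    "Switch to keyboard input",
]


def quality_from_confidence(confidence: float,
                            good: float = 0.8, poor: float = 0.5) -> str:
    """Map a confidence in [0, 1] to a quality level (thresholds are hypothetical)."""
    if confidence >= good:
        return "good"
    if confidence >= poor:
        return "poor"
    return "very_poor"


@dataclass
class AlignedUnit:
    word: str          # word unit 420 on the text output interface
    start: float       # start of the aligned waveform unit 320, in seconds
    end: float         # end of the aligned waveform unit 320, in seconds
    confidence: float  # assumed per-word score from the recognition engine

    @property
    def color(self) -> str:
        # The word unit and its waveform unit share one color.
        return QUALITY_COLORS[quality_from_confidence(self.confidence)]


def waveform_menu(units: List[AlignedUnit], t: float) -> Tuple[str, List[str]]:
    """Return the word under time position t and the option group to display."""
    for unit in units:
        if unit.start <= t < unit.end:
            return unit.word, WAVEFORM_OPTIONS
    raise ValueError("no waveform unit at this position")
```

With three units for "today", "weather", and "how" carrying confidences of 0.93, 0.62, and 0.31, the sketch colors them green, yellow, and red, and pointing at any position inside a waveform unit returns the same five options for that unit.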
For example, when the user notices that the waveform looks abnormal, the user can select the "Play" option 52 to play back the audio and determine whether noise is interfering. When the recognized text deviates considerably from what was said, the user can also use the "Play" option 52 to listen again to the previously input speech signal and find the cause, such as a deviation in pronunciation. If there is indeed a problem, the user can select the "Re-record" option 52 to input the speech signal again. If the deviation in the recognized text results from the user's own pronunciation habits, the user can select the "Include in training" option 52, which feeds the adjustment back to the speech recognition system so that it meets that user's needs. Until the speech recognition system has been adjusted well enough to recognize the word clearly, the user can decide to change the input mode, for example by selecting "Switch to handwriting input" or "Switch to keyboard input", which switches from speech input to handwriting or keyboard input so that the input can be completed.

Each word unit 420 in the displayed text result is also linked to a feedback adjustment option group. The option group contains at least one feedback adjustment option that the user can select to correct speech recognition errors or to feed adjustments back to the speech recognition system. As shown in Fig. 6, each word 420 is linked to a second feedback adjustment option group 60, which includes at least one feedback adjustment option 62. In this embodiment, the second option group 60 includes the feedback adjustment options 62 "List next speech recognition candidate", "List speech recognition candidates by phonetic similarity first", "List candidates by word connectivity first", "List all candidates", "Switch to handwriting input", and "Switch to keyboard input".
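The candidate-listing options of the second feedback adjustment option group 60 can be pictured as different orderings of one candidate list. The Python sketch below is a rough illustration only: the source does not describe how phonetic similarity or word connectivity is scored, so both scores, the equal weighting used for "List all candidates", the wrap-around behavior of the next-candidate function, and all names are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    text: str
    phonetic: float      # assumed phonetic-similarity score, higher is closer
    connectivity: float  # assumed score for fit with the neighboring words


def list_candidates(candidates: List[Candidate], option: str) -> List[str]:
    """Order the candidates according to the selected feedback adjustment option."""
    if option == "phonetic_similarity_first":
        ranked = sorted(candidates, key=lambda c: c.phonetic, reverse=True)
    elif option == "word_connectivity_first":
        ranked = sorted(candidates, key=lambda c: c.connectivity, reverse=True)
    elif option == "list_all":
        # Assumption: the full list is ordered by the sum of both scores.
        ranked = sorted(candidates,
                        key=lambda c: c.phonetic + c.connectivity,
                        reverse=True)
    else:
        raise ValueError(f"unknown option: {option}")
    return [c.text for c in ranked]


def next_candidate(ranked: List[str], current: str) -> str:
    """Return the entry after the displayed one (wrapping around is an assumption)."""
    i = ranked.index(current)
    return ranked[(i + 1) % len(ranked)]
```

Keeping the scores on each candidate and sorting on demand means the same candidate list can serve every option, so the menu only has to choose a sort key.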
After speech recognition is complete, the user moves the mouse cursor 22 onto the word unit 420 to be adjusted. The second feedback adjustment option group 60 linked to that word unit 420 is then shown on the display device 20, either automatically or after a click with the mouse or a stylus. The user can then select the desired feedback adjustment option 62 to make a feedback adjustment to the speech recognition result.

Because of differences in pronunciation, the text obtained by recognizing a user's speech signal can vary widely. Taking Fig. 6 as an example, the sentence read by the user is "I want to eat" (我要吃飯), and the recognition result may differ with the user's pronunciation habits. Based on the speech signal input by the user, the visual feedback system of the present invention offers the user several approximate recognition results for that speech input to choose from. Which approximate results are shown is determined by the feedback adjustment option 62 selected from the second feedback adjustment option group 60. For example, by selecting the "Next candidate" option 62 the user obtains the next candidate word. By selecting the "Phonetic similarity first" option 62 the user obtains the phonetically closest result. By selecting the "Word connectivity first" option 62 the user obtains the most likely candidate according to its connection with the preceding and following words. By selecting the "List all candidates" option 62 the user can list all speech recognition candidates. Alternatively, the user can choose another input mode here, for example by selecting "Switch to handwriting input" or "Switch to keyboard input", which switches from speech input to handwriting or keyboard input so that the input can be completed.

The present invention therefore has the following advantages:

1. With the speech recognition system provided by the present invention, which uses waveforms to represent the user's speech signal, the user can judge in real time whether recording succeeded and what the quality of the input speech signal is.
2. With the speech recognition system provided by the present invention, which changes the waveform color, the user can conveniently monitor the progress of speech processing and the quality of the speech recognition result.
3. With the speech recognition system provided by the present invention, the user can, word unit by word unit, correct errors in or feed system adjustments back on the input speech signal and the text of the speech recognition result, so as to complete text input conveniently or to keep improving the performance of the speech recognition system.

In summary, the present invention achieves its intended purpose. It provides a speech recognition system that lets the user monitor whether recording succeeded, the quality of the speech signal, the progress of the recognition process, and the quality of the recognition result, and that lets the user conveniently correct speech recognition errors or feed adjustments back to the speech recognition system to improve its performance. It has considerable industrial applicability, and a patent application is therefore filed in accordance with the law.

The above description and drawings merely describe embodiments of the present invention. Those skilled in the art may still make equivalent partial changes and modifications without departing from the technical means and spirit of the present invention.

[Brief Description of the Drawings]

Fig. 1 is a schematic diagram of the speech recognition system of the present invention.
Fig. 2 is a schematic diagram of the first embodiment of the speech recognition system of the present invention.
Fig. 3 is another schematic diagram of the first embodiment of the speech recognition system of the present invention.
Fig. 4 is a further schematic diagram of the first embodiment of the speech recognition system of the present invention.
Fig. 5 is a use-state diagram of the first embodiment of the speech recognition system of the present invention.
Fig. 6 is another use-state diagram of the first embodiment of the speech recognition system of the present invention.

[Description of Main Reference Numerals]

10 speech recognition engine
20 display device
22 mouse cursor
30 signal indication interface
32, 321, 322 waveform
320 waveform unit
40 text output interface
42 text
420 word unit
50 first feedback adjustment option group
52 first feedback adjustment option
60 second feedback adjustment option group
62 second feedback adjustment option

Claims (13)

X. Claims:

1. A speech recognition system, comprising at least one speech recognition engine and a display device, the display device being provided with: a signal indication interface, which represents the speech signal input by a user as a waveform and displays a recording state, a speech-recognition-in-progress state, and a speech-recognition-complete state; and a text output interface for displaying the text result of speech recognition, the text result comprising at least one word unit.

2. The speech recognition system of claim 1, wherein the waveforms of the recording state, the speech-recognition-in-progress state, and the speech-recognition-complete state displayed on the signal indication interface are each displayed in a different color.

3. The speech recognition system of claim 1, wherein each word unit of the text result displayed on the text output interface is displayed in a color that indicates the speech recognition quality of that word unit.

4. The speech recognition system of claim 3, wherein each word unit is displayed in [green], yellow, or red: [green] indicates good speech recognition quality; yellow warns of poor speech recognition quality; and red indicates very poor speech recognition quality that must be strictly reviewed and corrected.

5. The speech recognition system of claim 3, wherein each word unit is linked to a feedback adjustment option group; the option group contains at least one feedback adjustment option that the user can select to correct a speech recognition error or to feed an adjustment back to the speech recognition system.

6. The speech recognition system of claim 5, wherein the feedback adjustment option group is displayed on the display device when the user moves a mouse cursor shown on the display device onto the word unit to be adjusted, or presses or clicks it with a stylus or mouse.

7. The speech recognition system of claim 5, wherein the feedback adjustment options contained in the option group linked to the word unit include "list the next candidate word", "list speech recognition candidates prioritized by speech similarity", "list candidates prioritized by word connectivity", "list all approximate recognition results", "switch to handwriting input", "switch to keyboard input", or any combination of the above.

8. The speech recognition system of claim 3 or 4, wherein the waveform of the speech-recognition-complete state on the signal indication interface further comprises at least one waveform unit; each waveform unit corresponds to one word unit of the speech recognition result displayed on the text output interface, the two are arranged in alignment with each other, and both use the same color to indicate the speech recognition quality of that word unit.

9. The speech recognition system of claim 8, wherein each waveform unit on the signal indication interface is linked to a feedback adjustment option group; the option group contains at least one feedback adjustment option that allows the user to replay the recording, correct the recording, correct a speech recognition error, or feed an adjustment back to the speech recognition system.

10. The speech recognition system of claim 9, wherein the feedback adjustment option group is displayed on the display device when the user moves a mouse cursor shown on the display device onto the waveform unit to be adjusted, or presses or clicks it with a stylus or mouse.

11. The speech recognition system of claim 9, wherein the feedback adjustment options contained in the option group include "play", "re-record", "include in training", "switch to handwriting input", "switch to keyboard input", or any combination of the above.

12. The speech recognition system of claim 1, wherein the system is a desktop computer, notebook computer, home multimedia system, television, DVD, audio-video equipment, mobile phone, or personal digital assistant that has a display device, can be connected to another display device, or has a display device on its remote control.

13. The speech recognition system of claim 1, wherein the word unit is [...], [...], or a phrase.
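Claims 3, 4, and 8 describe color-coding each word unit by recognition quality and giving the aligned waveform unit the same color. A minimal Python sketch of that mapping follows, under stated assumptions: the claims do not say how quality is measured, so the `confidence` score, the 0.8 and 0.5 thresholds, and the names `WordUnit`, `Quality`, `classify`, and `render` are all hypothetical, and green is assumed as the color for good quality.

```python
from dataclasses import dataclass
from enum import Enum


class Quality(Enum):
    GOOD = "green"       # good recognition quality (color assumed)
    POOR = "yellow"      # warning: poor recognition quality
    VERY_POOR = "red"    # very poor: must be reviewed and corrected


@dataclass
class WordUnit:
    text: str
    confidence: float    # hypothetical engine score in [0, 1]
    start_s: float       # start of the aligned waveform unit, in seconds
    end_s: float         # end of the aligned waveform unit, in seconds


def classify(confidence: float, good: float = 0.8, poor: float = 0.5) -> Quality:
    """Map a confidence score to one of three quality levels.

    The thresholds are illustrative assumptions, not values from the patent.
    """
    if confidence >= good:
        return Quality.GOOD
    if confidence >= poor:
        return Quality.POOR
    return Quality.VERY_POOR


def render(units):
    """Return (text, color, waveform span) per word unit.

    The same color is meant to be applied to the word on the text output
    interface and to its waveform unit on the signal indication interface.
    """
    return [(u.text, classify(u.confidence).value, (u.start_s, u.end_s))
            for u in units]
```

A display layer could draw each tuple twice: once as colored text and once as a colored waveform segment over the given time span, which keeps the two interfaces aligned as claim 8 requires.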
TW096142713A 2007-11-12 2007-11-12 Speech recognition system TWI351021B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW096142713A TWI351021B (en) 2007-11-12 2007-11-12 Speech recognition system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW096142713A TWI351021B (en) 2007-11-12 2007-11-12 Speech recognition system

Publications (2)

Publication Number Publication Date
TW200921641A TW200921641A (en) 2009-05-16
TWI351021B TWI351021B (en) 2011-10-21

Family

ID=44727957

Family Applications (1)

Application Number Title Priority Date Filing Date
TW096142713A TWI351021B (en) 2007-11-12 2007-11-12 Speech recognition system

Country Status (1)

Country Link
TW (1) TWI351021B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI426425B (en) * 2009-11-27 2014-02-11 Mitac Int Corp Method of processing touch commands and voice commands in parallel in electronic device supporting speech recognition and electronic device thereof

Also Published As

Publication number Publication date
TWI351021B (en) 2011-10-21

Similar Documents

Publication Publication Date Title
US11705130B2 (en) Spoken notifications
US11593984B2 (en) Using text for avatar animation
US20210224474A1 (en) Automatic grammar detection and correction
KR102477925B1 (en) Synchronization and task delegation of a digital assistant
US10497365B2 (en) Multi-command single utterance input method
US11756574B2 (en) Multiple state digital assistant for continuous dialog
DE112016003459B4 (en) Speaker recognition
JP5756555B1 (en) Utterance evaluation apparatus, utterance evaluation method, and program
US8983846B2 (en) Information processing apparatus, information processing method, and program for providing feedback on a user request
US20170263249A1 (en) Identification of voice inputs providing credentials
CN110313151A (en) Messaging from shared device
DE202017004558U1 (en) Smart automated assistant
US20090037171A1 (en) Real-time voice transcription system
US20220375466A1 (en) Siri integration with guest voices
US20230058929A1 (en) Digital assistant interaction in a communication session
CN101452700A (en) Voice identification system
WO2018105373A1 (en) Information processing device, information processing method, and information processing system
DE112019000018T5 (en) RAISE TO SPEAK
TW201142686A (en) Electronic apparatus having multi-mode interactive operation method
CN110109730A (en) For providing the equipment, method and graphic user interface of audiovisual feedback
US20220093086A1 (en) Method and a system for capturing conversations
WO2016045468A1 (en) Voice input control method and apparatus, and terminal
TW200921641A (en) Speech recognition system
CN107391015A (en) Control method, device and equipment of intelligent tablet and storage medium
US20230352007A1 (en) Sonic responses

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees