TW200951940A - Correcting device, correcting method and correcting system of speech recognition result - Google Patents

Correcting device, correcting method and correcting system of speech recognition result

Info

Publication number
TW200951940A
TW200951940A TW098113352A
Authority
TW
Taiwan
Prior art keywords
error
vocabulary
identification
unit
interval
Prior art date
Application number
TW098113352A
Other languages
Chinese (zh)
Other versions
TWI427620B (en)
Inventor
Zhi-peng Zhang
Nobuhiko Naka
Yusuke Nakashima
Original Assignee
Ntt Docomo Inc
Priority date
Filing date
Publication date
Application filed by Ntt Docomo Inc filed Critical Ntt Docomo Inc
Publication of TW200951940A publication Critical patent/TW200951940A/en
Application granted granted Critical
Publication of TWI427620B publication Critical patent/TWI427620B/en

Landscapes

  • Telephonic Communication Services (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The present invention provides a correcting device and correcting method of a speech recognition result that can correct erroneous recognition without burdening the user when the recognition result contains errors. Feature data of the input speech is sent to a server device (120), a recognition process is carried out on the server device (120), and a receiving part (235) receives the recognition result from the server device (120). Based on reliability scores and the like, an error interval identification part (240) identifies, within the received recognition result, the error interval in which recognition errors occur. An error-interval feature extraction part (260) then extracts the feature data of that interval, and a correcting part (270) carries out a re-recognition process on the extracted feature data of the error interval so as to execute the correction process.
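As a concrete, non-normative illustration of the flow summarized in the abstract, the sketch below shows how a client might combine the server result with a locally re-recognized error interval. All names (`RecognizedWord`, `local_recognizer`, the 0.2 threshold) are assumptions introduced for illustration; the patent defines functional blocks, not source code.

```python
# Minimal sketch of the correction flow in the abstract (illustrative assumptions only).
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

@dataclass
class RecognizedWord:
    text: str
    start_frame: int   # first frame of the word; 1 frame is roughly 10 ms in the patent's example
    end_frame: int
    confidence: float  # per-word reliability attached by the server

def find_error_interval(words: Sequence[RecognizedWord],
                        threshold: float = 0.2) -> Optional[Tuple[int, int]]:
    """Frames spanned by the low-confidence words, or None if everything looks reliable."""
    bad = [w for w in words if w.confidence <= threshold]
    if not bad:
        return None
    return bad[0].start_frame, bad[-1].end_frame

def correct(feature_frames: List[List[float]],
            server_result: Sequence[RecognizedWord],
            local_recognizer: Callable[[List[List[float]]], List[RecognizedWord]]
            ) -> List[RecognizedWord]:
    """Replace the words inside the error interval with a locally re-recognized result."""
    interval = find_error_interval(server_result)
    if interval is None:
        return list(server_result)          # nothing to correct
    start, end = interval
    # Re-recognize only the stored feature frames of the error interval on the client.
    # The replacement words are assumed to carry frame indices in utterance coordinates.
    replacement = local_recognizer(feature_frames[start:end + 1])
    kept = [w for w in server_result if w.end_frame < start or w.start_frame > end]
    return sorted(kept + list(replacement), key=lambda w: w.start_frame)
```

The design point is that only the error interval is re-decoded; the reliable portion of the server result is kept as-is and merged back by time order.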

Description

200951940 六、發明說明: 【發明所屬之技術領域】 本發明係有關於將語音辨識過之資料加以訂正的語音 辨識結果訂正裝置及語音辨識結果訂正方法,以及語音辨 識結果訂正系統。 【先前技術】 〇 於行動終端上將所輸入之語音輸出至伺服器,於該當 伺服器上辨識語音,將其辨識結果發送至行動終端,藉此 而可於行動終端上取得語音結果的技術,已如日本特開 2003-295893號公報(專利文獻1)所記載而爲習知。 然而,當伺服器上所被辨識的辨識結果有錯誤時,並 未考慮進行其訂正。一般而言,當辨識結果有錯誤時,是 考慮讓使用者以手動輸入進行操作來進行訂正,但這非常 麻煩。例如’使用者要先了解辨識結果的文章,辨識出錯 〇 誤’指定該有錯誤的地方,然後訂正,這些都很麻煩。 於是,本發明的目的在於提供一種,當辨識結果有錯 誤時’不對使用者造成麻煩就能訂正辨識錯誤的語音辨識 結果訂正裝置及語音辨識結果訂正方法,以及語音辨識結 果訂正系統。 【發明內容】 爲了解決上述課題,本發明的語音辨識結果訂正裝置 ’係具備:輸入手段,係用以輸入語音;和算出手段,係 -5- 200951940 用以基於被前記輸入手段所輸入之語音,而算出特徵量資 料;和記憶手段,係用以記憶被前記算出手段所算出之特 徵量資料;和取得手段’係用以取得對前記輸入手段所輸 入之語音的辨識結果;和指定手段,係用以於前記取得手 段所辨識之辨識結果中,指定出有發生辨識錯誤的錯誤區 間;和訂正手段,係用以從前記記憶手段中所記憶之特徵 量資料,抽出已被前記指定手段所指定之錯誤區間所對應 之特徵量資料,並使用該當已抽出之特徵量資料來進行再 辨識,藉此以執行前記取得手段所得到之辨識結果的訂正 〇 又’本發明的語音辨識結果目了正方法,係具備:輸入 步驟’係用以輸入語音;和算出步驟,係用以基於被前記 輸入步驟所輸入之語音’而算出特徵量資料;和記憶步驟 ’係用以記憶被前記算出步驟所算出之特徵量資料;和取 得步驟,係用以取得對前記輸入步驟所輸入之語音的辨識 結果:和指定步驟’係用以於前記取得步驟所辨識之辨識 結果中,指定出有發生辨識錯誤的錯誤區間;和訂正步驟 ,係用以從前記記憶步驟中所記憶之特徵量資料,抽出已 被前記指定手段所指定之錯誤區間所對應之特徵量資料, 並使用該當已抽出之特徵量資料來進行再辨識,藉此以執 行前記取得步驟所得到之辨識結果的訂正。 若依據本發明,則會將所被輸入的語音的特徵量資料 加以記憶,並在對該語音所辨識的辨識結果中,指定出有 發生辨識錯誤的錯誤區間。然後,藉由將已被指定之錯誤 -6- 200951940 區間中的特徵量資料’進行再辨識,以訂正辨識結果。藉 此,在辨識的結果當中’將有必要的部分進行訂正,可簡 易地進行訂正處理’同時,可獲得正確的辨識結果。藉此 ’就可不對使用者造成負擔,可簡單地進行訂正處理,可 獲得正確的語音辨識結果。 又,於本發明的語音辨識結果訂正裝置中,前記取得 手段’係由送訊手段’係用以將前記輸入手段所輸入之語 q 音,發送至語音辨識裝置;和收訊手段,係用以接收前記 語音辨識裝置上所辨識出來的辨識結果所構成;前記指定 手段’係於前記收訊手段所接收到的辨識結果中,指定出 有發生辨識錯誤的錯誤區間,較爲理想。 若依據此發明,則將所被輸入之語音,發送至語音辨 識裝置,並將該語音辨識裝置上進行辨識後的辨識結果, 予以接收。然後,在所接收到的辨識結果中,指定出有發 生辨識錯誤的錯誤區間,將所被指定之錯誤區間中的辨識 Q 結果,加以訂正。藉此,在辨識的結果當中,將有必要的 部分進行訂正,可簡易地訂正語音辨識之錯誤,可獲得正 確的辨識結果。 又,於本發明的語音辨識結果訂正裝置中,前記指定 手段,係藉由受理使用者操作,以指定錯誤區間,較爲理 想。 若依據本發明,則可藉由受理使用者操作’以指定錯 誤區間,可較簡易地指定錯誤區間,並且可獲得正確的語 音辨識結果。 200951940 又,於本發明的語音辨識結果訂正裝置中,前記指定 手段,係基於前記辨識結果中所被賦予的辨識結果之信賴 度來判斷錯誤區間,並指定該當判斷出來之錯誤區間,較 爲理想。 若依據本發明,則基於辨識結果中所被賦予的辨識結 果之信賴度來判斷錯誤區間,並指定該當判斷出來之錯誤 區間,藉此就可自動地指定錯誤區間,可較簡易地指定錯 誤區間。 又,於本發明的語音辨識結果訂正裝置中,前記指定 手段,係計算前記辨識結果之信賴度,基於該當信賴度來 判斷錯誤區間,並指定該當判斷出來之錯誤區間,較爲理 想。 若依據本發明,則可計算辨識結果之信賴度,基於該 當信賴度來判斷錯誤區間,並指定該當判斷出來之錯誤區 間,而可較簡易地指定錯誤區間。甚至,在使伺服器裝置 等進行語音辨識的情況時,亦可設計成從該伺服器裝置來 就不計算信賴度,可提供更便於使用的裝置。 又,本發明的語音辨識結果訂正裝置,係更具備:特 定手段,係用以特定,被前記指定手段所指定之錯誤區間 的前方的至少一個字彙、或是後方的至少一個字彙、或是 前記前方字彙及後方字彙之雙方之任一者加以形成的辨識 結果;前記訂正手段,係將已被前記特定手段所特定之辨 識結果,視爲拘束條件,依照該拘束條件,將錯誤區間之 前方字彙、後方字彙加以包含之區間所對應的特徵量資料 -8- 200951940 ’從前記記憶手段中予以抽出,對已抽出之特徵量資料, 進行辨識處理,較爲理想。 若依據本發明,則將已被指定之錯誤區間的前方的至 少一個字彙、或是後方的至少一個字彙、或是前記前方字 彙及後方字彙之雙方之任一者加以形成的辨識結果,加以 特定,將已被特定之辨識結果視爲拘束條件,依照該拘束 條件’來進行預先記憶之特徵量資料的辨識處理。藉此, 0 就進行較正確的辨識處理,因此可獲得正確的語音辨識結 果。 又’本發明的語音辨識結果訂正裝置,係更具備:特 定手段’係用以特定,被前記指定手段所指定之錯誤區間 的前方的至少一個字彙、或是後方的至少一個字彙、或是 前記前方字彙及後方字彙之雙方之任一者加以形成的辨識 結果;前記訂正手段,係將已被前記特定手段所特定之辨 識結果’視爲拘束條件,依照該拘束條件,將錯誤區間所 Q 對應的特徵量資料,從前記記憶手段中予以抽出,對已抽 出之特徵量資料,進行辨識處理,較爲理想。 若依據本發明,則將已被指定之錯誤區間的前方的至 少一個字彙、或是後方的至少一個字彙、或是前記前方字 彙及後方字彙之雙方之任一者加以形成的辨識結果,加以 特定’將已被特定之辨識結果視爲拘束條件,依照該拘束 條件’來進行預先記憶之特徵量資料的辨識處理。亦即, 在本發明中’是可僅使用錯誤區間的特徵量資料,來進行 辨識處理。藉此,就進行較正確的辨識處理,因此可獲得 -9- 200951940 正確的語音辨識結果。 又,本發明的語音辨識結果訂正裝置,係更具備:字 彙資訊特定手段,係用以特定:將被前記指定手段所指定 之錯誤區間的前方的至少一個字彙予以特定所需之資訊亦 即字彙資訊、或是後方的至少一個字彙的字彙資訊、或是 前記前方字彙的字彙資訊及後方字彙的字彙資訊之雙方之 任一者加以形成的辨識結果中之字彙的字彙資訊;前記訂 正手段,係將已被前記字彙資訊特定手段所特定之字彙資 訊,視爲拘束條件,依照該拘束條件,將錯誤區間之前方 字彙、後方字彙加以包含之區間所對應的特徵量資料,從 前記記憶手段中予以抽出’對已抽出之特徵量資料,進行 辨識處理,較爲理想。 若依據本發明,則可將用來特定出字彙用的字彙資訊 當作拘束條件,來進行訂正處理,藉此可進行較正確的辨 識處理。 例如,作爲字彙資訊,係含有:表示字彙之詞性的詞 性資訊、及表示字彙之念法的讀音資訊,之任1者或複數 者,較爲理想。 又,本發明的語音辨識結果訂正裝置,係更具備:未 知詞判定手段,係基於前記字彙資訊來判定,被前記指定 手段所指定之錯誤區間的前方的至少一個字彙、或是後方 的至少一個字彙、或是前記前方字彙及後方字彙之雙方之 任一者加以形成的辨識結果的字彙,是否爲未知詞:若藉 由前記未知詞判定手段而判定了前記辨識結果的字棄是未 -10- 200951940 知詞’則前記訂正手段係以前記字彙資訊爲基礎,來進行 辨識結果的訂正處理,較爲理想。 若依據本發明,則當係未知詞時,則藉由將字彙資訊 當成拘束條件來進行辨識處理,就可獲得較正確的語音辨 識結果。 又,本發明的語音辨識結果訂正裝置,係更具備:連 接機率記憶手段,係用以記憶字彙彼此的連接機率;前記 U 訂正手段,係根據訂正處理已進行過之事實,而作成該當 
錯誤區間之字彙及與其前後或其中一方之字彙的連接機率 ,使用該當連接機率來更新前記連接機率記憶手段中所記 憶的連接機率,較爲理想。 若依據本發明,則會將字棄彼此的連接機率予以記憶 ,每次將其作訂正處理時,連接機率就會改變,因此藉由 計算該連接機率而進行更新,就可獲得較正確的語音辨識 結果。 φ 又,本發明的語音辨識結果訂正裝置,係更具備:拘 束條件記憶手段,係用以將前記字彙資訊特定手段所特定 出來的字彙資訊或前記特定手段所特定出來的字彙,當作 拘束條件而加以記憶;前記訂正手段,係依照前記拘束條 件記憶手段中所記憶之拘束條件,來進行訂正處理,較爲 理想。 藉此,會將作爲拘束條件的字彙或字彙資訊加以記憶 ,可因應需要而依照所記憶的拘束條件來進行訂正處理, 不必每次進行訂正處理就生成拘束條件,可進行迅速的訂 -11 - 200951940 正處理(語音辨識.處理)。 又,本發明的語音辨識結果訂正裝置,係更具備:受 理手段,係用以從使用者受理文字資訊;前記訂正手段, 係將前記受理手段所受理到的文字資訊,視爲拘束條件, 來進行錯誤區間中的辨識結果的訂正處理,較爲理想。 若依據本發明,則使用者可直接指定用來作爲拘束條 件的文字,可進行較正確的辨識處理,因此可獲得正確的 語音辨識結果。 又,本發明的語音辨識結果訂正裝置,係更具備:時 間資訊算出手段,係用以基於收訊手段所接收到之辨識結 果與前記記憶手段中所記憶之特徵量資料,來算出辨識結 果的經過時間;前記指定手段,係基於前記時間資訊算出 手段所算出之時間資訊,來指定錯誤區間,較爲理想。 若依據本發明,則可基於已被接收到的辨識結果與所 記憶的特徵量資料,來算出辨識結果的經過時間,基於該 時間資訊來指定錯誤區間。藉此,當辨識結果中沒有包含 時間資訊時,也可將錯誤區間所對應之適切的特徵量資料 ,予以抽出。 又,本發明的語音辨識結果訂正裝置’係更具備:顯 示手段,係用以顯示已被前記訂正手段所訂正過的辨識結 果;前記顯示手段,係不顯示前記取得手段所取得之辨識 結果,較爲理想。藉此,由於有辨識錯誤可能性的辨識結 果不會顯示,因此不會對使用者造成誤解。 又,本發明的語音辨識結果訂正裝置,係當前記訂正 -12- 200951940 手段經由再辨識而得到之辨識結果、和前記取得手段所取 得到之辨識結果是相同時,或這些辨識結果分別所含有之 時間資訊是有差異時,則判斷爲辨識錯誤’前記顯示手段 就不顯示辨識結果,較爲理想。藉此,可防止顯示出錯誤 的辨識結果。 又,於本發明的語音辨識結果訂正裝置中,前記指定 手段,係藉由使用者操作而指定錯誤區間之起點,基於前 0 記取得手段所取得到之辨識結果中所被賦予的辨識結果之 信賴度,來指定錯誤區間之終點,較爲理想。藉此,可實 現符合於使用者輸入習慣的訂正方法,可提供便於使用的 裝置。 又,於本發明的語音辨識結果訂正裝置中,前記指定 手段,係藉由使用者操作而指定錯誤區間之起點,根據該 當起點而遠離所定辨識單位數而指定錯誤區間之終點,較 爲理想。藉此,可實現符合於使用者輸入習慣的訂正方法 Q ,可提供便於使用的裝置。 又’於本發明的語音辨識結果訂正裝置中,前記指定 手段’係藉由使用者操作而指定錯誤區間之起點,基於前 記取得手段所取得到之辨識結果中的所定之發音記號,來 指定錯誤區間之終點,較爲理想。藉此,可實現符合於使 用者輸入習慣的訂正方法,可提供便於使用的裝置。 又’於本發明的語音辨識結果訂正裝置中,前記取得 手段’係在取得辨識結果之際,取得複數辨識候補來作爲 辨識結果;前記指定手段,係藉由使用者操作而指定錯誤 -13- 200951940 區間之起點,基於前記取得手段所取得到之辨識候補之數 目,來指定終點,較爲理想。藉此,就可基於辨識結果的 信賴度來指定終點,可實現有效率的訂正處理。 又,於本發明的語音辨識結果訂正裝置中,更具備: 算出手段,係用以算出,已被前記算出手段所算出之特徵 量資料的錯誤區間加以包含之區間的平均値;前記訂正手 段,係將已抽出之特徵量資料,減去前記算出手段所算出 之平均値,將該減算所得之資料,視爲特徵量資料而進行 再辨識處理,較爲理想。藉此,可對已經去除了麥克風等 輸入聲音之收音裝置之特性的聲音,進行訂正處理,可實 現較正確的訂正(語音辨識)。 又,於本發明的語音辨識結果訂正裝置中,具備:輸 入手段,係用以輸入語音;和取得手段,係用以取得對前 記輸入手段所輸入之語音的辨識結果;和指定手段,係用 以於前記取得手段所辨識之辨識結果中,指定出有發生辨 識錯誤的錯誤區間;和通知手段,係藉由將已被前記指定 手段所指定之錯誤區間通知給外部伺服器,以向前記外部 伺服器請求該當錯誤區間的再辨識處理;和收訊手段,係 用以接收,回應於前記通知手段所作之請求而於前記外部 伺服器中所再辨識而成之錯誤區間的辨識結果。 又,於本發明的語音辨識結果訂正方法中,具備:輸 入步驟,係用以輸入語音;和取得步驟,係用以取得對前 記輸入步驟所輸入之語音的辨識結果;和指定步驟,係用 以於前記取得步驟所辨識之辨識結果中,指定出有發生辨 -14- 200951940 識錯誤的錯誤區間;和通知步驟,係藉由將已被前記指定 步驟所指定之錯誤區間通知給外部伺服器,以向前記外部 伺服器請求該當錯誤區間的再辨識處理;和收訊步驟,係 用以接收’回應於前記通知步驟所作之請求而於前記外部 伺服器中所再辨識而成之錯誤區間的辨識結果。 又’本發明的語音辨識結果訂正裝置,係具備:詞根 區間指定手段’係用以於前記取得手段所取得到的辨識結 Q 果中’指定詞根區間;前記訂正手段,係於前記指定手段 所指定之錯誤區間中,再將前記詞根區間指定手段所指定 之詞根區間所對應的特徵量資料,從前記記億手段中抽出 ’使用該當已抽出之特徵量資料來進行再辨識,藉此以執 行前記取得手段所得到之辨識結果的訂正,較爲理想。 藉此,就可使用詞根區間所對應之特徵量資料來執行 辨識結果的訂正,可進行較正確的訂正處理。亦即,可依 照被稱作詞根區間的未知詞之區間來進行再辨識。 ❹ 又,本發明的語音辨識結果訂正裝置,係更具備:分 割手段,係依照前記詞根區間指定手段所指定的詞根區間 ,而將從前記取得手段所取得到的辨識結果,分割成複數 區間 ·, 前記訂正手段,係對前記分割手段所分割出來的每一 分割區間,執行辨識結果的訂正,較爲理想。 藉此,藉由將辨識結果分割成複數區間’就可縮短辨 識對象,可進行較正確的辨識處理。 又,本發明的語音辨識結果訂正裝置中的分割手段, -15- 200951940 係將詞根區間的終點視爲一分割區間的終點,並且將詞根 區間的起點視爲前記一分割區間的下一分割區間的起點, 以此方式來分割辨識結果,較爲理想。 藉此,詞根區間就會被包含在分割區間之任一者。因 此,在辨識處理之際必定會包含詞根區間,藉此就可將詞 根字串視爲拘束條件來進行辨識處理。 又,本發明的語音辨識結果訂正裝置的訂正手段,係 對前記分割手段所分割出來的每一分割區間,執行辨識結 果的訂正,並且將前記詞根區間’視爲各分割區間之訂正 時的拘束條件,較爲理想。 藉此,在辨識處理之際必定會包含詞根區間,因此可 將詞根字串視爲拘束條件來進行辨識處理。 又,於本發明的語音辨識結果訂正裝置中,訂正手段 ,係將前記詞根區間指定手段所指定之詞根區間中所描述 之詞根字串加以含有的假說,當作辨識的探索過程而予以 保持,從該當假說中選擇出最終的辨識結果,以執行訂正 ,較爲理想。 藉此,就可必定使用詞根字串來進行辨識處理。 又,本發明的語音辨識結果訂正裝置’係更具備:字 典追加手段,係用以將前記詞根區間指定手段所指定之詞 根區間中的詞根字串,追加至辨識處理所需之字典資料庫 中,較爲理想。 藉此,就可累積詞根字串,在今後的辨識處理中有效 運用,可進行較正確的辨識處理。 -16- 200951940 本發明的語音辨識結果訂正裝置,係更具備:由使用 者所生成之字典資料庫;前記訂正手段’係使用將詞根字 串依照前記字典資料庫所轉換過的字串’來進行訂正處理 ,較爲理想。 藉此,就可累積詞根字串,在今後的辨識處理中有效 運用,可進行較正確的辨識處理。 又,本發明的語音辨識結果訂正系統,係具備:上述 0 語音辨識結果訂正裝置;和伺服器裝置,係基於從前記語 音辨識結果訂正裝置所發送來的語音而進行語音辨識,並 作成辨識結果而發送至前記語音辨識結果訂正裝置。該語 音辨識結果訂正系統’係僅在於標的之不同而已’在作用 效果上均和上述語音辨識結果訂正裝置相同。 
若依據本發明,則可在辨識的結果當中,將有必要的 部分進行訂正’可簡易地進行訂正處理’同時’可獲得正 確的辨識結果。BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a speech recognition result correction device and a speech recognition result correction method for correcting speech-recognized data, and a speech recognition result correction system. [Prior Art] The technique of outputting the input voice to the server on the mobile terminal, identifying the voice on the server, and transmitting the identification result to the mobile terminal, thereby obtaining the voice result on the mobile terminal. It is known as described in Japanese Laid-Open Patent Publication No. 2003-295893 (Patent Document 1). However, when there is an error in the identification result recognized on the server, the correction is not considered. In general, when there is an error in the recognition result, it is considered to allow the user to perform the operation by manual input, but this is very troublesome. For example, the user should first understand the article of the identification result, identify the error error, specify the place where the error occurred, and then correct it. These are very troublesome. Accordingly, an object of the present invention is to provide a speech recognition result correcting means and a speech recognition result correcting method for correcting an erroneous error when the recognition result is erroneous, and a speech recognition result correction system. SUMMARY OF THE INVENTION In order to solve the above problems, a speech recognition result correcting apparatus of the present invention includes: an input means for inputting a voice; and a calculating means for a voice input based on a pre-recorded input means -5-200951940 And calculating the feature quantity data; and the means for memory is used to memorize the feature quantity data calculated by the pre-calculation means; and the acquisition means is used to obtain the recognition result of the voice input by the pre-record input means; and the means for specifying, It is used to identify the error interval in which the identification error occurs in the identification result identified by the pre-recording means; and the correcting means is used to extract the feature quantity data memorized from the memory means beforehand, and extract the pre-recorded means Specifying the feature quantity data corresponding to the error interval, and using the extracted feature quantity data for re-recognition, thereby correcting the identification result obtained by performing the pre-recording obtaining means, and the speech recognition result of the present invention is The positive method has the following steps: an input step is used to input a voice; and a calculation step is used The feature amount data is calculated based on the voice input by the pre-recording input step; and the memory step is for memorizing the feature amount data calculated by the pre-calculation step; and the obtaining step is for obtaining the input to the pre-recording step The recognition result of the voice: and the specified step 'is used to identify the error interval in which the recognition error occurs in the identification result identified by the pre-acquisition acquisition step; and the correction step is used to record the feature quantity memorized from the previous memory step The data is extracted from the feature quantity data corresponding to the error interval specified by the pre-recording means, and the feature quantity data that has been extracted is used 
for re-identification, thereby performing correction of the identification result obtained by the pre-acquisition obtaining step. According to the present invention, the feature quantity data of the input voice is memorized, and an error section in which the recognition error occurs is specified in the recognition result recognized by the voice. Then, the identification result is corrected by re-identifying the feature amount data in the interval of the designated error -6-200951940. By this, in the result of the identification, 'the necessary part is corrected, the correction process can be easily performed', and the correct identification result can be obtained. By doing so, the user can be burdened, and the correction process can be easily performed to obtain a correct speech recognition result. Further, in the speech recognition result correcting apparatus of the present invention, the pre-recording obtaining means 'transmission means' is for transmitting the speech q input from the pre-recording means to the speech recognition means; and the means for receiving the message It is preferably formed by receiving the identification result recognized by the pre-recording speech recognition device; the pre-recording specifying means is preferably an error section in which the identification error is generated in the identification result received by the pre-recording means. According to the invention, the input voice is transmitted to the voice recognition device, and the recognized recognition result on the voice recognition device is received. Then, in the received identification result, an error section in which an identification error occurs is specified, and the result of the identification Q in the specified error section is corrected. In this way, among the results of the identification, the necessary parts are corrected, and the error of the speech recognition can be easily corrected, and the correct identification result can be obtained. Further, in the speech recognition result correcting apparatus of the present invention, it is preferable that the pre-recording specifying means specifies the error section by accepting the user's operation. According to the present invention, the error section can be specified relatively easily by accepting the user's operation to specify the error section, and the correct voice recognition result can be obtained. 200951940 Further, in the speech recognition result correction device of the present invention, the pre-recording specifying means determines the error interval based on the reliability of the identification result given in the pre-recording result, and specifies the error interval to be determined. . According to the present invention, the error interval is determined based on the reliability of the identification result given in the identification result, and the error interval determined by the determination is specified, whereby the error interval can be automatically designated, and the error interval can be specified relatively easily. . Further, in the speech recognition result correcting apparatus of the present invention, the pre-recording specifying means calculates the reliability of the pre-recording recognition result, determines the error section based on the reliability, and specifies the error section to be determined, which is preferable. 
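The paragraphs above describe marking the error interval from per-word reliability, whether the reliability is received from the server or computed on the client. The sketch below is one possible realization; the function names, the 0.5 scaling factor and the floor value are assumptions, and the idea of deriving the threshold from the median (or mean) confidence follows the detailed description given later.

```python
# Illustrative sketch only: low-confidence spans found with a threshold that is either
# preset or derived from the median of the per-word confidences.
from statistics import median
from typing import List, Tuple

def adaptive_threshold(confidences: List[float], scale: float = 0.5,
                       floor: float = 0.05) -> float:
    """Scaled median confidence; constants are assumptions for illustration."""
    if not confidences:
        return floor
    return max(floor, scale * median(confidences))

def low_confidence_runs(confidences: List[float], threshold: float) -> List[Tuple[int, int]]:
    """Return (first_index, last_index) for each run of words at or below the threshold."""
    runs, start = [], None
    for i, c in enumerate(confidences):
        if c <= threshold:
            start = i if start is None else start
        elif start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(confidences) - 1))
    return runs

if __name__ == "__main__":
    # Toy values; the description's own example treats words with confidence <= 0.2 as errors.
    print(low_confidence_runs([0.59, 0.86, 0.04, 0.12, 0.18, 0.77], 0.2))  # -> [(2, 4)]
```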
According to the present invention, the reliability of the identification result can be calculated, the error interval can be judged based on the reliability, and the error area determined by the determination can be specified, and the error interval can be specified relatively easily. Even in the case of performing voice recognition on the server device or the like, it is also possible to design such that the reliability is not calculated from the server device, and a device that is more convenient to use can be provided. Further, the speech recognition result correction device of the present invention further includes: a specific means for specifying at least one vocabulary in front of the error section designated by the pre-recording means, or at least one vocabulary in the rear, or a pre-record The identification result formed by either the front vocabulary and the rear vocabulary; the pre-editing means regards the identification result specified by the specific means of the pre-recording as the constraint condition, and according to the constraint condition, the error interval is preceded by the square vocabulary The feature quantity data corresponding to the section included in the rear vocabulary is -8-200951940. It is ideal to extract the feature quantity data that has been extracted from the previous memory means. According to the present invention, the identification result of at least one vocabulary in front of the designated error section, or at least one vocabulary in the rear, or either of the front vocabulary and the vocabulary of the front is specified. The specific identification result is regarded as a constraint condition, and the pre-memorized feature quantity data identification processing is performed according to the constraint condition. In this way, 0 performs a more accurate identification process, so that a correct speech recognition result can be obtained. Further, the speech recognition result correction device of the present invention further includes: the specific means 'specifically, at least one vocabulary in front of the error section specified by the pre-recording means, or at least one vocabulary in the rear, or a pre-record The identification result formed by either of the front vocabulary and the rear vocabulary; the pre-editing means is to treat the identification result specified by the pre-recorded specific means as a constraint condition, and according to the constraint condition, the error interval Q corresponds The characteristic quantity data is extracted from the pre-recorded memory means, and the extracted feature quantity data is identified and processed, which is ideal. According to the present invention, the identification result of at least one vocabulary in front of the designated error section, or at least one vocabulary in the rear, or either of the front vocabulary and the vocabulary of the front is specified. The identification process of the feature amount data that has been memorized in advance is performed by considering the specific identification result as a constraint condition and according to the constraint condition. That is, in the present invention, it is possible to perform the identification processing using only the feature amount data of the error section. In this way, a more accurate identification process is performed, so that the correct speech recognition result of -9-200951940 can be obtained. 
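To make the constraint condition described above concrete, the following sketch keeps only re-recognition hypotheses that begin with the word W1 preceding the error interval and end with the word W2 following it. The `recognize_nbest` callable is a hypothetical N-best decoder, not an API defined by the patent.

```python
# A minimal sketch, assuming a hypothetical `recognize_nbest` function, of re-recognition
# constrained by the preceding word W1 and following word W2.
from typing import Callable, List, Optional, Sequence, Tuple

Hypothesis = Tuple[List[str], float]   # (word sequence, score)

def rerecognize_with_context(
    features,                              # feature frames spanning W1 + error interval + W2
    w1: str,
    w2: str,
    recognize_nbest: Callable[..., Sequence[Hypothesis]],
) -> Optional[List[str]]:
    """Return the best word sequence whose first word is W1 and last word is W2."""
    candidates = recognize_nbest(features)
    constrained = [(words, score) for words, score in candidates
                   if words and words[0] == w1 and words[-1] == w2]
    if not constrained:
        return None                         # fall back, e.g. keep the server result
    best_words, _ = max(constrained, key=lambda h: h[1])
    return best_words[1:-1]                 # drop the anchor words, keep the corrected middle
```

Pinning both ends sharply reduces the candidate space, which is why the text argues the constrained re-recognition is more accurate than decoding the interval in isolation.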
Further, the speech recognition result correcting apparatus of the present invention further includes: a vocabulary information specifying means for specifying that at least one vocabulary in front of the error section specified by the pre-recording means specifies the desired information, that is, the vocabulary The vocabulary information of the vocabulary in the identification result formed by the information, or the vocabulary information of at least one vocabulary at the back, or the vocabulary information of the preceding vocabulary and the vocabulary information of the vocabulary of the front vocabulary; The vocabulary information specified by the specific means of the vocabulary information is regarded as a constraint condition. According to the constraint condition, the feature quantity data corresponding to the interval included in the vocabulary and the vocabulary of the error interval is used in the memory method. It is ideal to extract the 'features of the extracted feature data and perform identification processing. According to the present invention, the vocabulary information for specifying a vocabulary can be used as a constraint condition to perform a correction process, whereby a relatively accurate recognition process can be performed. For example, as the vocabulary information, it is preferable that the vocabulary information indicating the part of the vocabulary and the pronunciation information indicating the vocabulary of the vocabulary are ideal. Further, the speech recognition result correction device of the present invention further includes: an unknown word determination means for determining at least one vocabulary in front of the error section specified by the pre-recording means or at least one of the rear based on the pre-written vocabulary information Whether the vocabulary of the recognition result formed by either the vocabulary or the front vocabulary and the vocabulary of the front is an unknown word: if the pre-recording result is determined by the pre-recorded unknown word determination means, the word discard is not -10 - 200951940 The term "prescription" is based on the previous vocabulary information, which is ideal for correcting the identification results. According to the present invention, when an unknown word is used, the recognition process is performed by using the vocabulary information as a constraint condition, and a more accurate speech recognition result can be obtained. Moreover, the speech recognition result correcting apparatus of the present invention further comprises: a connection probability memory means for storing the connection probability of the vocabulary each other; and a pre-recording U correction means for creating the error interval according to the fact that the correction processing has been performed. It is preferable to use the connection probability to update the connection probability stored in the pre-recorded probability memory means by using the probability of connection between the vocabulary and the vocabulary of one or both of them. According to the present invention, the probability of the connection of the words is discarded, and the connection probability is changed each time the correction process is performed. Therefore, by calculating the connection probability and updating, a more accurate voice can be obtained. Identify the results. 
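The connection-probability memory described above can be kept current by counting the word pairs that each accepted correction confirms. The class below is a simple illustrative store under that assumption; the smoothing constant and the count-based estimate are not prescribed by the patent.

```python
# Illustrative only: updating word-to-word connection probabilities after each correction.
from collections import defaultdict
from typing import Sequence

class ConnectionProbabilityStore:
    def __init__(self, smoothing: float = 1e-6):
        self.pair_counts = defaultdict(int)   # (prev_word, next_word) -> count
        self.prev_counts = defaultdict(int)   # prev_word -> count
        self.smoothing = smoothing

    def update_from_correction(self, corrected_words: Sequence[str]) -> None:
        """Register every adjacent word pair confirmed by a corrected result."""
        for prev_word, next_word in zip(corrected_words, corrected_words[1:]):
            self.pair_counts[(prev_word, next_word)] += 1
            self.prev_counts[prev_word] += 1

    def probability(self, prev_word: str, next_word: str) -> float:
        """P(next_word | prev_word), i.e. the stored connection probability."""
        denom = self.prev_counts[prev_word]
        if denom == 0:
            return self.smoothing
        return self.pair_counts[(prev_word, next_word)] / denom
```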
φ Further, the speech recognition result correction device of the present invention further comprises: a constraint condition memory means for using the vocabulary information specified by the pre-recorded slogan information specific means or the vocabulary specified by the pre-recording specific means as a constraint condition. And to memorize; the pre-editing method is based on the restraint conditions memorized in the pre-recorded conditional memory means, and it is ideal. In this way, the vocabulary or vocabulary information as a constraint condition can be memorized, and the correction process can be performed according to the restrained condition of the memory as needed. It is not necessary to generate the constraint condition every time the correction process is performed, and the order can be quickly set. 200951940 Processing (voice recognition. Processing). Further, the speech recognition result correction device of the present invention further includes: an accepting means for accepting text information from the user; and a pre-recording correcting means for treating the text information received by the pre-recording receiving means as a constraint condition. It is preferable to perform correction processing of the identification result in the error section. According to the present invention, the user can directly specify the text used as the restraining condition, and the correct identification processing can be performed, so that the correct speech recognition result can be obtained. Moreover, the speech recognition result correction device of the present invention further includes: a time information calculation means for calculating the identification result based on the identification result received by the receiving means and the feature quantity data memorized in the pre-memory means The elapsed time; the pre-recording means is based on the time information calculated by the pre-recording time information calculation means to specify the error interval, which is preferable. According to the present invention, the elapsed time of the identification result can be calculated based on the received recognition result and the stored feature amount data, and the error section can be specified based on the time information. Therefore, when the time information is not included in the identification result, the appropriate feature quantity data corresponding to the error interval can also be extracted. Moreover, the speech recognition result correcting device of the present invention further includes: a display means for displaying the identification result that has been corrected by the pre-recording correction means; and the pre-recording means means not displaying the identification result obtained by the pre-recording obtaining means, More ideal. Thereby, since the recognition result of the possibility of recognizing the error is not displayed, the user is not misunderstood. Moreover, the speech recognition result correcting apparatus of the present invention is the same as the identification result obtained by the re-identification of the current recording positive -12-200951940 means, and the identification result obtained by the pre-recording obtaining means is the same, or the identification results respectively contain If there is a difference in the time information, it is judged that the recognition error is not displayed. Thereby, it is possible to prevent an erroneous recognition result from being displayed. 
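The time-information calculation means mentioned above recovers per-word timing by aligning the received text against the stored feature data when the server result carries no time stamps. The sketch below shows only the bookkeeping; `frames_matched` stands in for a hypothetical alignment routine (for example, forced alignment against the acoustic model) and is an assumption, not part of the patent.

```python
# A sketch under stated assumptions: recover word spans from stored feature frames.
from typing import Callable, List, Sequence, Tuple

def recover_word_spans(
    words: Sequence[str],
    feature_frames: List,                # stored feature data for the whole utterance
    frames_matched: Callable[[str, List, int], int],
) -> List[Tuple[str, int, int]]:
    """Return (word, start_frame, end_frame) accumulated from the start of the utterance."""
    spans, offset = [], 0
    for word in words:
        length = frames_matched(word, feature_frames, offset)
        spans.append((word, offset, offset + length - 1))
        offset += length
    return spans
```

Once the spans are known, the error interval can be mapped back onto the stored feature data even though the server supplied no timing.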
Further, in the speech recognition result correcting apparatus of the present invention, the pre-recording specifying means specifies the starting point of the error section by the user operation, and the identification result given in the identification result obtained based on the first-order acquisition means is Reliability, to specify the end of the error interval, is ideal. Thereby, a correction method conforming to the user's input habit can be realized, and an easy-to-use device can be provided. Further, in the speech recognition result correcting apparatus of the present invention, the pre-recording specifying means specifies the starting point of the error section by the user's operation, and specifies the end point of the error section from the predetermined number of identification units based on the starting point, which is preferable. Thereby, a correction method Q conforming to the user's input habit can be realized, and a device that is easy to use can be provided. Further, in the speech recognition result correcting apparatus of the present invention, the pre-recording specifying means ' specifies the starting point of the error section by the user operation, and specifies the error based on the predetermined pronunciation symbol in the identification result obtained by the pre-recording obtaining means. The end of the interval is ideal. Thereby, a correction method conforming to the user's input habit can be realized, and an apparatus that is easy to use can be provided. Further, in the speech recognition result correcting apparatus of the present invention, the pre-recording obtaining means "obtains a complex identification candidate as a recognition result when obtaining the identification result; the pre-recording specifying means specifies the error by the user operation - 13 - 200951940 The starting point of the interval, based on the number of identification candidates obtained by the pre-recording means, is the ideal end point. Thereby, the end point can be specified based on the reliability of the identification result, and an efficient correction process can be realized. Further, the speech recognition result correction device of the present invention further includes: a calculation means for calculating an average value of a section including an error section of the feature amount data calculated by the pre-calculation means; It is preferable to subtract the average 値 calculated by the pre-calculation means from the extracted feature quantity data, and to use the subtracted data as the feature quantity data for re-identification processing. Thereby, the sound of the characteristics of the sound pickup device that has removed the input sound such as a microphone can be corrected, and a correct correction (voice recognition) can be realized. Further, the speech recognition result correction device of the present invention includes: an input means for inputting a voice; and an acquisition means for obtaining a recognition result of the voice input by the pre-record input means; and a specifying means for In the identification result identified by the pre-acquisition obtaining means, an error section in which a recognition error occurs is specified; and the notification means is to notify the external server by notifying the error section specified by the pre-recording means to the external server. 
The server requests the re-identification process of the error interval; and the receiving means is configured to receive the identification result of the error interval re-identified in the external server in response to the request made by the pre-notification means. Moreover, in the speech recognition result correction method of the present invention, the method includes: an input step for inputting a voice; and an obtaining step of obtaining an identification result of the voice input to the pre-recording input step; and a specifying step, In the identification result identified by the pre-acquisition obtaining step, an error interval in which the identification error occurs is specified; and the notifying step is to notify the external server by the error interval specified by the pre-recording step. And recognizing the external server to request the re-identification process of the error interval; and the receiving step is for receiving the error interval re-identified in the external server by the request in response to the request in the pre-notification step Identify the results. Further, the speech recognition result correcting apparatus of the present invention is characterized in that: the root interval specifying means is used to specify a root segment in the recognition result obtained by the pre-recording means; the pre-correcting means is based on the pre-recording means In the specified error interval, the feature quantity data corresponding to the root interval specified by the preceding root interval specifying means is extracted from the previous record of the billion means to use the extracted feature quantity data for re-identification, thereby performing It is desirable to correct the identification results obtained by the means of obtaining the foregoing. Thereby, the correction of the identification result can be performed using the feature quantity data corresponding to the root interval, and a relatively correct correction process can be performed. That is, re-identification can be performed in accordance with an interval of an unknown word called a root interval. Further, the speech recognition result correction device of the present invention further includes: a segmentation means for dividing the recognition result obtained from the pre-recording means into a plurality of sections according to the root section specified by the preceding root section specifying means. The pre-script correction method is ideal for performing the correction of the identification result for each segmentation interval divided by the pre-recording means. Thereby, the identification object can be shortened by dividing the identification result into the plural section, and a relatively accurate identification process can be performed. Further, the segmentation means in the speech recognition result correcting device of the present invention, -15-200951940 regards the end point of the root interval as the end point of a divided interval, and regards the starting point of the root interval as the next divided interval of the preceding divided interval. The starting point, in this way to segment the identification results, is ideal. Thereby, the root interval is included in any of the divided intervals. Therefore, the root interval must be included in the identification process, whereby the root string can be regarded as a constraint condition for identification processing. 
Further, the correction means of the speech recognition result correcting means of the present invention performs correction of the recognition result for each divided section divided by the pre-recording means, and regards the preceding root interval "as a constraint on the correction of each divided section". Conditions are ideal. Therefore, the root interval must be included in the identification process, so the root string can be regarded as a constraint condition for identification processing. Further, in the speech recognition result correcting apparatus of the present invention, the correcting means holds the hypothesis contained in the root string described in the root section specified by the preceding root section specifying means as the identification search process. It is desirable to select the final identification result from the hypothesis to perform the correction. Thereby, the root string can be used for the identification process. Further, the speech recognition result correcting apparatus of the present invention further includes: a dictionary adding means for adding the root string in the root section specified by the preceding root section specifying means to the dictionary database required for the identification processing , more ideal. Thereby, the root string can be accumulated and used effectively in the future identification processing, and a more accurate identification processing can be performed. -16- 200951940 The speech recognition result correction device of the present invention further comprises: a dictionary database generated by the user; the pre-correction means 'uses the string converted by the root string according to the pre-dictionary dictionary database' It is ideal to carry out the correction process. Thereby, the root string can be accumulated and used effectively in the future identification processing, and a more accurate identification processing can be performed. Further, the speech recognition result correction system of the present invention includes: the above-described 0 speech recognition result correction device; and the server device performs speech recognition based on the speech transmitted from the pre-recorded speech recognition result correction device, and creates a recognition result. And sent to the pre-recorded speech recognition result correction device. The speech recognition result correction system 'is only the difference of the subject matter' and the effect is the same as the above-described speech recognition result correcting means. According to the present invention, it is possible to correct the necessary portions among the identification results, and the correction processing can be easily performed while obtaining the correct identification result.
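The splitting rule described above (the end of the root interval closes one segment and its start opens the next, so both segments contain the root words) can be written down directly. The sketch below works on word indices and is an illustration only; the index-based bookkeeping is an assumption.

```python
# Minimal sketch of splitting a recognition result at a root interval so that the
# root words appear in both resulting segments and can act as a constraint.
from typing import List, Tuple

def split_at_root_interval(num_words: int, root_start: int, root_end: int) -> List[Tuple[int, int]]:
    """Return (start_index, end_index) segments of a result containing num_words words."""
    assert 0 <= root_start <= root_end < num_words
    first = (0, root_end)                   # ends at the end of the root interval
    second = (root_start, num_words - 1)    # starts at the start of the root interval
    return [seg for seg in (first, second) if seg[0] <= seg[1]]

# Example: a 10-word result with a root interval over words 4..5
# split_at_root_interval(10, 4, 5) -> [(0, 5), (4, 9)]
```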

G 【實施方式】 參照添附圖面,說明本發明的實施形態。在可能的情 況下,同一部分係標不同一符號,並省略說明。 <第1實施形態> 圖1係本實施形態的語音辨識結果訂正裝置亦即客戶 端裝置110,及將從客戶端裝置110所發送來的語音加以 辨識,將其結果回送至客戶端裝置110的伺服器裝置120 -17- 200951940 ,具備該兩者的通訊系統的系統構成圖。在本實施形態中 ,客戶端裝置110係例如爲行動電話等之行動終端,可將 使用者所發聲的語音加以輸入,將所輸入之語音,使用無 線通訊而發送至伺服器裝置120,並可接收來自伺服器裝 置120之回訊亦即辨識結果。 伺服器裝置120,係具備語音辨識部,會將所被輸入 的語音,使用音響模型、言語模型等之資料庫來進行語音 辨識,並將其辨識結果回送至客戶端裝置110。 接著,說明該客戶端裝置110的構成。圖2係客戶端 裝置110之機能的區塊圖。該客戶端裝置110,係含有: 特徵量算出部2 1 0 (輸入手段、算出手段)、特徵量壓縮 部220、送訊部225 (取得手段、送訊手段)、特徵量保 存部230 (記憶手段)、收訊部23 5 (取得手段、收訊手 段)、錯誤區間指定部240 (指定手段)、錯誤區間前後 文脈指定部250(特定手段)、錯誤區間特徵量抽出部 2 60、訂正部270 (訂正手段)、音響模型保持部281、言 語模型保持部282、字典保持部283、統合部280、顯示部 290所構成。 圖3係客戶端裝置n〇的硬體構成圖。圖2所示的客 戶端裝置110’實體上而言,係如圖3所示,是以含有: cpuii、屬於主記憶裝置的RAM12及R0M13、屬於輸入 裝置的鍵盤及滑鼠等之輸入裝置14、顯示器等之輸出裝置 15、網路卡等屬於資料收送訊裝置的通訊模組16、硬碟等 之輔助記憶裝置17等的電腦系統之方式而被構成。於圖2 -18- 200951940 中所說明的各機能,係藉由將所定之電腦軟體讀入至圖3 所示的CPU11、RAM12等硬體上,以在CPU11的控制下 ,促使輸入裝置14、輸出裝置15、通訊模組16作動,並 且進行RAM 12或輔助記憶裝置17中的資料之讀出及寫入 ,藉此而加以實現。以下,基於圖2所示的機能區塊’來 說明各機能區塊。 特徵量算出部210,係將從麥克風(未圖示)所輸入 Q 的使用者的聲音,加以輸入,根據該當輸入的聲音,算出 語音辨識頻譜、亦即表示音響特徵的特徵量資料用的部分 。例如,特徵量算出部210係算出,例如 MFCC ( Mel Frequency Cepstrum Coefficient)這類以頻率來表示音響 特徵的特徵量資料。 特徵量壓縮部220,係將特徵量算出部210中所算出 之特徵量資料,予以壓縮用的部分。 送訊部225,係將特徵量壓縮部220中所壓縮過的壓 〇 縮特徵量資料,發送至伺服器裝置120用的部分。該送訊 部 225,係使用 HTTP ( Hyper Text Transfer Protocol)、 MRCP ( Media Resource Control Protocol ) 、 SIP ([Embodiment] An embodiment of the present invention will be described with reference to the accompanying drawings. Where possible, the same part is labeled with a different symbol and the description is omitted. <First Embodiment> Fig. 1 is a client device 110 which is a voice recognition result correction device according to the present embodiment, and recognizes a voice transmitted from the client device 110, and returns the result to the client device. The server device 120 -17-200951940 of 110 has a system configuration diagram of the communication systems of the two. In the present embodiment, the client device 110 is, for example, a mobile terminal such as a mobile phone, and can input a voice uttered by the user, and transmit the input voice to the server device 120 by wireless communication. Receiving the reply from the server device 120 is also the identification result. The server device 120 includes a voice recognition unit that recognizes the input voice using a database such as an acoustic model or a speech model, and sends the identification result back to the client device 110. Next, the configuration of the client device 110 will be described. Figure 2 is a block diagram of the functionality of the client device 110. The client device 110 includes: a feature amount calculation unit 2 1 0 (an input means, a calculation means), a feature amount compression unit 220, a transmission unit 225 (acquisition means, a communication means), and a feature amount storage unit 230 (memory) The means), the receiving unit 23 5 (acquisition means, receiving means), the error section specifying unit 240 (designation means), the error section context specifying section 250 (specific means), the error section feature amount extracting section 2 60, and the correcting section 270 (correction means), acoustic model holding unit 281, speech model holding unit 282, dictionary holding unit 283, integration unit 280, and display unit 290. FIG. 3 is a hardware configuration diagram of the client device n〇. The client device 110' shown in FIG. 2 is physically shown in FIG. 
3, and is an input device 14 including: cpuii, RAM 12 and ROM 13 belonging to the main memory device, a keyboard and a mouse belonging to the input device, and the like. The output device 15 such as a display, a network card, and the like are a computer system such as a communication module 16 of a data receiving and transmitting device, and an auxiliary memory device 17 such as a hard disk. Each of the functions described in FIG. 2-18-200951940 is driven by the CPU 11 and the hardware such as the CPU 11 and the RAM 12 shown in FIG. 3 to drive the input device 14 under the control of the CPU 11. The output device 15 and the communication module 16 are activated, and reading and writing of data in the RAM 12 or the auxiliary memory device 17 are performed. Hereinafter, each functional block will be described based on the functional block ' shown in Fig. 2 . The feature amount calculation unit 210 inputs a voice of a user who inputs Q from a microphone (not shown), and calculates a part of the voice recognition spectrum, that is, the feature amount data indicating the acoustic feature, based on the input sound. . For example, the feature amount calculation unit 210 calculates feature amount data indicating acoustic characteristics such as MFCC (Melf Frequency Cepstrum Coefficient). The feature amount compressing unit 220 is a portion for compressing the feature amount data calculated by the feature amount calculating unit 210. The transmitting unit 225 transmits the compressed feature amount data compressed by the feature amount compressing unit 220 to the portion for the server device 120. The transmitting unit 225 uses HTTP (Hyper Text Transfer Protocol), MRCP (Media Resource Control Protocol), and SIP (

Sessionlnitiation Protocol)等,來進行送訊處理。又,在 該伺服器裝置120上,係使用這些協定來進行收訊處理, 或進行回送處理。然後,在該伺服器裝置120上,可將壓 縮特徵量資料予以解壓縮,可使用特徵量資料來進行語音 辨識處理。該特徵量壓縮部220,係用來爲了減輕通訊流 量而進行資料壓縮用的部分,因此該送訊部22 5係也可不 -19- 200951940 進行壓縮而直接將特徵量資料予以發送。 特徵量保存部230,係將特徵量算出部210中所算出 之特徵量資料,予以暫時記憶用的部分。 收訊部23 5,係將從伺服器裝置120所回送的語音辨 識結果加以接收用的部分。該語音辨識結果中係含有文字 資料、時間資訊、及信賴度資訊,時間資訊係表示文字資 料的每一辨識單位的經過時間,信賴度資訊係表示該辨識 結果的正確度用的資訊。 例如,作爲辨識結果’接收了圖4(a)所示的資訊。 在圖4(a)中’雖然有發聲內容、辨識內容、語音區間、 信賴度是被建立對應而記載,但實際上是不含有發聲內容 。此處,在語音區間中所示的數字,係表示框架的索引, 是表示該辨識單位的最初框架的索引。此處,1框架係相 當於10msec程度。又,信賴度係表示於伺服器裝置12〇 上所辨識出來之語音辨識結果的每一辨識單位的信賴度, 是表示正確程度如何的數値。這是對於辨識結果使用機率 等所生成的數値,於伺服器裝置120上,被附加在所被辨 識之字彙單位的數値。例如,作爲信賴度的生成方法,係 記載於以下的參考文獻。 參考文獻:李晃伸、河原達也、鹿野清宏,「2-passs 探索演算法下基於高速字彙事後機率的信賴度算出法」, 資訊處理學會硏究報告,2003-SLP-49-48,2003-1 2。 在圖4(a)中係圖示了,例如,辨識結果的「賣(売 η τ )」(urete),是由33框架至57框架所構成,其信 200951940 賴度係爲0.8 6。 錯誤區間指定部2 4 0,係基於被收訊部2 3 5所接收到 的語音辨識結果,來指定錯誤區間用的部分。該錯誤區間 指定部240,例如,係可基於從伺服器裝置1 20所發送來 的語音辨識結果中所含之信賴度資訊,來指定錯誤區間。 例如,在圖4(a)中,作爲辨識結果係表示了,文字 資料係爲905 ( kyuumarugo ),時間資訊係爲9框架( 0 90msec) ’其信賴度係爲0.59,又,在另一地點,辨識結 果的「哪(if乙)」(doko)的信賴度係爲〇.〇4。然後, 該錯誤區間指定部240,係可把信賴度在所定閾値以下者 ’判斷爲有錯誤,可把該區間指定成爲錯誤區間。例如, 當設定爲信賴度在0.2以下者就爲有誤的情況下乙” (doko) 、 “T” (de)、 “豆腐 ” (doufu)的部分就 判斷爲有誤’可將該部分指定成爲錯誤區間。該閾値係爲 可在客戶端裝置110側預先設定的數値。此外,亦可隨著 Q 語音的個人差異、雜音(雜訊)的量、或信賴度的計算方 法而作可變設定。亦即,當雜音較多時,由於信賴度會更 加降低’因此將閾値設定得較低;又,當對語音辨識結果 所附加的信賴度整體而言均很低時,或反之均很高時,則 亦可隨著其信賴度的高低來作設定。例如,可基於信賴度 的中央値來設定閾値,或亦可基於平均値來設定閾値。圖 4(b)係圖示了中文的發音例子作爲參考。 此外’客戶端裝置11〇’係具備用來計算辨識結果之 信賴度資訊的信賴度計算部(未圖示),錯誤區間指定部 -21 - 200951940 240,係亦可基於在客戶端裝置110內所計算出來的信賴 度資訊,來設定錯誤區間。 錯誤區間前後文脈指定部250,係基於錯誤區間指定 部240上所指定的錯誤區間,來指定該當錯誤區間前後所 被辨識之字彙(至少一辨識單位)用的部分。以下就僅使 用前後1字彙的情況爲例來說明。在圖5(a)中,圖示了 於錯誤區間之前後所被辨識之一辨識單位(錯誤區間前後 文脈)加以指定時的槪念圖。如圖5 ( a )所示,在辨識結 果的錯誤區間之前後,指定錯誤區間前之字彙的語音區間 、錯誤區間後之字彙的語音區間。 錯誤區間特徵量抽出部260,係將已被錯誤區間前後 文脈指定部2 5 0所指定的錯誤區間(亦可包含前後至少一 辨識單位)的特徵量資料,從特徵量保存部230中加以抽 出用的部分。 訂正部270,係將已被錯誤區間特徵量抽出部260所 抽出之特徵量資料,進行再度語音辨識用的部分。該訂正 部2 70,係使用音響模型保持部281、言語模型保持部282 、及字典保持部2 83,來進行語音辨識。然後,該訂正部 270,係將已被錯誤區間前後文脈指定部25 0所指定之前 後的語音區間所示的字彙(前後文脈),視爲拘束條件來 進行語音辨識。圖5 (b)係圖示了,基於已被錯誤區間前 後文脈指定部250所指定之字彙來進行辨識處理時的槪念 圖。如圖5(b)所示,當把錯誤區間的前面區間的字彙 W1與後面區間的字彙W2視爲拘束條件時,辨識候補就 200951940 會變成有限。因此’可提升辨識的精度。在圖5(b)的例 子中,作爲辨識候補可過濾成A〜Z,可從該已被過濾之 候補之中選擇出適切的候補,可有效率地進行辨識處理。 又’訂正部270’係亦可基於與前後字彙的修辭關係 、活用形(字尾變化)等來進行訂正處理。例如,訂正部 270係亦可將對錯誤區間之字彙的辨識候補a〜Z予以複 數抽出’基於其前後字彙W1與W2的修辭之關係,來算 Q 出每一訂正候補的分數’將分數高的訂正候補,視爲辨識 結果。 又’訂正部270係即使當前面區間的字彙W1或後面 區間的字彙W2是未被包含在言語模型保持部282或字典 保持部283中時,仍可將用來特定該字彙用的字彙資訊或 用來特定前後字彙用的字彙資訊視爲拘束條件,來進行訂 正處理(再度語音辨識處理)。 例如’客戶端裝置110,係作爲字彙資訊,將表示字 〇 彙W1、字彙W2各自之詞性用的詞性資訊,從伺服器裝 置120予以接收,訂正部270係將字彙W1、字彙W2各 自之詞性資訊,當成拘束條件而進行訂正處理。藉此,就 可進行較正確的訂正處理,亦即語音辨識處理。具體而言 ’於收訊部235上所接收到的語音辨識結果中所被附加之 字彙資訊當中,錯誤區間指定部240會將錯誤區間的前後 (或是任一方)的字彙資訊予以抽出,輸出至訂正部270 。在訂正部270中,會將該字彙資訊視爲拘束條件而將所 指定之部分進行訂正處理。其槪念圖示於圖24。如圖24 -23- 200951940 所示,對應於字彙W1係有詞性資訊A (例如,助詞 對應於字彙W2係有詞性資訊B (例如,動詞),被 拘束條件而設定。訂正部270,係藉由滿足各個詞性 A及詞性資訊B的方式來進行訂正處理,就可進行較 的語音辨識處理。 此外,作爲字彙資訊,並不限定於詞性資訊,亦 例如念法等字彙以外的用來特定字彙所需之資訊。 又,當必要的字彙資訊未被包含在語音辨識結果 ,則藉由將屬於辨識對象的文章,使用周知的語素解 統(例如“茶筅” 、“ Mecab” )、日本語修辭解析 (例如“南瓜”)等來進行解析,就可生成字彙資訊 即,於圖25中所示的客戶端裝置110的變形例中, 附加有字彙資訊解析部25 1,字彙資訊解析部25 1係 上述的周知的語素解析系統、日本語修辭解析工具等 成,可將語音辨識結果予以解析。然後,將解析後的 ,輸出至錯誤區間前後文脈指定部250,錯誤區間前 脈指定部250係可基於該字彙資訊來抽出錯誤區間前 彙的字彙資訊,輸出至訂正部2 70。 上記生成字彙資訊的處理,係可在客戶端裝置1] 伺服器裝置120上進行,但設計成對伺服器裝置120 指示令其進行之,然後接收處理結果的方式,可降低 戶端裝置110上的處理量。 上述處理係在字彙W1及W2是未知詞時,特別 。所謂未知詞,係指未被包含在言語模型保持部282 ), 當成 資訊 正確 可爲 中時 析系 工具 。亦 係新 由如 所構 結果 後文 後字 〇或 發出 在客 有效 或字 200951940 典保持部283中的字彙。例如,訂正部270 (未知詞判定 手段)係判斷字彙W1及W2是否爲未知詞,若爲未知詞 時,則將從伺服器裝置120所送出的辨識結果中所含有的 字彙資訊視爲拘束條件,來進行訂正處理。 又,於客戶端裝置110上,亦可將該拘束條件予以登 錄。亦即’於圖25所示的客戶端裝置11〇的變形例中, 亦可將已被指定之錯誤區間的字彙及其前後(或至少一方 φ )之字彙、或與其字彙資訊成組者,視爲拘束條件,令其 記憶至拘束條件記憶部2 8 5 
(拘束條件記憶手段)。藉此 ’訂正部2 7 0係當與錯誤區間指定部2 4 0中所被指定之錯 誤區間的字彙相同、或是其前後字彙爲相同時,就可依照 拘束條件記憶部2 8 5中所被記憶的拘束條件,來進行訂正 處理。藉此,就可迅速地進行該處理。亦即,從下次以後 ,即使偵測出未知詞,也只需立刻讀出已有登錄的拘束條 件,就能適用拘束條件。由於不需要重新作成拘束條件, φ 因此可以用較少的處理來設定拘束條件。 又’於訂正部270上,亦可依照已訂正之結果,將該 錯誤區間的字彙及其前後的字彙的連接機率,加以更新。 亦即,亦可設計成,連接機率,係被記憶在作爲連接機率 記憶手段而發揮機能的言語模型保持部282及字典保持部 283中,每次有適宜的訂正處理時就於訂正部270上所被 計算、作成的連接機率,係於言語模型保持部282及字典 保持部283中被更新。 又,訂正部270係判斷再辨識後之辨識結果、與該錯 -25- 200951940 誤區間被伺服器裝置120所辨識之辨識結果是否爲相同’ 此時,辨識結果係不輸出至統合部2 80 ’不在顯示部290 上顯示辨識結果,較爲理想。 又,在訂正部2 70中進行辨識所得到之辨識結果、和 該錯誤區間於伺服器裝置120上所被辨識之辨識結果之間 ,即使發生一辨識單位之誤差時也同樣地判斷爲辨識錯誤 ,就不將辨識結果輸出至統合部280’不在顯示部290上 顯示辨識結果,較爲理想。 例如,當圖4 ( a )中的語音區間與辨識結果的對應關 係有所不同時,更具體而言,係於語音區間中,伺服器裝 置120上的辨識結果爲,框架索引是0-9,而此時係爲“ 905 ( kyuumarugo ) ”的情況下,於訂正部270上的再辨 識時,變成了框架索引爲 0-15 、 “ 90555 ( kyuumarugogogo) ”的這種情況時,則該語音區間與辨識 結果的對應關係,在辨識結果與再辨識結果之間就發生誤 差。因此,可判斷爲辨識錯誤。此情況下,訂正部270係 使顯示部2 90上不顯示辨識結果,進行不輸出等之處理。 甚至’亦可設計成,訂正部2 70,係當已經判斷上述 辨識錯誤的情況下’若在從使用者受理文字資訊的受理部 (未圖示)上有文字輸入,則訂正部270係將所受理到的 文字(例如日文假名)當作拘束條件,來進行錯誤區間的 辨識結果之訂正處理。亦即,亦可對於錯誤區間的辨識結 果’有任何文字輸入時’則以該文字爲前提,來進行剩餘 部分的辨識處理。此情況下,若有辨識錯誤之判斷時,則 -26- 200951940 使受理部可以接受文字輸入。 此外,訂正部270,係藉由進行與伺服器裝置120上 所進行之辨識處理不同的語音辨識處理,就可防止再度進 行有誤的辨識。例如,改變音響模型、言語模型、字典來 進行辨識處理。 音響模型保持部281,係將音素與其頻譜,建立對應 而加以記憶的資料庫。言語模型保持部282,係將字彙、 φ 文字等之連鎖機率加以表示的統計性資訊,加以記憶用的 部分。字典保持部283,係將音素與文字的資料庫加以保 持,是記憶例如HMM(Hidden Marcov Model)用的部分 〇 統合部280,係將收訊部23 5上所接收到的語音辨識 結果當中,錯誤區間外的文字資料、和訂正部270上被再 辨識過之文字資料,加以統合用的部分。該統合部280, 係依照訂正部270上所被再辨識過的文字資料加以統合用 Q 之位置加以表示的錯誤區間(時間資訊),來進行統合。 顯示部290,係將統合部280上進行統合所得到之文 字資料,加以顯示用的部分。此外,顯示部290係被構成 爲,將伺服器裝置120上進行辨識後的結果,當作顯示內 容,較爲理想。又,當訂正部270上再辨識後的結果、和 錯誤區間在伺服器裝置120上所被辨識之結果相同時,使 該辨識結果不被顯示地進行顯示’較爲理想;又’此情況 下亦可顯示出無法辨識之意旨。再者’當訂正部270上再 辨識所得之辨識結果、和伺服器裝置120上辨識所得到之 -27- 200951940 辨識結果之間, 之可能性而不作 爲理想。 又,亦可不 間的長度,判斷 是1文字時,貝!I 他的方法來作訂 說明如此構 端裝置110之動 ,係藉由特徵量 s 1 0 1 )。然後, 料(S102 )。接 進行壓縮(S103 訊部225發送至 接著,於伺 裝置120發送辨 然後,根據語音 誤區間,基於該 S106 )。基於將 間特徵量抽出部 中抽出(S107 ) 訂正部270而進 料(S 1 0 8 )。然 上所接收到的文 具有時間資訊上的誤差時,也因爲有錯誤 顯示,或是令無法辨識之意旨被顯示,較 需要總是執行再辨識處理,可隨著錯誤區 是否進行再辨識處理。例如,當錯誤區間 不進行再辨識處理,而是以文字輸入等其 正。 成之客戶端裝置110的動作。圖6係客戶 〇 作的流程圖。透過麥克風所被輸入之語音 算出部210而將其特徵資料予以抽出( 在特徵量保存部230中係保存有特徵量資 著,藉由特徵量壓縮部22 0將特徵量資料 )。已被壓縮的壓縮特徵量資料,係被送 伺服器裝置120 ( S 104)。 服器裝置120上進行語音辨識,從伺服器 識結果,被收訊部23 5所接收(S105 ) 。 q 辨識結果,錯誤區間指定部240會指定錯 所被指定之錯誤區間,來指定前後文脈( 該前後文脈予以包含的錯誤區間,錯誤區 260會將特徵量資料從特徵量保存部230 。此處’基於所抽出的特徵量資料,藉由 行再度語音辨識,生成錯誤區間的文字資 後,錯誤區間的文字資料、和收訊部23 5 字資料會進行統合,經過正確辨識所得到 -28- 200951940 之文字資料,會被顯示在顯示部290上(S109)。 接著,再詳細說明上述S106〜S108中的處理。圖7 係表示該詳細處理的流程圖。適宜地參照圖5(a)來說明 〇 錯誤區間指定部240會基於辨識結果來指定錯誤區間 (S201 ( S106 ))。基於該錯誤區間,錯誤區間前後文脈 指定部250會指定錯誤區間的前面字彙W1 (圖5(a)) φ ,並保存之(S202 )。又,藉由錯誤區間前後文脈指定部 250,錯誤區間的後面字彙W2(圖5(a))會被指定而記 憶(S203 )。接著,藉由錯誤區間前後文脈指定部250, 指定該字彙W1的開始時間T1 (圖5 ( a) ) ( S204 ), 並指定字彙W2的結束時間T2 (圖5 ( a )),然後分別 保存之(S205 )。 如此’對錯誤區間再各自加上其前後一字彙(一辨識 單位)而得到的錯誤區間亦即開始時間T 1至結束時間T2 〇 的區間的特徵量資料,係被錯誤區間特徵量抽出部260所 抽出(S206(S107))。以字彙W1爲起點、字彙W2爲 終點的拘束條件之設定,會在訂正部270中進行(S207 ) 。然後,依照該拘束條件,訂正部270進行對特徵量資料 之辨識處理,執行訂正處理(S208)。 如以上所說明’說明本實施形態中的客戶端裝置n 〇 的作用效果。於該客戶端裝置110中,特徵量算出部210 會算出所被輸入之語音的特徵量資料’特徵量壓縮部22〇 係將特徵量資料發送至語音辨識裝置亦即伺服器裝置12〇 -29- 200951940 。另一方面,特徵量保存部23 0係將特徵量資料予以保存 〇 然後,於伺服器裝置120上進行辨識處理,收訊部 23 5係從伺服器裝置120接收辨識結果。錯誤區間指定部 240,係於所收到的辨識結果中,指定出有發生辨識錯誤 的錯誤區間。該錯誤區間指定部240,係可基於信賴度來 加以判斷。然後,錯誤區間特徵量抽出部260係將錯誤區 間的特徵量資料予以抽出,訂正部270係將所抽出之錯誤 區間的辨識結果,進行再辨識處理,以進行訂正處理。亦 即,於統合部280中,會將再辨識後的結果、和收訊部 23 5上所接收到的辨識結果,進行統合,以進行訂正處理 ,顯示部2 9 0就可顯示已被訂正過的辨識結果。藉此,在 辨識的結果當中,將有必要的部分進行訂正,可簡易地訂 正語音辨識之錯誤,可獲得正確的辨識結果。例如,可將 錯誤字彙最多削減70%。又,可將未知詞所造成的錯誤訂 正達60%以上。此外,信賴度係亦可從伺服器裝置120接 收,或可於客戶端裝置110上進行計算。 甚至,該客戶端裝置1 1 0係可使用錯誤區間前後文脈 指定部250,依照拘束條件來進行訂正處理(再辨識處理 )。亦即,將錯誤區間的前後字彙予以固定,依照該固定 的字彙來進行辨識處理,就可獲得精度較佳的辨識結果。 此外,本實施形態或其之後所示的其他實施形態中, 
雖然第1次辨識處理是在伺服器裝置120上進行,但並非 限定於此,亦可第1次辨識處理是在客戶端裝置110中進 -30- 200951940 fr ’使第2次辨識處理在伺服器裝置i2〇上進行。此時, 想當然爾’錯誤區間的指定處理等是在伺服器裝置120上 進行。例如’此情況下’客戶端裝置110係具備,基於特 徵量算出部210上所算出的特徵量資料來進行辨識處理用 的辨識處理部’又,送訊部225係將此處的辨識結果與特 徵量資料,發送至伺服器裝置120。 在伺服器裝置120上,係具備相當於客戶端裝置ι1〇 〇 中的錯誤區間指定部240、錯誤區間前後文脈指定部250 、特徵量保存部230、錯誤區間特徵量抽出部260、訂正 部270之各部分’從客戶端裝置11〇所發送來的特徵量資 料’係被記憶在特徵量保存部中,基於辨識結果來進行錯 誤區間之指定、錯誤區間前後文脈之指定,基於這些而進 行之前所保存的特徵量資料的訂正處理(辨識處理)。如 此處理好的辨識結果’係被發送至客戶端裝置110。 又’於本實施形態或其之後所示的其他實施形態中, 〇 雖然使用已被錯誤區間前後文脈指定部250所定好的拘束 條件來進行再辨識(訂正處理):但在本例子的情況下, 是僅利用錯誤區間的特徵量資料。亦可不像這樣使用拘束 條件,就進行再辨識處理。 又,將伺服器裝置120上的辨識方法、和本實施形態 (或以下所示的其他實施形態)中的辨識方法,加以改變 ,較爲理想。亦即,於伺服器裝置1 20上,因爲必須要辨 識不特定多數使用者的語音,因此必須要具有通用性。例 如,伺服器裝置120中所採用的音響模型保持部、言語模 -31 - 200951940 型保持部、字典保持部的各模型數、字典數是設成大容量 ,音響模型中的音素之數目設爲較多,言語模型中的字彙 數目設定得較大等等,將各模型數、字典數都設成較大容 量,以使其能夠對應任何的使用者。 另一方面,客戶端裝置110上的訂正部270,就不需 要對應於任何使用者,可使用符合該客戶端裝置110之使 用者的語音的音響模型、言語模型、字典。因此,該客戶 端裝置110,係必須要將訂正處理、辨識處理、或郵件作 φ 成時的文字輸入處理作爲參考,適宜地更新各模型、字典 〇 又,客戶端裝置110,係更具備用以顯示已被訂正部 2 70所訂正過的辨識結果用的顯示部290,於伺服器裝置 120上所辨識的辨識結果,係不會被顯示在該顯示部290 。藉此,由於有辨識錯誤可能性的辨識結果不會顯示’因 此不會對使用者造成誤解。 又,客戶端裝置110’係當訂正部270上再辨識所得 ◎ 之辨識結果、和收訊部235所接收到的辨識結果爲相同時 ,或這些辨識結果分別所含有之時間資訊是有差異時’則 訂正部270係判斷爲辨識錯誤,顯示部290係不顯示辨識 結果。藉此,可防止顯示出錯誤的辨識結果。具體而言’ 可將錯誤字彙最多削減70%。又’可將未知詞所造成的錯 誤訂正達60%以上。 <第2實施形態> -32- 200951940 接著,說明不是基於信賴度來自動判斷錯誤區間,而 是藉由使用者手動判斷所構成之客戶端裝置11 〇a。圖8係 藉由使用者輸入而受理錯誤區間的客戶端裝置110a之機 能的區塊圖。如圖8所示,該客戶端裝置ll〇a,係含有: 特徵量算出部210、特徵量壓縮部220、特徵量保存部230 、送訊部225、收訊部23 5、操作部23 6、結果保存部237 、使用者輸入偵測部23 8、錯誤區間指定部240a、錯誤區 φ 間前後文脈指定部250、錯誤區間特徵量抽出部260、訂 正部270、統合部280、音響模型保持部281、言語模型保 持部282、字典保持部283、顯示部290所構成。該客戶 端裝置110a,係和客戶端裝置110同樣地藉由圖3所示的 硬體所實現。 該客戶端裝置ll〇a,係與客戶端裝置110,在具備操 作部236、結果保存部23 7、使用者輸入偵測部23 8、錯誤 區間指定部240a這點是不同的。以下就該相異點爲中心 Q 來說明。 操作部236,係受理使用者輸入用的部分。使用者係 可一面確認顯不部290上所顯不的辨識結果,一面指定錯 誤區間。操作部236,係可受理該指定。 結果保存部23 7,係將收訊部23 5所接收到的語音辨 識結果加以保存用的部分。保存的語音辨識結果,係以使 用者可目視的方式顯示在顯示部290。 使用者輸入偵測部23 8,係用來偵測操作部23 6所受 理到的使用者輸入用的部分,係將已被輸入的錯誤區間, -33- 200951940 輸出至錯誤區間指定部240a。 錯誤區間指定部240a,係依照從使用者輸入偵測部 238所輸入之錯誤區間來指定該區間用的部分。 接著,說明如此所被構成的客戶端裝置ll〇a之處理 。圖9係客戶端裝置110a之處理的流程圖。透過麥克風 所被輸入之語音’係藉由特徵量算出部210而將其特徵資 料予以抽出(S101)。然後’在特徵量保存部230中係保 存有特徵量資料(S102)。接著’藉由特徵量壓縮部22〇 #| 將特徵量資料進行壓縮(S103)。已被壓縮的壓縮特徵量 資料,係被送訊部225發送至伺服器裝置12〇 ( S104)。 接著’於伺服器裝置120上進行語音辨識,從伺服器 裝置120發送辨識結果’被收訊部235所接收,被暫時保 存,同時該辨識結果係被顯示在顯示部290 (S105a)。然 後,使用者係基於顯示部290上所顯示的辨識結果,來判 斷錯誤區間,將該錯誤區間予以輸入。然後,藉由使用者 輸入偵測部23 8而偵測該輸入,藉由錯誤區間指定部240 q 來指定錯誤區間。然後,基於該已被指定之錯誤區間,來 指定前後文脈(SI 0 6a)。基於將該前後文脈予以包含的 錯誤區間,錯誤區間特徵量抽出部2 60會將特徵量資料予 以抽出(S1 07),藉由訂正部2 70而進行再度語音辨識, 生成錯誤區間的文字資料(S108)。然後,錯誤區間的文 字資料、和收訊部23 5上所接收到的文字資料會進行統合 ,正確的文字資料,會被顯示在顯示部290上(S109)。 接著,再詳細說明上述 S105a〜S108中的處理。圖 -34- 200951940 10係客戶端裝置11 〇a上的藉由使用者輸入而指定錯誤區 間時的詳細處理的流程圖。 收訊部235會接收辨識結果,並顯示在顯示部290 ( S301)。使用者係一面確認顯示部290上所顯示的辨識結 果,一面指定錯誤區間,藉由使用者輸入偵測部2 3 8偵測 該錯誤區間的起點位置,並予以暫時保存(S302 )。然後 ,錯誤區間前後文脈指定部250會指定錯誤區間的前面字 Q 彙W1、並保存之(S303),已被保存的字彙W1的開始 時間T1會被指定、保存(S 3 04 )。 又,使用者所指定錯誤區間的終點位置會被使用者輸 入偵測部23 8所測出,並予以暫時保存(S3 05 )。然後, 錯誤區間前後文脈指定部250會指定錯誤區間的後面字彙 W2、並保存之(S306),已被保存的字彙W2的結束時間 T2會被指定、保存(S3 07 )。 這些處理之後,從開始時間T1至結束時間T2的特徵 〇 量資料,係被錯誤區間特徵量抽出部260所抽出(S308 ) 。以字彙w 1爲起點、字彙W2爲終點的拘束條件之設定 ,會在訂正部270中進行(S309)。然後,依照該拘束條 件,訂正部2 70進行對特徵量資料之辨識處理,執行訂正 處理(S310 )。 藉由如此處理,就可藉由使用者輸入來指定錯誤區間 ,藉此可進行再辨識而進行辨識結果的訂正處理。 於此種客戶端裝置ll〇a中,顯示部290會顯示辨識 結果,使用者係目視確認之,並且,使用者藉由操作操作 -35- 200951940 部236,就可指定錯誤區間,亦即欲訂正的地點。藉此, 在辨識的結果當中,將有必要的部分進行訂正,可簡易地 進行訂正處理,同時,可獲得正確的辨識結果。 <第3實施形態> 接著說明’當從伺服器裝置120所發送來的辨識結果 中不含有時間資訊時,可正確指定錯誤區間的客戶端裝置 110b。圖11係該客戶端裝置110b之機能的區塊圖。該客 戶端裝置ll〇b,係含有:特徵量算出部210、特徵量壓縮 部220、送訊部225'特徵量保存部230、收訊部235、時 間資訊算出部239、錯誤區間指定部240、錯誤區間特徵 量抽出部260、錯誤區間前後文脈指定部250、訂正部270 、音響模型保持部281、言語模型保持部282、字典保持 部283所構成。該客戶端裝置ll〇b,係和第1實施形態的 
客戶端裝置110同樣地藉由圖3所示的硬體所實現。 又,與第1實施形態的客戶端裝置110之相異點係爲 ’此客戶端裝置ll〇b係從伺服器裝置120接收不含有經 過資訊的辨識結果,然後,於時間資訊算出部239中基於 辨識結果亦即文字資料來自動算出經過時間(框架索引) 這點。以下就該相異點爲中心來說明客戶端裝置110b。 時間資訊算出部239,係使用收訊部235上所接收到 的辨識結果當中的文字資料及特徵量保存部23 0中所記憶 的特徵量資料,算出文字資料的經過時間用的部分。更具 體而言,係時間資訊算出部23 9,係藉由比較所被輸入的 200951940 文字資料、和特徵量保存部230中所記憶的特徵量資料, 將文字資料的一字彙或一辨識單位轉換成頻率資料時,判 斷與特徵量資料一致到哪個部分,藉此就可算出文字資料 的經過時間。例如,當特徵量資料的10框架部分爲止是 和文字資料的一字彙一致時,則該一字彙就具有10框架 的經過時間。 錯誤區間指定部240b,係可使用被時間資訊算出部 0 239所算出的經過時間及文字資料,來指定錯誤區間。該 錯誤區間指定部24〇b,係基於辨識結果中所含有之信賴度 資訊來判斷錯誤區間。此外,亦可如第2實施形態那樣, 藉由使用者輸入來指定錯誤區間。 如此,基於已被錯誤區間指定部240b所指定的錯誤 區間’錯誤區間前後文脈指定部2 5 0係指定含有前後上下 文的錯誤區間,錯誤區間特徵量抽出部260係將該錯誤區 間的語音資料予以抽出,然後訂正部270係進行再度辨識 @ 處理,就可進行訂正處理。 接著,說明該客戶端裝置ll〇b的處理。圖12係客戶 端裝置ll〇b之處理的流程圖。透過麥克風所被輸入之語 音’係藉由特徵量算出部210而將其特徵資料予以抽出( S1 01)。然後,在特徵量保存部230中係保存有特徵量資 料(S102)。接著,藉由特徵量壓縮部22 0將特徵量資料 進行壓縮(S103)。已被壓縮的壓縮特徵量資料,係被送 訊部225發送至伺服器裝置120(S104)。 接著,於伺服器裝置120上進行語音辨識,從伺服器 -37- 200951940 裝置120發送辨識結果(不含經過時間),被收訊部235 所接收(S105)。然後,根據語音辨識結果及特徵量保存 部230的特徵量資料,藉由時間資訊算出部239而算出經 過時間,使用該經過時間及語音辨識結果,藉由錯誤區間 指定部240而指定錯誤區間。藉由錯誤區間前後文脈指定 部250,基於該已被指定之錯誤區間,來指定前後文脈( SI 06b )。基於將該前後文脈予以包含的錯誤區間,錯誤 區間特徵量抽出部260會將特徵量資料予以抽出(S107) ,藉由訂正部2 70而進行再度語音辨識,生成錯誤區間的 文字資料(S1 08)。然後,錯誤區間的文字資料、和收訊 部23 5上所接收到的文字資料會進行統合,正確的文字資 料,會被顯示在顯示部2S>0上(S109)。 接著,說明包含S106b的更詳細之處理。圖13係 S 105至S1 08的詳細處理的流程圖。 藉由收訊部23 5接收不含經過時間的辨識結果(S4〇l ),於時間資訊算出部239上算出文字資料中的經過時間 (S402 )。藉由錯誤區間指定部240,從辨識結果中指定 出錯誤區間(S 4 0 3 )。基於該錯誤區間,錯誤區間前後文 脈指定部250會指定錯誤區間的前面字彙W1 (圖5 ( 3) ),並保存之(S404 )。又,藉由錯誤區間前後文脈指定 部25 0’錯誤區間的後面字彙W2(圖5(a))會被指定 而記憶(S405)。接著,藉由錯誤區間前後文脈指定部 250,指定該字彙W1的開始時間T1 (圖5(a) ) (S406 ),並指定字彙W2的結束時間T2 (圖5 ( a) ) ( S407 -38- 200951940 如此’對錯誤區間再各自加上其前後一字彙而得到的 錯誤區間亦即開始時間τ 1至結束時間T2的區間的特徵量 資料’係被錯誤區間特徵量抽出部260所抽出(S408 )。 以字彙W1爲起點、字彙W2爲終點的拘束條件之設定, 會在訂正部270中進行(S4 09 )。然後,依照該拘束條件 ’訂正部2 7 0進行對特徵量資料之辨識處理,執行訂正處 ❿理(S410 )。 若依據此客戶端裝置ll〇b,則基於被收訊部23 5所接 收到的辨識結果與特徵量保存部2 3 0中所記憶的特徵量資 料,時間資訊算出部239會算出辨識結果的經過時間。然 後,錯誤區間指定部240,就可基於該時間資訊,來指定 錯誤區間。此處’基於已指定的錯誤區間來指定其前後文 脈,然後,基於其特徵量資料來進行訂正處理。藉此,當 辨識結果中沒有包含時間資訊時,也可指定適切的錯誤區 ❹ 間。 <第4實施形態> 接著,說明僅根據於伺服器裝置120上進行語音辨識 所得到的辨識結果,來進行訂正處理的客戶端裝置ll〇c。 圖14係客戶端裝置110c之機能的區塊圖。該客戶端裝置 ll〇c,係含有:特徵量算出部210、特徵量壓縮部220、 錯誤區間指定部240、錯誤區間前後文脈指定部25〇、訂 正部270a、及言語DB保持部284所構成。該客戶端裝置 -39- 200951940 1 1 0c,係和客戶端裝置1 1 〇同樣地藉由圖3所示的硬體所 實現。 該客戶端裝置ll〇c,係相較於客戶端裝置110,在不 將語音輸入所得之特徵量資料予以記憶,且在該特徵量資 料訂正處理之際不再度使用之構成這點,有所不同,具體 而言,係不具備特徵量保存部23 0、錯誤區間特徵量抽出 部260、音響模型保持部281、言語模型保持部282、字典 保持部283這點,有所不同。以下,基於相異點加以說明 〇 特徵量算出部210,係根據語音輸入而算出特徵量資 料’特徵量壓縮部220,係將特徵量資料予以壓縮,發送 至伺服器裝置120。然後,收訊部23 5,係從伺服器裝置 120接收辨識結果。錯誤區間指定部240,係藉由信賴度 資訊或使用者操作來指定錯誤區間,錯誤區間前後文脈指 定部250係指定其前後文脈,然後指定錯誤區間。 訂正部270a,係將已被含前後文脈之錯誤區間所指定 的文字資料,基於言語DB保持部2 84中所記憶的資料庫 ’來進行轉換處理。該言語DB保持部284 ,係記憶著與 言語模型保持部2 82大致相同的資訊,是記憶著每一音節 的連鎖機率。 然後’該訂正部270a,係將有發生錯誤區間之可能性 的字彙列w ( Wi,Wi + l…Wj ),加以列出。此處,也會將 字彙列w的數目限制爲K。關於限制的數目κ,係設成和 錯誤字彙數Ρ相同,或是接近Ρ的一定範圍(K = P-c至 -40- 200951940 P + c )。 然後,訂正部270a係計算出,將所被列出的所有字 彙列限定成前後字彙W1與W2時的似然(Likelihood )。 亦即,對所有的W序列,利用終端內所保存的言語D B, 使用以下的式(1)來求出似然。 字彙列(W1 w W2)的似然 P ( wl w w2) =P ( Wl, Wi,Wi+l …Wj ,W2 ) =P ( W1 ) *P ( Wi/Wl ) *?( W2/Wj ) ---(1) 0 然後計算錯誤區間之字彙列與候補的距離,也時也會 加上該距離。此情況下就變成以下的式(2)之計算式。 字彙列(W1 w W2 )的似然 P ( wl w w2 ) =P ( Wl,Wi,Wi+l ··· Wj ,W2 ) *P ( Wi,Wi+l ·· Wj ,Werror ) · . 
· (2) P ( Wi,Wi+l ··· Wj,Werror )係表示錯誤字彙列 Werror 與 候補列Wi,Wi+l…Wj 間的距離。 該式的P ( Wn/Wm )係在N-gram模型當中將Bi-gram 視爲對象者,是表示Wm之後出現Wn之機率。此處雖然 〇 是以Bi-gram的例子來說明,但亦可利用其他的N-gram 模型。 統合部280,係將如此已被訂正部270a所轉換的文字 資料,與所接收到的辨識結果中的文字資料加以統合,顯 示部290係將統合並訂正過的文字資料,予以顯示。此外 ,亦可在統合之前,將使用訂正部270a所算出的似然來 排序過的候補予以列出,讓使用者來選擇之,也可自動決 定似然最高的候補。 接著,說明如此所被構成的客戶端裝置ll〇c之處理 -41 - 200951940 。圖15係客戶端裝置uoc之處理的流程圖。基於語音輸 入的語音資料,特徵量算出部210會算出特徵量資料,被 特徵量壓縮部2 2 0壓縮過的特徵量資料,係被發送至伺服 器裝置 1 20 ( S502 )。 接著,於伺服器裝置120上進行語音辨識後的辨識結 果,係被收訊部235所接收(S502),藉由錯誤區間指定 部240而指定出錯誤區間(S503)。此處,錯誤區間之指 定,係可基於信賴度來爲之,也可藉由使用者輸入來指定 〇 其後,錯誤區間前後文脈指定部250會指定錯誤區間 的前後文脈(字彙)(S504 )。然後,藉由訂正部270a, 進行再度轉換處理,此時錯誤區間的候補會被列出(S505 )。此處,藉由訂正部270a而計算出各候補的似然( S 506 ),基於似然來進行排序處理(S507 ),排序處理過 的候補群會被顯示在顯示部290 (S508)。 於該客戶端裝置ll〇c中,特徵量算出部210會根據 所輸入之語音而算出特徵量資料,特徵量壓縮部220會將 其予以壓縮,送訊部225會將其發送至伺服器裝置120。 在伺服器裝置120上,會進行語音辨識,收訊部23 5會接 收其辨識結果。然後,錯誤區間指定部240,係基於錯誤 區間前後文脈指定部2 5 0上所指定的錯誤區間,而由訂正 部270a來進行訂正處理。然後,統合部280所作的統合 處理之後,顯示部290就會顯示訂正後的辨識結果。藉此 ,在辨識的結果當中,將有必要的部分進行訂正,可簡易 -42- 200951940 地訂正語音辨識之錯誤,可獲得正確的辨識結果。此外, 在本實施形態中’相較於第1實施形態,因爲可不記憶特 徵量資料,且在再辨識處理中不使用該特徵量資料這點, 所以其構成可變得較爲簡易。 <第5實施形態> 接著’說明不是使伺服器裝置120進行語音辨識的分 ❹ 散型處理’而是於客戶端裝置110d上,進行第一語音辨 識及第二語音辨識之形態。 圖16係客戶端裝置n〇d之機能構成的區塊圖。客戶 端裝置110d,係含有:特徵量算出部210、第一辨識部 226 (取得手段)、言語模型保持部227、字典保持部22 8 、音響模型保持部229、特徵量保存部230、錯誤區間指 定部240、錯誤區間前後文脈指定部25〇、錯誤區間特徵 量抽出部260、訂正部270、音響模型保持部281、言語模 〇 型保持部282、字典保持部283、統合部280、顯示部290 所構成。該客戶端裝置110d,係和客戶端裝置11〇同樣地 藉由圖3所示的硬體所實現。 該客戶端裝置liod,係與第1實施形態的客戶端裝置 11〇 ’在沒有用來與伺服器裝置120通訊之構成這點,以 及具備第一辨識部226、言語模型保持部227、字典保持 部228、音響模型保持部229這點,有所不同。以下就相 異點爲中心來說明。 第~辨識部226,係對特徵量算出部210上所算出之 -43- 200951940 特徵量資料,使用言語模型保持部227、字典保 、及音響模型保持部229來進行語音辨識。 言語模型保持部227,係將字彙、文字等之 加以表示的統計性資訊,加以記憶用的部分。字 228,係將音素與文字的資料庫加以保持,是 HMM (Hidden Marcov Model)用的部分。音響 部229,係將音素與其頻譜,建立對應而加以記 庫。 錯誤區間指定部240,係將上述的錯誤區 2 40中所辨識出來的辨識結果予以輸入,並指定 。錯誤區間特徵量抽出部260,係指定錯誤區間 脈,錯誤區間特徵量抽出部2 6 0係將含前後文脈 間的特徵量資料,予以抽出。然後,訂正部270 徵量資料來進行再度辨識處理。此訂正部270, 二辨識部而發揮機能。 然後,一旦統合部280所作的統合處理進行 示部290就可顯示已被訂正過的辨識結果。 接著,說明該客戶端裝置11 0d的動作。圖 端裝置11 0d之處理的流程圖。藉由特徵量算出ΐ 算出所被輸入之語音的特徵量資料(S601 ),所 特徵量資料,係被保存在特徵量保存部230中( 與該保存處理平行地,藉由第一辨識部226來陣 識(S603 )。 被第一辨識部226所語音辨識過的辨識結果 持部228 連鎖機率 典保持部 記憶例如 模型保持 憶的資料 間指定部 錯誤區間 的前後文 之錯誤區 係基於特 係成爲第 後,則顯 1 7係客戶 部210而 被算出的 S602 )。 型語音辨 的錯誤區 200951940 間,係被錯誤區間指定部2 4 0及錯誤區間前後文脈指定部 250所指定(S604 )。該已被指定之錯誤區間(含前後文 脈)的特徵量資料,係從特徵量保存部230被錯誤區間特 徵量抽出部260所抽出(S605)。然後,藉由訂正部270 而再度辨識錯誤區間的語音(S 606 )。此處,已被辨識的 辨識結果,係被統合部280所統合,藉由顯示部290而顯 示出辨識結果(S607 )。 Q 如此,在客戶端裝置110d內,會藉由第一辨識部226 及第二辨識部(訂正部)270來進行辨識處理,因此可進 行較正確的語音辨識。此外,第一辨識部226與第二辨識 部係採用不同的辨識方法,較爲理想。藉此,對於第一辨 識部226中所未能辨識的語音,仍可於第二辨識部270中 進行補救,整體而言可期待正確的語音辨識之結果。 若依據客戶端裝置11 0d,則於特徵量算出部210上根 據所輸入之語音來算出特徵量資料,於特徵量保存部230 φ 中將其記憶。另一方面,第一辨識部226,係基於特徵量 資料來進行語音辨識處理,錯誤區間指定部240及錯誤區 間前後文脈指定部250係於已被辨識的辨識結果中,指定 有發生辨識錯誤的錯誤區間。然後,訂正部270 (第二辨 識部)’係將已被指定之錯誤區間的辨識結果,予以訂正 。藉此,在辨識的結果當中,將有必要的部分進行訂正, 可簡易地進行訂正處理’同時,可獲得正確的辨識結果。 又,藉由在客戶端裝置110d內進行二度辨識處理,就不 需要使用伺服器裝置120。 -45 - 200951940 <第6實施形態> 接著,說明第2實施形態的變形例亦即第6實施形態 。若依據該實施形態,則可自動判斷錯誤區間之終點’具 有如此特徵。 圖18係第6實施形態的客戶端裝置11 Of之機能構成 的區塊圖。客戶端裝置11 Of,係含有:特徵量算出部210 、特徵量壓縮部220、特徵量保存部23 0、送訊部225、收 訊部23 5、操作部23 6、結果保存部23 7、使用者輸入偵測 部238、錯誤區間指定部240c、終點判斷部241、錯誤區 間前後文脈指定部250、錯誤區間特徵量抽出部260、訂 正部2 70、統合部280、音響模型保持部281、言語模型保 持部282、字典保持部283、顯示部290所構成。該客戶 端裝置ll〇f,係和客戶端裝置110同樣地藉由圖3所示的 硬體所實現。 該客戶端裝置1 l〇f,係於錯誤區間指定部240c中僅 受理錯誤區間之起點,終點判斷部24 1係基於所定之條件 來判斷錯誤區間之終點這點,是與第2實施形態不同。以 下’基於圖18所示的區塊圖,以與第2實施形態之相異 點爲中心來進行說明。 和第2實施形態所示之構成同樣地,客戶端裝置il〇f ’係將伺服器裝置120上所辨識出之辨識結果,以收訊部 235進行接收’結果保存部237會保存該辨識結果。然後 ’該辨識結果會被顯示在顯示部290,同時使用者係一面 200951940 觀看該顯示部290上所顯示的辨識結果,一面操作著操作 部236 ’藉此以指定錯誤區間之起點。使用者輸入偵測部 23 8,係偵測該起點,將其輸出至錯誤區間指定部24〇c。 錯誤區間指定部240c,係依照由使用者所指定的起點 及終點判斷部24 
1中所判斷的終點,來指定錯誤區間。在 判斷錯誤區間之終點之際’錯誤區間指定部2 4 0 c係一旦 偵測有從使用者指定了起點,則將該意旨輸出至終點判斷 〇 部241,指示終點之判斷。 終點判斷部24 1,係依照來自錯誤區間指定部24〇c之 指示’自動判斷錯誤區間之終點用的部分。例如,終點判 斷部241 ’係將收訊部23 5上所接收、被保存於結果保存 部237中的語音辨識結果中所含之信賴度資訊,和預先設 定的閾値進行比較,將信賴度超過閾値的字彙(或信賴度 最高的字彙),判斷爲錯誤的終點。然後,終點判斷部 24 1,係將已判斷之終點,輸出至錯誤區間指定部240c, 〇 錯誤區間指定部240c就可指定錯誤區間。 例如用以下的語音爲例來說明。此外,這裡爲了說明 上的方便,假設是指定“活性化”這個詞來作爲錯誤區間 之起點。 <發聲內容> 「乙乃目標奁達成汔吣匕ii、皆乃協力尔必要τ 言。」 (kono mokuhyo\i wo tassei suru tame ni wa,mina san no kyouryouku ga hituyou desu ° ) -47- 200951940 中譯:「爲了達成此目標,需要各位的協力。」 <語音辨識結果> 乙©目標奁活性化ο /;:沁丨;:过、皆$九0協力尔必要τ t。」 ’a, mina san no (kono mokuhyou wo kasseika no tame ni kyouryouku ga hituyou desu 〇 ) 中譯:「爲了活性化此目標,需要各位的協力。」 此處,將語音辨識結果,切割成字彙單位來看。此外,“ / ”係表示字彙的區隔。 「:刃/目標/仓/活性化/ 0 /广:吣/匕/ (±、/皆/ $人/乃/協力/ 力W必要/ T f。」 (kono/mokuhy〇u/wo/kasseika/no/tame/ni/wa,/mina/san/no/ky〇uryouku/ ga/hituyou/desu ° ) 中譯:「爲了 /活性化/此/目標/,/需要/各位/的/協力/。」 作爲該語音辨識結果,“活性化(kasseika) ”的信賴度 爲0.1、 ❼(no) ”的信頼度爲〇.〇1、 (tame) ”的信頼度爲0.4、 “( ni) ”的信頼度爲0.6的情況下 ’若閾値設爲0.5,則可判斷“活性化/⑦/汔沁/ (乙( kasseika/no/tame/ni) ” 中的 “{;: (ni) ” 是終點。 此外’終點判斷部24 1,雖然也可將信賴度爲閾値以 上的字彙的前一個(上面的例子中係爲“七吣(tame ) ” )判斷爲終點,但在錯誤區間的指定上,只要結果而言有 包含到錯誤的部分即可,因此可採取任一方法。 此種錯誤區間的指定方法,由於是按照使用者平常的 -48- 200951940 訂正習慣而爲之,因此很便於使用。亦即’例如在漢字變 換時,使用者指定錯的情況下’首先輸入了起點,接著刪 除錯誤,然後輸入正確字彙列’是—般常用的慣例。上述 的錯誤區間之指定方法也是,輸入了起點後,就自動地定 出終點,因此符合該操作方法,對使用者而言可沒有異樣 感地進行操作。 又,終點判斷部2 41,係在判斷終點之際,不限定於 u 上述方法。例如,亦可爲依照特定的發音記號來判斷終點 的方法,或是將錯誤起點開始後第η個字彙視爲終點之方 法。此處,所謂依照發音記號之方法,係係爲基於發話中 的停頓來進行判斷之方法,亦可爲基於出現在語句交界的 短停頓(逗點)、出現在發話最後的長停頓(句點)來進 行判斷。藉此,以文章的區隔來進行判斷,就可期待較正 確的語音辨識。 以下說明其具體例。作爲語音是以和上述相同內容的 〇 以下內容爲例來說明。 <發聲內容> 「二0目標奁達成t* δ /C吣t乙丨i、皆$九〇協力妒必要Τ 玄。」 (kono mokuhyou wo tassei suru tame ni wa,mina san no kyouryouku ga hituyou desu ° ) 中譯:「爲了達成此目標,需要各位的協力。j <語音辨識結果>Sessionlnitiation Protocol), etc., for processing. Further, the server device 120 uses these protocols to perform a reception process or a loopback process. Then, on the server device 120, the compressed feature amount data can be decompressed, and the feature amount data can be used for speech recognition processing. The feature amount compressing unit 220 is used for data compression in order to reduce the communication flow. Therefore, the transmitting unit 22 5 may perform compression without directly compressing the feature amount data. The feature amount storage unit 230 is a portion for temporarily storing the feature amount data calculated by the feature amount calculation unit 210. The receiving unit 23 5 is a portion for receiving the voice recognition result returned from the server device 120. The speech recognition result includes text data, time information, and reliability information. The time information indicates the elapsed time of each identification unit of the text data, and the reliability information is information indicating the accuracy of the identification result. For example, the information shown in Fig. 4(a) is received as the identification result. In Fig. 4(a), although the utterance content, the identification content, the voice section, and the reliability are described as being associated, the utterance content is not actually included. Here, the number shown in the voice section is an index indicating the frame, and is an index indicating the initial frame of the identification unit. Here, the 1 frame is equivalent to about 10 msec. Further, the reliability indicates the reliability of each recognition unit of the speech recognition result recognized on the server device 12A, and is a number indicating how accurate the degree is. 
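To make the shape of such a recognition result concrete, the following is a minimal sketch, assuming the per-unit records suggested by the explanation of Fig. 4(a) above: each recognized unit carries its text data, the frame index of its first frame (one frame corresponding to roughly 10 msec), and the reliability attached on the server device 120 side. The field names and the frame-to-millisecond helper are illustrative assumptions rather than part of the original description.

```python
from dataclasses import dataclass

FRAME_MS = 10  # one frame is treated as roughly 10 msec, as stated above


@dataclass
class RecognizedUnit:
    text: str           # recognized vocabulary / recognition unit
    start_frame: int    # frame index of the first frame of this unit
    reliability: float  # reliability attached on the server device side


def start_time_ms(unit: RecognizedUnit) -> int:
    """Convert the unit's start frame index into milliseconds."""
    return unit.start_frame * FRAME_MS


# Example modelled loosely on Fig. 4(a): a unit starting at frame 33
# with a reliability of 0.86.
unit = RecognizedUnit(text="urete", start_frame=33, reliability=0.86)
print(start_time_ms(unit))  # -> 330
```

The reliability field in this record is the value whose generation is explained next.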
This is the number of data generated by the probability of use of the recognition result, and is added to the number of units of the vocabulary to be recognized on the server device 120. For example, the method of generating the reliability is described in the following references. References: Li Huangshen, Kawahara Tatsuya, and Lu Ye Qinghong, "The reliability calculation method based on the high-speed vocabulary after 2-passs exploration algorithm", Information Processing Society Research Report, 2003-SLP-49-48, 2003-1 2. In Fig. 4(a), for example, the "sell" (ure η τ ) (urete) of the identification result is composed of a frame of 33 frames to 57, and the letter 200951940 has a degree of 0.86. The error section specifying unit 240 specifies the portion for the error section based on the speech recognition result received by the receiving unit 235. The error section specifying unit 240 can specify an error section based on, for example, the reliability information included in the speech recognition result transmitted from the server device 120. For example, in Fig. 4(a), as a result of the identification, the text data is 905 (kyuumarugo), and the time information is 9 frames (0 90msec). The reliability is 0.59, and in another location. The reliability of the "if (of)" (doko) of the identification result is 〇.〇4. Then, the error section specifying unit 240 can determine that the reliability is below the predetermined threshold ’ as an error, and can designate the section as an error section. For example, when the reliability is set to 0.2 or less, the part of B (doko), "T" (de), and "tofu" is judged to be incorrect. This is an error interval. The threshold is a number that can be set in advance on the side of the client device 110. Alternatively, it can be used in accordance with the personal difference of Q speech, the amount of noise (noise), or the calculation method of reliability. Change setting. That is, when there are many noises, the reliability will be further reduced', so the threshold is set lower; in addition, when the reliability attached to the speech recognition result is low overall, or vice versa When it is very high, it can be set according to the level of its reliability. For example, the threshold can be set based on the central axis of reliability, or the threshold can be set based on the average value. Figure 4(b) shows The Chinese example of the pronunciation is used as a reference. The 'client device 11' is provided with a reliability calculation unit (not shown) for calculating the reliability information of the identification result, and the error interval specifying unit-21 - 200951940 240 Based on the client The error interval is set in the error information calculated in the error interval. The error interval before and after the context specifying unit 250 specifies the vocabulary recognized before and after the error interval based on the error interval specified by the error interval specifying unit 240 (at least The part used for identifying the unit. The following is an example of using only one word before and after. In Figure 5(a), one of the identification units identified before the error interval is shown. The commemorative map at the time of designation, as shown in Fig. 5 (a), before the error interval of the identification result, specifies the speech interval of the vocabulary before the error interval and the speech interval of the vocabulary after the error interval. 
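As a concrete illustration of how the error section specifying unit 240 could pick out an error section from the per-unit reliability just described, the following is a minimal sketch assuming the recognition result is available as (text, start frame, reliability) tuples. The function name, the tuple layout and the fixed threshold of 0.2 are assumptions for illustration; as noted above, the threshold could instead be derived from the average or central value of the reliabilities.

```python
def find_error_interval(units, threshold=0.2):
    """Return the first contiguous run of units whose reliability is at or
    below the threshold, together with the neighbouring words W1 (just
    before the run) and W2 (just after it), or None when every unit is
    above the threshold. `units` is a list of (text, start_frame,
    reliability) tuples."""
    low = [i for i, (_, _, rel) in enumerate(units) if rel <= threshold]
    if not low:
        return None
    start = end = low[0]
    while end + 1 < len(units) and units[end + 1][2] <= threshold:
        end += 1
    w1 = units[start - 1][0] if start > 0 else None           # word before the section
    w2 = units[end + 1][0] if end + 1 < len(units) else None  # word after it
    return (start, end), w1, w2


# Example with reliabilities similar to those discussed above: the three
# middle units fall below 0.2 and form the error section.
units = [("905", 0, 0.59), ("doko", 10, 0.04), ("de", 15, 0.11),
         ("tofu", 19, 0.05), ("urete", 33, 0.86)]
print(find_error_interval(units, threshold=0.2))
# -> ((1, 3), '905', 'urete')
```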
The extraction unit 260 extracts the feature amount data of the error section (which may include at least one identification unit before and after) specified by the error section before and after the context specifying unit 250, from the feature amount storage unit 230. The correction unit 270 performs a portion for re-speech recognition by the feature amount data extracted by the error section feature amount extracting unit 260. The part 2 70 performs speech recognition using the acoustic model holding unit 281, the speech model holding unit 282, and the dictionary holding unit 2 83. Then, the correction unit 270 sets the erroneous section before and after the context specifying unit 25 0 The vocabulary (before and after the context) indicated in the previous speech interval is specified, and the speech recognition is performed as the constraint condition. Fig. 5(b) is based on the vocabulary designated by the error interval before and after the context specifying unit 250. As shown in Fig. 5(b), when the vocabulary W1 of the preceding section of the error section and the vocabulary W2 of the following section are regarded as the constraint conditions, the identification candidate will become limited to 200951940. In the example of FIG. 5(b), the identification candidates can be filtered into A to Z, and suitable candidates can be selected from the filtered candidates, and the identification processing can be performed efficiently. Further, the 'correction unit 270' may perform correction processing based on a rhetorical relationship with a preceding and succeeding vocabulary, a usage form (aft change), and the like. For example, the correction unit 270 may extract the identification candidates a to Z of the vocabulary of the error interval by a complex word 'based on the rhetoric relationship between the preceding and following vocabulary W1 and W2, and calculate the score of each of the corrected candidates by Q. The revised candidate is considered as the identification result. Further, the 'correction unit 270 can use the vocabulary information for specifying the vocabulary to be used even if the vocabulary W1 of the current surface section or the vocabulary W2 of the subsequent section is not included in the speech model holding unit 282 or the dictionary holding unit 283. The vocabulary information used for the specific before and after vocabulary is regarded as a constraint condition for the correction processing (re-speech recognition processing). For example, the client device 110 receives the part-of-speech information for each part of the word vocabulary W1 and the vocabulary W2 as the vocabulary information, and receives the part-of-speech information from the server device 120. The correcting unit 270 sets the vocabulary of the vocabulary W1 and the vocabulary W2. Information is corrected as a constraint. Thereby, a more correct correction process, that is, a voice recognition process, can be performed. Specifically, among the vocabulary information added to the speech recognition result received by the receiving unit 235, the error section specifying unit 240 extracts the vocabulary information before and after (or either) of the error section, and outputs the vocabulary information. To the correction unit 270. In the correction unit 270, the vocabulary information is regarded as a constraint condition, and the designated portion is subjected to the correction processing. The commemorative diagram is shown in Figure 24. As shown in Fig. 
24-23-200951940, the vocabulary W1 corresponds to the part-of-speech information A (for example, the auxiliary word corresponds to the vocabulary W2, and the vocabulary information B (for example, a verb) is set by the constraint condition. The correction unit 270 borrows The vocabulary information is not limited to the part of speech information, and the vocabulary information is used for specific vocabulary other than vocabulary such as vocabulary. In addition, when the necessary vocabulary information is not included in the speech recognition result, the well-known morphemes are used to solve the articles belonging to the identification object (for example, "tea", "Mecab"), Japanese In the modified example of the client device 110 shown in FIG. 25, a vocabulary information analysis unit 25 is added to the vocabulary information analysis unit 25, and the vocabulary information analysis unit 25 is added to the morphological information. 1 is a well-known morphological analysis system and a Japanese rhetorical analysis tool, and the speech recognition result can be analyzed. Then, the analyzed result is output to the error area. The error interval specifying unit 250 extracts the vocabulary information of the error section before the error section based on the vocabulary information, and outputs the vocabulary information to the correcting unit 2 70. The processing for generating the vocabulary information is performed on the client device. 1] is performed on the server device 120, but is designed to instruct the server device 120 to perform it, and then receive the processing result, thereby reducing the amount of processing on the client device 110. The above processing is performed in the vocabulary W1 and W2. When the unknown word is special, the so-called unknown word means that it is not included in the speech model holding unit 282), and when the information is correct, it can be a medium-time analysis tool. It is also a new vocabulary from the result of the construction of the result or the validity of the guest or the word 200951940. For example, the correction unit 270 (unknown word determination means) determines whether or not the vocabulary W1 and W2 are unknown words, and if it is an unknown word, the vocabulary information contained in the identification result sent from the server device 120 is regarded as a constraint condition. , to carry out the correction process. Further, on the client device 110, the restriction condition can be registered. That is, in the modification of the client device 11A shown in FIG. 25, the vocabulary of the specified error section and the vocabulary of the front and rear (or at least one φ) or the vocabulary information thereof may be grouped. It is regarded as a restraint condition, and it is memorized in the restraint condition memory unit 2 8 5 (constrained condition memory means). In this way, the 'correction unit 270' is the same as the vocabulary of the error section specified in the error section specifying unit 240, or the vocabulary of the preceding and following vocabulary is the same, and the memory unit 285 can be used according to the constraint condition. Correction is carried out by the restraint condition of memory. Thereby, the processing can be performed quickly. That is, from the next time, even if an unknown word is detected, it is only necessary to immediately read out the existing binding conditions, and the constraint condition can be applied. 
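The registration of constraint conditions described above, in which the words of a specified error section are grouped together with their surrounding words or vocabulary information and stored in the constraint condition memory unit 285 so that they can simply be read out the next time, could be sketched as a small keyed store. The class name, the key and the layout of the stored vocabulary information below are assumptions for illustration only.

```python
class ConstraintConditionStore:
    """Minimal sketch of a constraint-condition memory: once a constraint has
    been built for an error section (here keyed by the surrounding words W1
    and W2), it is kept so that it can be read out immediately when the same
    context appears again, instead of being re-established."""

    def __init__(self):
        self._store = {}

    def save(self, w1, w2, vocab_info):
        # vocab_info might hold, for example, part-of-speech or reading
        # information for the surrounding words.
        self._store[(w1, w2)] = vocab_info

    def lookup(self, w1, w2):
        # Returns the stored constraint, or None if it still has to be built.
        return self._store.get((w1, w2))


store = ConstraintConditionStore()
store.save("wo", "no", {"w1_pos": "particle", "w2_pos": "particle"})
print(store.lookup("wo", "no"))  # reused without re-analysis
```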
Since it is not necessary to re-establish the constraint condition, φ can therefore set the constraint condition with less processing. Further, the correction unit 270 may update the vocabulary of the error section and the connection probability of the vocabulary before and after the correction in accordance with the corrected result. In other words, it is also possible to design a connection probability, which is stored in the speech model holding unit 282 and the dictionary holding unit 283 which function as a connection probability memory means, and is applied to the correction unit 270 every time there is an appropriate correction process. The connection probability calculated and created is updated in the speech model holding unit 282 and the dictionary holding unit 283. Further, the correction unit 270 determines whether the recognition result after the re-identification and the identification result recognized by the server device 120 with the error -25-200951940 error interval are the same'. At this time, the identification result is not output to the integration unit 2 80. It is preferable that the identification result is not displayed on the display unit 290. Further, between the identification result obtained by the identification in the correction unit 2 70 and the identification result recognized by the error section on the server device 120, even if an error of the identification unit occurs, the recognition error is similarly determined. It is preferable that the identification result is not output to the integration unit 280' and the recognition result is not displayed on the display unit 290. For example, when the correspondence between the speech interval and the recognition result in FIG. 4(a) is different, more specifically, in the speech interval, the recognition result on the server device 120 is that the frame index is 0-9. In the case of "905 (kyuumarugo)", when the re-recognition on the correction unit 270 becomes the case where the frame index is 0-15 and "90555 (kyuumarugogogo)", the voice is The correspondence between the interval and the identification result causes an error between the identification result and the re-identification result. Therefore, it can be judged that the error is recognized. In this case, the correction unit 270 does not display the recognition result on the display unit 290, and performs processing such as no output. It is even possible to design the correction unit 2 70. When the identification error has been determined, 'If there is a character input on the receiving unit (not shown) that accepts the text information from the user, the correction unit 270 will The accepted text (for example, Japanese pseudonym) is used as a constraint condition to correct the result of the identification of the error interval. In other words, it is also possible to perform the identification processing of the remaining portion on the premise that the character is input when there is any character input in the recognition result of the error section. In this case, if there is a judgment error, -26- 200951940 allows the receiving unit to accept text input. Further, the correction unit 270 can prevent the erroneous recognition from being performed again by performing the speech recognition processing different from the identification processing performed on the server device 120. For example, the acoustic model, the speech model, and the dictionary are changed for identification processing. 
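The suppression behaviour described above, in which a corrected result is not shown when it is identical to the originally erroneous result, or when the correspondence between the speech interval and the recognition result no longer lines up (as in the example of "905" over frames 0-9 versus "90555" over frames 0-15), could look like the following minimal sketch; the function name and the span representation are assumptions for illustration.

```python
def should_display(original_span, corrected_span):
    """Decide whether a re-recognized error section should be shown.

    Each span is a (text, first_frame, last_frame) tuple. The corrected
    result is suppressed when it is identical to the original erroneous
    result (nothing was actually corrected) or when the frame range of the
    re-recognition no longer matches the original section (the time
    alignment broke down, so it is treated as a recognition error)."""
    orig_text, orig_first, orig_last = original_span
    corr_text, corr_first, corr_last = corrected_span
    if corr_text == orig_text:
        return False
    if (corr_first, corr_last) != (orig_first, orig_last):
        return False
    return True


print(should_display(("905", 0, 9), ("90555", 0, 15)))  # -> False (mismatch)
print(should_display(("905", 0, 9), ("902", 0, 9)))     # -> True  (display)
```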
The acoustic model holding unit 281 is a database in which the phoneme and its spectrum are associated and stored. The speech model holding unit 282 is a part for storing statistical information indicating the chain probability of vocabulary and φ characters. The dictionary holding unit 283 holds the phoneme and the character database, and stores a partial merging unit 280 for HMM (Hidden Marcov Model), for example, among the voice recognition results received by the receiving unit 253. The text data outside the error interval and the text data recognized on the correction unit 270 are integrated. The integration unit 280 integrates the error sections (time information) indicated by the position of the Q in accordance with the re-recognized text data on the correction unit 270. The display unit 290 is a portion for displaying the text data obtained by integrating the integration unit 280. Further, the display unit 290 is preferably configured to recognize the result on the server device 120 as a display content. Moreover, when the result of the re-recognition on the correcting unit 270 is the same as the result of the error section being recognized on the server device 120, the identification result is displayed without being displayed. [Preferred; and in this case] It can also show unrecognizable intentions. Furthermore, the possibility of recognizing the obtained identification result on the correction unit 270 and the identification result of the identification -27-200951940 obtained on the server device 120 is not desirable. Alternatively, if the length is one, it is judged that it is a character, and the method of structuring the device 110 is explained by the feature amount s 1 0 1 ). Then, (S102). The compression is performed (S103 is transmitted to the next section, and then transmitted to the server 120, and then based on the speech error interval, based on the S106). The correction unit 270 extracts (S107) based on the intermediate feature amount extracting unit and feeds (S 1 0 8 ). However, when the received text has an error in the time information, it is also displayed because of an error, or the meaning of the unrecognizable is displayed. It is more necessary to always perform the re-identification process, and the re-identification process may be performed according to whether the error area is performed. . For example, when the error interval is not subjected to the re-identification process, it is entered as a text or the like. The action of the client device 110. Figure 6 is a flow chart of the customer's operation. The speech calculation unit 210, which is input through the microphone, extracts the feature data (the feature amount storage unit 230 stores the feature amount, and the feature amount compression unit 22 sets the feature amount data). The compressed feature quantity data that has been compressed is sent to the server device 120 (S 104). The server device 120 performs voice recognition, and the result is recognized by the server, and is received by the receiving unit 23 (S105). q The result of the discrimination, the error section specifying unit 240 specifies the error section to which the error is specified, and specifies the context before and after the error (the error section included in the context), and the error section 260 sets the feature amount data from the feature amount holding unit 230. 
Based on the extracted feature quantity data, after the speech recognition is performed again by the line, the text data of the error interval is generated, and the text data of the error interval and the information of the receiving unit are integrated. After correct identification, -28- The text data of 200951940 is displayed on the display unit 290 (S109). Next, the processing in the above S106 to S108 will be described in detail. Fig. 7 is a flowchart showing the detailed processing. Referring to Fig. 5(a) as appropriate The error-period specifying unit 240 specifies an error section based on the recognition result (S201 (S106)). Based on the error section, the error-period context specifying unit 250 specifies the preceding vocabulary W1 of the error section (Fig. 5(a)) φ And (S202), by the error interval before and after the context specifying unit 250, the vocabulary W2 (Fig. 5(a)) of the error section is designated and memorized (S203). The error period before and after the context specifying unit 250 specifies the start time T1 of the vocabulary W1 (Fig. 5 (a)) (S204), and specifies the end time T2 of the vocabulary W2 (Fig. 5 (a)), and then saves them separately ( S205). The error amount interval obtained by adding the preceding and succeeding words (one identification unit) to the error interval, that is, the feature quantity data of the interval from the start time T 1 to the end time T2 , is the error interval feature quantity. The extraction unit 260 extracts (S206 (S107)). The setting of the constraint condition with the vocabulary W1 as the starting point and the vocabulary W2 as the end point is performed in the correction unit 270 (S207). Then, according to the constraint condition, the correction unit 270 performs the adjustment. In the identification process of the feature amount data, the correction process is executed (S208). As described above, the operation effect of the client device n 本 in the present embodiment will be described. In the client device 110, the feature amount calculation unit 210 calculates The feature amount data of the input voice, the feature amount compressing unit 22, transmits the feature amount data to the voice recognition device, that is, the server device 12〇-29-200951940. On the other hand, the feature amount storage unit 2 The system saves the feature amount data, and then performs identification processing on the server device 120. The receiving unit 23 receives the identification result from the server device 120. The error interval specifying unit 240 is configured to receive the identification. In the result, an error section in which a recognition error has occurred is specified. The error section specifying unit 240 can determine based on the reliability. Then, the error section feature quantity extracting unit 260 extracts the feature amount data of the error section and corrects it. The unit 270 performs a re-identification process on the identification result of the extracted error section to perform a correction process. That is, in the integration unit 280, the result of the re-recognition and the identification result received by the receiving unit 253 are integrated to perform the correction processing, and the display unit 290 displays the corrected The result of the identification. In this way, among the results of the identification, the necessary parts are corrected, and the error of the speech recognition can be easily corrected, and the correct identification result can be obtained. 
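Putting steps S201 to S208 above together, the following is a minimal sketch of the correction flow: the preceding word W1, the following word W2 and the times T1 and T2 are taken from the recognition result, the saved feature amount data between T1 and T2 is sliced out, re-recognition is performed under the W1/W2 constraint, and the result is integrated back into the full text. The re_recognize callable merely stands in for the correction unit 270, the data layouts are assumptions for illustration, and the sketch assumes at least one word of context exists on each side of the error section.

```python
def correct_recognition(units, features, error_span, re_recognize):
    """Sketch of the correction flow S201-S208.

    units       : list of (text, start_frame) per recognized word, in order
    features    : per-frame feature data saved when the speech was input
    error_span  : (first_idx, last_idx) word indices of the error section
    re_recognize(frames, w1, w2): assumed stand-in for the correction unit,
        re-recognizing the sliced frames under the constraint that the
        result starts with w1 and ends with w2; returns a list of words."""
    first, last = error_span
    w1_idx, w2_idx = first - 1, last + 1              # one word of context each side
    w1, t1 = units[w1_idx]                            # W1 and its start time T1
    w2 = units[w2_idx][0]
    t2 = (units[w2_idx + 1][1] if w2_idx + 1 < len(units)
          else len(features))                         # end time T2 of W2
    section = features[t1:t2]                         # feature data between T1 and T2
    corrected = re_recognize(section, w1, w2)         # re-recognition under constraint
    # Integration: keep everything outside the section, replace W1..W2 with
    # the re-recognized words.
    words = [text for text, _ in units]
    return words[:w1_idx] + corrected + words[w2_idx + 1:]


# Toy example with a stand-in re-recognizer that repairs the middle word.
units = [("kono", 0), ("mokuhyou", 8), ("kassei", 20), ("tame", 30), ("ni", 36)]
features = list(range(40))                            # placeholder frame data
fix = lambda frames, w1, w2: [w1, "tassei", w2]
print(correct_recognition(units, features, (2, 2), fix))
# -> ['kono', 'mokuhyou', 'tassei', 'tame', 'ni']
```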
For example, you can cut the wrong vocabulary by up to 70%. In addition, errors caused by unknown words can be corrected by more than 60%. In addition, the reliability may be received from the server device 120 or may be calculated on the client device 110. In addition, the client device 1 1 0 can use the error interval before and after the context specifying unit 250 to perform the correction processing (re-identification processing) in accordance with the constraint condition. That is, the vocabulary before and after the error interval is fixed, and the identification process is performed according to the fixed vocabulary, so that the recognition result with better precision can be obtained. Further, in the present embodiment or other embodiments described later, although the first identification processing is performed on the server device 120, the present invention is not limited thereto, and the first identification processing may be performed at the client device 110.中进-30- 200951940 fr 'The second identification process is performed on the server device i2〇. At this time, it is assumed that the designation processing of the error section or the like is performed on the server device 120. For example, in this case, the client device 110 includes an identification processing unit for performing identification processing based on the feature amount data calculated by the feature amount calculation unit 210, and the transmission unit 225 sets the identification result here. The feature amount data is sent to the server device 120. The server device 120 includes an error section specifying unit 240 corresponding to the client device ι1, an error section context specifying unit 250, a feature amount holding unit 230, an error section feature amount extracting unit 260, and a correcting unit 270. Each part of the 'feature amount data transmitted from the client device 11' is stored in the feature amount storage unit, and specifies the error interval based on the recognition result and specifies the context before and after the error interval. Correction processing (identification processing) of the saved feature amount data. The processed identification result is sent to the client device 110. Further, in the other embodiments described in the present embodiment or the following, the re-identification (correction processing) is performed using the constraint conditions defined by the error interval before and after the context specifying unit 250: However, in the case of this example , is the feature quantity data using only the error interval. It is also possible to perform the re-identification process without using the constraint conditions as described above. Further, it is preferable to change the identification method on the server device 120 and the identification method in the present embodiment (or other embodiments described below). That is, on the server device 120, since it is necessary to recognize the voice of a non-specific user, it is necessary to have versatility. For example, the number of models and the number of dictionaries of the acoustic model holding unit, the speech model-31 - 200951940 type holding unit, and the dictionary holding unit used in the server device 120 are set to be large, and the number of phonemes in the acoustic model is set to More, the number of vocabulary words in the speech model is set larger, etc., and the number of models and the number of dictionaries are set to a larger capacity so that it can correspond to any user. 
On the other hand, the correction unit 270 on the client device 110 does not need to correspond to any user, and can use an acoustic model, a speech model, and a dictionary that conform to the voice of the user of the client device 110. Therefore, the client device 110 must use the text input processing of the correction processing, the identification processing, or the mail as the reference, and appropriately update each model, the dictionary, and the client device 110. The display unit 290 for displaying the identification result that has been corrected by the correction unit 270, the recognition result recognized on the server device 120 is not displayed on the display unit 290. In this way, the identification result due to the possibility of identifying errors will not be displayed 'and therefore will not cause misunderstanding to the user. Moreover, when the client device 110' recognizes the result of the recognition of ◎ on the correcting unit 270 and the recognition result received by the receiving unit 235 is the same, or the time information contained in each of the identification results is different, Then, the correction unit 270 determines that the recognition error has occurred, and the display unit 290 does not display the recognition result. Thereby, it is possible to prevent an erroneous recognition result from being displayed. Specifically, the error vocabulary can be reduced by up to 70%. In addition, the error caused by the unknown word can be revised to more than 60%. <Second Embodiment> -32- 200951940 Next, it is explained that the error section is not automatically determined based on the reliability, and the client device 11 〇a configured by the user is manually determined. Fig. 8 is a block diagram showing the function of the client device 110a in the error section by the user input. As shown in FIG. 8, the client device 11a includes: a feature amount calculation unit 210, a feature amount compression unit 220, a feature amount storage unit 230, a transmission unit 225, a reception unit 23, and an operation unit 23 6 The result storage unit 237, the user input detecting unit 238, the error section specifying unit 240a, the error area φ between the preceding and succeeding context specifying unit 250, the error section feature amount extracting unit 260, the correcting unit 270, the integration unit 280, and the acoustic model hold The unit 281, the speech model holding unit 282, the dictionary holding unit 283, and the display unit 290 are configured. The client device 110a is implemented by the hardware shown in Fig. 3 in the same manner as the client device 110. The client device 11A is different from the client device 110 in that it includes an operation unit 236, a result storage unit 23, a user input detection unit 23, and an error section designation unit 240a. The following is a description of the difference between the points Q. The operation unit 236 accepts a portion for user input. The user can specify the error interval while confirming the recognition result displayed on the display unit 290. The operation unit 236 can accept the designation. The result storage unit 23 7 is a portion for storing the speech recognition result received by the reception unit 25 5 . The saved speech recognition result is displayed on the display unit 290 in a visually readable manner by the user. 
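As a rough sketch of the model updating mentioned above, in which the client device keeps its models and dictionary matched to its own user by referring to corrected results and to text the user types (for example in mail), a simple word-frequency store could be maintained as below. The class, its update sources and its use as a personalisation weight are assumptions for illustration, not a description of the actual model update.

```python
from collections import Counter


class UserLexicon:
    """Minimal sketch of client-side personalisation: every corrected result
    or piece of text the user enters bumps word counts, so that later
    corrections can favour vocabulary the user actually produces."""

    def __init__(self):
        self.counts = Counter()

    def update(self, words):
        self.counts.update(words)

    def weight(self, word):
        # Relative frequency of the word in the user's own history, usable as
        # a simple personalisation weight during re-recognition.
        total = sum(self.counts.values()) or 1
        return self.counts[word] / total


lex = UserLexicon()
lex.update(["kono", "mokuhyou", "tassei"])   # e.g. taken from a corrected result
print(round(lex.weight("tassei"), 2))        # -> 0.33
```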
The user input detecting unit 23 8 is for detecting the portion of the user input by the operation unit 23 6 , and outputs the error section that has been input, -33 - 200951940, to the error section specifying unit 240a. The error section specifying unit 240a specifies the section for the section in accordance with the error section input from the user input detecting section 238. Next, the processing of the client device 11A thus constructed will be described. Figure 9 is a flow diagram of the processing of client device 110a. The voice input through the microphone is extracted by the feature amount calculation unit 210 (S101). Then, the feature amount data is stored in the feature amount storage unit 230 (S102). Next, the feature amount data is compressed by the feature amount compressing unit 22 〇 #| (S103). The compressed feature quantity data that has been compressed is transmitted to the server device 12 by the signal transmitting unit 225 (S104). Then, speech recognition is performed on the server device 120, and the identification result transmitted from the server device 120 is received by the receiving unit 235, temporarily stored, and the identification result is displayed on the display unit 290 (S105a). Then, based on the identification result displayed on the display unit 290, the user judges the error section and inputs the error section. Then, the input is detected by the user input detecting unit 238, and the error section is specified by the error section specifying unit 240q. Then, the context before and after (SI 0 6a) is specified based on the specified error interval. Based on the error section including the context, the error section feature quantity extracting unit 2 60 extracts the feature amount data (S1 07), and performs the speech recognition again by the correction unit 2 70 to generate the text data of the error section ( S108). Then, the text data of the error section and the text data received by the receiving unit 253 are integrated, and the correct text data is displayed on the display unit 290 (S109). Next, the processing in the above S105a to S108 will be described in detail. Figure -34 - 200951940 10 is a flow chart showing the detailed processing when the error area is specified by the user input on the client device 11 〇a. The receiving unit 235 receives the identification result and displays it on the display unit 290 (S301). When the user confirms the recognition result displayed on the display unit 290, the error section is specified, and the user inputs the detection unit 238 to detect the start position of the error section and temporarily saves it (S302). Then, the error interval before and after the context specifying unit 250 specifies the preceding word Q of the error interval and stores it (S303), and the start time T1 of the saved vocabulary W1 is designated and stored (S 3 04). Further, the end position of the error section designated by the user is detected by the user input detecting unit 238 and temporarily stored (S3 05). Then, the error interval before and after the context specifying unit 250 specifies the vocabulary W2 after the error interval and stores it (S306), and the end time T2 of the saved vocabulary W2 is designated and stored (S3 07). After these processes, the feature measurement data from the start time T1 to the end time T2 is extracted by the error section feature amount extracting unit 260 (S308). 
The setting of the constraint condition with the vocabulary w 1 as the starting point and the vocabulary W2 as the end point is performed in the correcting unit 270 (S309). Then, in accordance with the restraining condition, the correcting unit 2 70 performs recognition processing of the feature amount data, and performs correction processing (S310). By doing so, the error interval can be specified by the user input, whereby the re-identification can be performed to perform the correction processing of the recognition result. In such a client device 11a, the display unit 290 displays the recognition result, and the user visually confirms it, and the user can specify the error interval by operating the operation -35-200951940, 236, that is, The revised location. Thereby, among the identification results, the necessary parts are corrected, and the correction processing can be easily performed, and at the same time, the correct identification result can be obtained. <Third Embodiment> Next, when the time information is not included in the identification result transmitted from the server device 120, the client device 110b in the error section can be correctly designated. Figure 11 is a block diagram of the functionality of the client device 110b. The client device 11b includes a feature amount calculation unit 210, a feature amount compression unit 220, a transmission unit 225' feature quantity storage unit 230, a reception unit 235, a time information calculation unit 239, and an error interval specification unit 240. The error section feature quantity extracting unit 260, the error section context specifying unit 250, the correcting unit 270, the acoustic model holding unit 281, the speech model holding unit 282, and the dictionary holding unit 283 are configured. The client device 11b is realized by the hardware shown in Fig. 3 in the same manner as the client device 110 of the first embodiment. Further, the difference from the client device 110 of the first embodiment is that the client device 110b receives the identification result that does not include the information from the server device 120, and then the time information calculation unit 239 The elapsed time (frame index) is automatically calculated based on the recognition result, that is, the text data. The client device 110b will be described below centering on the difference. The time information calculation unit 239 calculates the elapsed time portion of the character data by using the character data among the recognition results received by the reception unit 235 and the feature amount data stored in the feature amount storage unit 230. More specifically, the time information calculation unit 23 9 converts a word data or a recognition unit of the text data by comparing the input 200951940 character data and the feature amount data stored in the feature amount storage unit 230. When the frequency data is formed, it is determined which part of the feature quantity data is consistent, and thus the elapsed time of the text data can be calculated. For example, when the 10 frame portion of the feature amount data is identical to the word data of the text material, the word pool has an elapsed time of 10 frames. The error section specifying unit 240b can specify the error section using the elapsed time and the character data calculated by the time information calculating unit 0 239. The error section specifying unit 24〇b determines the error section based on the reliability information included in the identification result. 
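The elapsed-time calculation of the time information calculation unit 239 described above, in which the text data is compared against the saved feature amount data to decide how many frames each word accounts for (a word that matches a 10-frame portion being given an elapsed time of 10 frames), could be sketched as a simple left-to-right assignment. The matches(word, frame) helper stands in for the comparison of a word, converted to frequency data, against one frame of feature data; both it and the greedy frame-by-frame strategy are assumptions for illustration.

```python
def assign_frames(words, features, matches):
    """Walk through the saved feature frames and, for each recognized word in
    order, count how many consecutive frames it accounts for. Returns a list
    of (word, start_frame, elapsed_frames), which supplies the time
    information missing from the received recognition result."""
    spans, cursor = [], 0
    for word in words:
        start = cursor
        while cursor < len(features) and matches(word, features[cursor]):
            cursor += 1
        spans.append((word, start, cursor - start))
    return spans


# Toy stand-in: each "frame" is pre-labelled with the word it belongs to, so
# a word matches a frame exactly when the labels agree.
features = ["kono"] * 6 + ["mokuhyou"] * 10 + ["wo"] * 4
toy_match = lambda word, frame: frame == word
print(assign_frames(["kono", "mokuhyou", "wo"], features, toy_match))
# -> [('kono', 0, 6), ('mokuhyou', 6, 10), ('wo', 16, 4)]
```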
Further, as in the second embodiment, the error section may be specified by the user input. In this way, based on the error section 'error section' specified by the error section specifying unit 240b, the error section feature context extracting unit 260 specifies the error section including the context before and after, and the error section feature quantity extracting unit 260 sets the voice data of the error section. After the extraction is performed, and then the correction unit 270 performs the re-recognition @ processing, the correction processing can be performed. Next, the processing of the client device 11b will be described. Figure 12 is a flow chart showing the processing of the client device 11b. The voice data input through the microphone is extracted by the feature amount calculation unit 210 (S1 01). Then, the feature amount storage unit 230 stores the feature amount information (S102). Next, the feature amount data is compressed by the feature amount compressing unit 22 0 (S103). The compressed feature quantity data that has been compressed is transmitted to the server device 120 by the transmission unit 225 (S104). Next, voice recognition is performed on the server device 120, and the identification result (excluding the elapsed time) is transmitted from the server-37-200951940 device 120, and is received by the receiving unit 235 (S105). Then, based on the speech recognition result and the feature amount data of the feature amount storage unit 230, the time information calculation unit 239 calculates the elapsed time, uses the elapsed time and the speech recognition result, and specifies the error section by the error section specifying unit 240. The context-based context specifying unit 250 specifies the context (SI 06b) based on the specified error section. Based on the error section including the context, the error section feature quantity extracting unit 260 extracts the feature amount data (S107), and performs the re-speech recognition by the correction unit 270 to generate the text data of the error section (S1 08) ). Then, the character data of the error section and the character data received by the receiving unit 253 are integrated, and the correct character data is displayed on the display unit 2S > 0 (S109). Next, a more detailed process including S106b will be described. Figure 13 is a flow chart showing the detailed processing of S 105 to S1 08. The reception unit 23 receives the identification result without the elapsed time (S4〇1), and calculates the elapsed time in the character data by the time information calculation unit 239 (S402). The error section specifying unit 240 specifies an error section (S 4 0 3 ) from the identification result. Based on the error section, the error section before and after the context specifying unit 250 specifies the preceding vocabulary W1 of the error section (Fig. 5 (3)) and stores it (S404). Further, the vocabulary W2 (Fig. 5(a)) following the error interval before and after the context specifying portion 25 0' error section is designated and stored (S405). Next, the error period before and after the context specifying unit 250 specifies the start time T1 of the vocabulary W1 (Fig. 5(a)) (S406), and specifies the end time T2 of the vocabulary W2 (Fig. 
5 (a)) (S407 - 38) - 200951940 The feature amount data of the section in which the error interval obtained by adding the preceding and succeeding words to the error section, that is, the section from the start time τ 1 to the end time T2 is extracted by the error section feature quantity extracting section 260 (S408) The setting of the constraint condition with the vocabulary W1 as the starting point and the vocabulary W2 as the end point is performed in the correcting unit 270 (S4 09). Then, according to the constraint condition 'the correcting unit 270, the identification processing of the feature amount data is performed. And performing the correction processing (S410). According to the client device 11b, based on the identification result received by the receiving unit 253 and the feature quantity data stored in the feature quantity storage unit 203, The time information calculation unit 239 calculates the elapsed time of the recognition result. Then, the error section specifying unit 240 can specify the error section based on the time information. Here, the context is specified based on the specified error section. After the feature quantity data based on which to revise processing whereby, when the identification result does not include the time information can also be specified between ❹ The relevance error region. <Fourth Embodiment> Next, a client device 111c that performs correction processing based on the recognition result obtained by performing voice recognition on the server device 120 will be described. Figure 14 is a block diagram of the functionality of the client device 110c. The client device 11c includes a feature amount calculation unit 210, a feature amount compression unit 220, an error interval designation unit 240, an error interval context designation unit 25A, a correction unit 270a, and a speech DB holding unit 284. . The client device -39-200951940 1 1 0c is implemented in the same manner as the client device 1 1 藉 by the hardware shown in FIG. The client device 11〇c is compared with the client device 110, and the feature quantity data obtained by the voice input is not memorized, and the composition of the feature quantity data is no longer used. Specifically, the feature amount storage unit 230, the error section feature quantity extracting unit 260, the acoustic model holding unit 281, the speech model holding unit 282, and the dictionary holding unit 283 are different. In the following, the feature amount calculation unit 210 calculates the feature amount information 'feature amount compressing unit 220 based on the voice input, and compresses the feature amount data to the server device 120. Then, the receiving unit 23 receives the identification result from the server device 120. The error section specifying unit 240 specifies the error section by the reliability information or the user operation, and the error section before and after the context specifying unit 250 specifies the context before and after, and then specifies the error section. The correction unit 270a performs conversion processing based on the character data specified by the error section including the contexts, based on the database s stored in the speech DB holding unit 2 84. The speech DB holding unit 284 stores substantially the same information as the speech model holding unit 829, and stores the chain probability of each syllable. Then, the correction unit 270a lists the words w ( Wi, Wi + l ... Wj ) having the possibility of occurrence of an error interval. 
Here, the number of vocabulary columns w is also limited to K. The number of restrictions κ is set to be the same as the number of erroneous words, or a certain range close to Ρ (K = P-c to -40- 200951940 P + c ). Then, the correction unit 270a calculates a likelihood when all the listed word lists are defined as the preceding and succeeding words W1 and W2. In other words, the likelihood is obtained using the following formula (1) for all W sequences using the speech D B stored in the terminal. The likelihood of the word queue (W1 w W2) P ( wl w w2) = P ( Wl, Wi, Wi + l ... Wj , W2 ) = P ( W1 ) * P ( Wi / Wl ) * ? ( W2 / Wj ) ---(1) 0 Then calculate the distance between the word queue of the error interval and the candidate, and also add the distance. In this case, it becomes the calculation formula of the following formula (2). The likelihood of the word queue (W1 w W2 ) P ( wl w w2 ) = P ( Wl, Wi, Wi + l ··· Wj , W2 ) * P ( Wi, Wi + l · · Wj , Werror ) · · (2) P ( Wi, Wi + l ··· Wj, Werror ) indicates the distance between the error word queue Werror and the candidate column Wi, Wi+l...Wj. P (Wn/Wm) of this formula is a probability that W- appears after Wm in the N-gram model. Although 〇 is illustrated by the Bi-gram example, other N-gram models can be used. The integration unit 280 integrates the text data converted by the correction unit 270a and the character data in the received recognition result, and the display unit 290 merges and displays the corrected text data. Further, the candidates sorted by the likelihood calculated by the correcting unit 270a may be listed before the integration, and may be selected by the user, and the candidates with the highest likelihood may be automatically determined. Next, the processing of the client device 11c configured as described above will be described -41 - 200951940. Figure 15 is a flow chart showing the processing of the client device uoc. Based on the voice data input by the voice, the feature amount calculation unit 210 calculates the feature amount data, and the feature amount data compressed by the feature amount compressing unit 220 is transmitted to the server device 1 20 (S502). Then, the recognition result after the speech recognition is performed on the server device 120 is received by the reception unit 235 (S502), and the error section designation unit 240 specifies the error section (S503). Here, the designation of the error interval may be based on the reliability, or may be specified by the user input, and the context before and after the error interval specifies the context (word) of the error interval (S504). . Then, the re-conversion process is performed by the correction unit 270a, and the candidates for the error section are listed (S505). Here, the likelihood of each candidate is calculated by the correction unit 270a (S506), the sorting process is performed based on the likelihood (S507), and the sorted candidate group is displayed on the display unit 290 (S508). In the client device 111c, the feature amount calculation unit 210 calculates the feature amount data based on the input voice, and the feature amount compressing unit 220 compresses the feature amount, and the transmitting unit 225 transmits the feature amount data to the server device. 120. On the server device 120, voice recognition is performed, and the receiving unit 23 receives the identification result. 
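The likelihood computation of equations (1) and (2) above, in which each candidate word sequence for the error section is chained with bigram probabilities from the speech DB held on the terminal while being anchored on the preceding word W1 and the following word W2, could be sketched as follows. P(W1) is left out because it is common to every candidate and does not change the ranking, the floor probability for unseen bigrams is an assumption, and the extra distance term of equation (2) against the erroneous word sequence is omitted for brevity.

```python
import math


def bigram_prob(bigrams, prev, word, floor=1e-6):
    """P(word | prev) from a table of bigram probabilities, with a small
    floor for unseen pairs (the floor value is an assumption)."""
    return bigrams.get((prev, word), floor)


def candidate_log_likelihood(candidate, w1, w2, bigrams):
    """Log-likelihood of a candidate word sequence anchored on W1 and W2,
    following the chaining of equation (1)."""
    seq = [w1] + list(candidate) + [w2]
    return sum(math.log(bigram_prob(bigrams, a, b))
               for a, b in zip(seq, seq[1:]))


def rank_candidates(candidates, w1, w2, bigrams, k=5):
    """Keep at most K candidates and sort them by likelihood, as is done
    before the candidates are displayed or the best one is chosen."""
    scored = [(candidate_log_likelihood(c, w1, w2, bigrams), c)
              for c in candidates[:k]]
    return sorted(scored, reverse=True)


bigrams = {("wo", "tassei"): 0.2, ("tassei", "suru"): 0.5, ("suru", "tame"): 0.3,
           ("wo", "kassei"): 0.01, ("kassei", "ka"): 0.4, ("ka", "tame"): 0.05}
cands = [("tassei", "suru"), ("kassei", "ka")]
for score, cand in rank_candidates(cands, "wo", "tame", bigrams):
    print(round(score, 2), cand)
# -> -3.51 ('tassei', 'suru')
#    -8.52 ('kassei', 'ka')
```

With the candidates ranked in this way, the overall flow of the client device 110c continues as described next.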
Then, the error section specifying unit 240 performs the correction processing by the correcting unit 270a based on the error section specified by the error section before and after the context specifying unit 250. Then, after the integration processing by the integration unit 280, the display unit 290 displays the corrected recognition result. In this way, among the results of the identification, the necessary parts are corrected, and the error of the speech recognition can be corrected easily, and the correct identification result can be obtained. Further, in the present embodiment, the configuration of the feature amount data is not used because the feature amount data is not stored and the feature amount data is not used in the re-identification processing. Therefore, the configuration can be simplified. <Fifth Embodiment> Next, the description will be made of the first speech recognition and the second speech recognition on the client device 110d instead of the split processing of the voice recognition by the server device 120. Figure 16 is a block diagram of the functionality of the client device n〇d. The client device 110d includes a feature amount calculation unit 210, a first identification unit 226 (acquisition means), a speech model holding unit 227, a dictionary holding unit 22 8 , an acoustic model holding unit 229, a feature amount storage unit 230, and an error section. Designation unit 240, error section context specifying unit 25A, error section feature quantity extracting unit 260, correction unit 270, acoustic model holding unit 281, speech model holding unit 282, dictionary holding unit 283, integration unit 280, and display unit 290 constitutes. The client device 110d is implemented in the same manner as the client device 11 by the hardware shown in FIG. The client device liod is configured to be in communication with the server device 120 in the client device 11' of the first embodiment, and includes a first identification unit 226, a speech model holding unit 227, and a dictionary hold. The portion 228 and the acoustic model holding unit 229 differ. The following is a description of the difference. The first identification unit 226 performs speech recognition using the speech model holding unit 227, the dictionary guarantee, and the acoustic model holding unit 229 for the -43-200951940 feature quantity data calculated by the feature amount calculation unit 210. The speech model holding unit 227 is a part for storing statistical information indicating vocabulary, characters, and the like. Word 228, which holds the phoneme and text database, is part of the HMM (Hidden Marcov Model). The sound unit 229 registers the phoneme and its spectrum. The error section specifying unit 240 inputs the identification result recognized in the error area 2 40 described above and designates it. The error section feature quantity extracting unit 260 specifies an error section pulse, and the error section feature quantity extracting unit 260 extracts the feature amount data including the contexts before and after. Then, the correction unit 270 collects the data to perform the re-identification processing. The correction unit 270 functions as a second identification unit. Then, once the integration processing is performed by the integration unit 280, the display unit 290 can display the identification result that has been corrected. Next, the operation of the client device 110d will be described. A flowchart of the processing of the terminal device 11 0d. 
The feature amount data of the input voice is calculated by the feature amount calculation (S601), and the feature amount data is stored in the feature amount storage unit 230 (in parallel with the save process, the first recognition unit 226) The recognition result holding unit 228 that has been voice-recognized by the first identification unit 226 stores the error rate of the error interval of the inter-data designation unit of the model retention memory based on the special system. After the first step, the S7 is calculated based on the client unit 210. The type of speech recognition error area 200951940 is specified by the error section specifying unit 240 and the error interval context specifying unit 250 (S604). The feature amount data of the designated error section (including the contexts) is extracted from the feature amount storage unit 230 by the error section feature amount extracting unit 260 (S605). Then, the voice of the error section is re-identified by the correction section 270 (S606). Here, the recognized recognition result is integrated by the integration unit 280, and the recognition result is displayed by the display unit 290 (S607). In this manner, in the client device 110d, the identification processing is performed by the first identification unit 226 and the second identification unit (correction unit) 270, so that accurate speech recognition can be performed. Further, the first identification unit 226 and the second identification unit adopt different identification methods, which is preferable. Thereby, the voice that is not recognized by the first recognition unit 226 can still be remedied by the second recognition unit 270, and the result of correct speech recognition can be expected as a whole. According to the client device 110d, the feature amount calculation unit 210 calculates the feature amount data based on the input voice, and stores it in the feature amount storage unit 230φ. On the other hand, the first identification unit 226 performs speech recognition processing based on the feature amount data, and the error section specifying unit 240 and the error section before and after the context specifying unit 250 specify that a recognition error has occurred in the recognized identification result. Error interval. Then, the correction unit 270 (second identification unit)' corrects the identification result of the designated error section. Thereby, among the results of the identification, the necessary portions are corrected, and the correction processing can be easily performed. At the same time, a correct identification result can be obtained. Further, by performing the second-degree identification processing in the client device 110d, it is not necessary to use the server device 120. -45 - 200951940 <Sixth Embodiment> Next, a sixth embodiment which is a modification of the second embodiment will be described. According to this embodiment, it is possible to automatically judge that the end point of the error section has such a feature. Fig. 18 is a block diagram showing the functional configuration of the client device 11 Of of the sixth embodiment. The client device 11 Of includes a feature amount calculation unit 210, a feature amount compression unit 220, a feature amount storage unit 23 0, a transmission unit 225, a reception unit 23 5 , an operation unit 23 6 , and a result storage unit 23 7 . 
The user input detecting unit 238, the error section specifying unit 240c, the end point determining unit 241, the error section before and after mode specifying unit 250, the error section feature amount extracting unit 260, the correcting unit 2 70, the integration unit 280, the acoustic model holding unit 281, The speech model holding unit 282, the dictionary holding unit 283, and the display unit 290 are configured. The client device 11f is implemented by the hardware shown in Fig. 3 in the same manner as the client device 110. The client device 1 l〇f is only the starting point of the error section in the error section specifying unit 240c, and the endpoint determining unit 24 1 determines the end point of the error section based on the predetermined condition, which is different from the second embodiment. . The block diagram shown in Fig. 18 will be described focusing on the differences from the second embodiment. Similarly to the configuration shown in the second embodiment, the client device il〇f' receives the identification result recognized by the server device 120 and receives it by the receiving unit 235. The result storage unit 237 stores the identification result. . Then, the identification result is displayed on the display unit 290, and the user operates the operation unit 236' while viewing the recognition result displayed on the display unit 290 on one side 200951940 to specify the starting point of the error section. The user input detecting unit 23 8 detects the starting point and outputs it to the error section specifying unit 24〇c. The error section specifying unit 240c specifies the error section in accordance with the end point determined by the start point and the end point determining unit 241 designated by the user. When the end of the error section is judged, the error section specifying unit 2400c detects that the starting point has been designated from the user, and outputs the intention to the destination determining unit 241 to indicate the end point. The end point judging unit 24 1 automatically determines the end point of the error section in accordance with the instruction ' from the error section specifying unit 24〇c'. For example, the end point determination unit 241' compares the reliability information included in the speech recognition result received in the reception unit 23 and stored in the result storage unit 237 with a preset threshold, and the reliability is exceeded. The threshold vocabulary (or the most trusted vocabulary) is judged as the wrong end point. Then, the end point determining unit 241 outputs the determined end point to the error section specifying unit 240c, and the error section specifying unit 240c can specify the error section. For example, the following speech is taken as an example. In addition, for convenience of explanation, it is assumed that the word "activation" is designated as the starting point of the error interval. <sounding content> "Bei is the target 奁 汔吣匕 ii, all are necessary τ 言." (kono mokuhyo\i wo tassei suru tame ni wa,mina san no kyouryouku ga hituyou desu ° ) -47- 200951940 Chinese translation: "In order to achieve this goal, we need your cooperation." <Voice recognition result> B. Target 奁 activation ο /;: 沁丨;: Over, all $9 0 coercion necessary τ t. "a, mina san no (kono mokuhyou wo kasseika no tame ni kyouryouku ga hituyou desu 〇) Chinese translation: "In order to activate this goal, you need your cooperation." Here, the speech recognition result is cut into vocabulary units. Look. 
In addition, " / " means the division of the vocabulary. ": blade / target / warehouse / activation / 0 / wide: 吣 / 匕 / (±, / are / $ people / is / synergy / force W necessary / T f.) (kono / mokuhy〇u / wo / kasseika /no/tame/ni/wa, /mina/san/no/ky〇uryouku/ ga/hituyou/desu ° ) Chinese translation: "For / activation / this / target /, / need / everyone / / synergy / As a result of this speech recognition, the reliability of "kasseika" is 0.1, ❼(no)" is 頼.〇1, (tame)" is 0.4, "( ni)" In the case where the reliability is 0.6, if the threshold is set to 0.5, it can be judged that "{;: (ni)" in the activation /7/汔沁/(B (kasseika/no/tame/ni)" is In addition, the end point determination unit 24 1 may determine the previous one of the vocabulary whose reliability is equal to or higher than the threshold (the above example is "tame"), but the designation of the error interval is performed. As long as the result is included in the error part, any method can be adopted. The method of specifying such an error interval is based on the user's usual -48-200951940 revision habit, so it is very For use, that is, for example, in the case of Chinese character conversion, when the user specifies the wrong situation, 'first enter the starting point, then delete the error, and then enter the correct vocabulary column' is a commonly used convention. The above-mentioned error interval specification method is also When the starting point is input, the end point is automatically determined. Therefore, the user can operate without any abnormality in accordance with the operation method. Further, the end point determining unit 2 41 is not limited to the determination of the end point. u The above method may be, for example, a method of judging an end point according to a specific pronunciation symbol, or a method of treating an n-th vocabulary as an end point after the start of the error start point. Here, the method according to the pronunciation symbol is The method of judging based on the pause in the utterance may also be judged based on a short pause (comma) appearing at the boundary of the sentence and a long pause (period) appearing at the end of the utterance. If the judgment is made, a more accurate voice recognition can be expected. The specific example will be described below. The voice is the same as the above. An example to illustrate. <sounding content> "2nd goal 奁 achieves t* δ /C吣t 丨i, all $ 九〇 synergy 妒 Τ 玄." (kono mokuhyou wo tassei suru tame ni wa,mina san no kyouryouku ga hituyou Desu ° ) Chinese translation: "In order to achieve this goal, you need your cooperation. <Voice Recognition Results>

"kono mokuhyou wo kasseika no tame ni wa, mina san no kyouryouku ga hituyou desu." ("In order to activate this goal, everyone's cooperation is needed.") When the user operates the operation unit 236 and sets the position just after "kono mokuhyou wo" as the starting point of the error section, the end point determining unit 241 judges the pause closest to that portion (the comma) to be the end point. The error section specifying unit 240c can then specify the error section based on that end point. In the example above, the "," in "tame ni wa," is designated as the end point of the error section. Note that the "," portion is not actually speech but a momentary pause. Besides the comma and the period, the specific pronunciations used for this purpose may also be filler sounds such as "e-" and "ano-", or sentence-ending words such as "masu" and "desu". Next, an example of the method that treats the M-th vocabulary counted from the erroneous starting point as the end point is illustrated. The sentence below has been divided into vocabulary units, with "/" indicating the boundaries: "kono / mokuhyou / wo / kasseika / no / tame / ni / wa, / mina / san / no / kyouryouku / ga / hituyou / desu" ("In order to / activate / this / goal /, / everyone's / cooperation / is needed"). For example, when the starting point is set to "kasseika" and M = 3, "tame" in "kasseika / no / tame" becomes the end-point vocabulary, so the error section specifying unit 240c can specify "kasseika / no / tame" as the error section. Naturally, values other than M = 3 may also be used. Next, an example of the method that sets a vocabulary with few recognition candidates (competing hypotheses) as the end point will be described, using the following example. For "kono / mokuhyou / wo / kasseika / no / tame", the following candidates can be listed. "kasseika": "dare", "takusan", "osusume"; "no": "ka", "aru"; "tame": - (no candidate). For reference, Chinese pronunciation examples and their candidates are: 北海道: 柔道 拜見 別的; 如期: 突起 路基 提起 體積; 舉行: 舉行. The number of candidates reflects the ambiguity of the section; the lower the reliability, the more candidates are sent from the server device 120.
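The end-point heuristics described so far (nearest pause, M-th vocabulary after the start point, fewest competing candidates) can be expressed compactly. The following is a minimal sketch, not code from the patent; the word/reliability/candidate data structures and function names are assumptions made for illustration.

```python
# Illustrative sketch of the end-point heuristics described above.
# The Word data model (reliability plus alternative candidates) is an
# assumption for this example, not a structure defined in the patent.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Word:
    surface: str                    # recognized vocabulary (e.g. "kasseika")
    reliability: float              # confidence reported by the recognizer
    candidates: List[str] = field(default_factory=list)  # competing hypotheses
    is_pause: bool = False          # True for comma/period/filler symbols

def endpoint_by_reliability(words: List[Word], start: int, threshold: float = 0.5) -> Optional[int]:
    """First word at or after the start point whose reliability reaches the threshold."""
    for i in range(start, len(words)):
        if words[i].reliability >= threshold:
            return i
    return None

def endpoint_by_pause(words: List[Word], start: int) -> Optional[int]:
    """Nearest pause symbol (comma, period, filler) after the start point."""
    for i in range(start, len(words)):
        if words[i].is_pause:
            return i
    return None

def endpoint_by_offset(words: List[Word], start: int, m: int = 3) -> int:
    """M-th vocabulary counted from the start point."""
    return min(start + m - 1, len(words) - 1)

def endpoint_by_candidate_count(words: List[Word], start: int) -> Optional[int]:
    """A word with no competing candidates is treated as reliable; the error
    section ends just before it."""
    for i in range(start, len(words)):
        if not words[i].candidates:
            return max(start, i - 1)
    return None

# Example with the reliabilities quoted in the text (threshold 0.5).
words = [Word("kono", 0.9), Word("mokuhyou", 0.8), Word("wo", 0.7),
         Word("kasseika", 0.1, ["dare", "takusan", "osusume"]),
         Word("no", 0.01, ["ka", "aru"]),
         Word("tame", 0.4),
         Word("ni", 0.6)]
start = 3  # "kasseika" designated by the user as the start of the error section
print(endpoint_by_reliability(words, start))      # -> 6 ("ni")
print(endpoint_by_offset(words, start, m=3))      # -> 5 ("tame")
print(endpoint_by_candidate_count(words, start))  # -> 4 (ends just before "tame")
```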
Further, in this example, the server device 120 is configured not to transmit the reliability information itself but to transmit the alternative candidates obtained from the reliability information directly to the client device 110. In this case, since "tame" has no candidate, its reliability can be regarded as correspondingly high. Therefore, the "no" immediately before it can be judged as the end point of the error section. The end point of the error section is not limited to the position immediately before such a vocabulary, and may be given a certain margin. As described above, the end point can be determined by the method based on reliability, the method using a specific pronunciation symbol (or pronunciation), or the method that treats the section from the starting point to the M-th vocabulary as the error section. These methods may also be combined; that is, the correction results of the plural methods may be presented in the form of an N-best list, or one result may be selected from the recognition results of the plural methods. In this case, the recognition results may be displayed as a list in descending order of their scores, and the user may select an arbitrary recognition result from the list. In this way, based on the error section specified by the error section specifying unit 240c, the error section context specifying unit 250 specifies a section including the context before and after the error section, the error section feature amount extracting unit 260 extracts the corresponding feature amount data from the feature amount storage unit 230, and the correction unit 270 performs re-recognition processing on that feature amount data to carry out the correction processing. Next, the operation of the client device 110f configured as described above will be explained. Fig. 19 is a flowchart of the processing of the client device 110f. The feature data of the voice input through the microphone is extracted by the feature amount calculation unit 210 (S101). The feature amount data is then stored in the feature amount storage unit 230 (S102). Next, the feature amount data is compressed by the feature amount compression unit 220 (S103). The compressed feature amount data is transmitted to the server device 120 by the transmission unit 225 (S104). Next, speech recognition is performed on the server device 120, and the recognition result transmitted from the server device 120 is received by the reception unit 235, temporarily stored, and displayed on the display unit 290 (S105a). The user then judges the starting point of the error section based on the recognition result displayed on the display unit 290, and designates that starting point by operating the operation unit 236. When the user input detecting unit 238 detects that the starting point has been designated, the end point determining unit 241 automatically determines the end point of the error section: for example, it judges the end point based on the reliability included in the speech recognition result, or takes the position where a predetermined pronunciation symbol appears as the end point, or takes the M-th vocabulary from the starting point (M being an arbitrary predetermined value) as the end point. The starting point and the end point are thus specified by the error section specifying unit 240c.
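Merging the outputs of the different end-point methods into a single score-ordered list, as suggested above, could look like the following sketch. The scores, method names, and list layout are illustrative assumptions, not values defined in the patent.

```python
# Minimal sketch of combining correction candidates from several end-point
# determination methods into one N-best list for the user to choose from.
from typing import Dict, List, Tuple

def merge_nbest(results_per_method: Dict[str, List[Tuple[str, float]]],
                n: int = 5) -> List[Tuple[str, float]]:
    """results_per_method maps a method name to (corrected text, score) pairs.
    Identical texts keep their best score; the merged list is sorted by score."""
    best: Dict[str, float] = {}
    for candidates in results_per_method.values():
        for text, score in candidates:
            if text not in best or score > best[text]:
                best[text] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)[:n]

nbest = merge_nbest({
    "reliability_threshold": [("kono mokuhyou wo tassei suru tame ni wa ...", 0.82)],
    "pause_symbol":          [("kono mokuhyou wo tassei no tame ni wa ...", 0.61)],
    "mth_vocabulary":        [("kono mokuhyou wo tassei suru tame ni wa ...", 0.78)],
})
for text, score in nbest:      # displayed in descending score order for the user
    print(f"{score:.2f}  {text}")
```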
Then, the context before and after is specified based on the specified error interval (S106C). Based on the error section including the context, the error section feature quantity extracting unit 260 extracts the feature amount data (φ S107 ), and performs the speech recognition again by the correction unit 270 to generate the text data of the error section (S 1 08). Then, the character data of the error section and the character data received by the receiving unit 253 are integrated, and the correct character data is displayed on the display unit 290 (S109). Further, the processing of S105a to S108 including S106c is performed in substantially the same manner as the flowchart shown in Fig. 10. However, in the processing of S3 05, the end point determining unit 24 1 automatically determines the end point of the error section and Save this, it's different. As described above, according to this embodiment, the -53-200951940 designation method of the error section can provide a very convenient device in accordance with the usual revision habits of the user. <Seventh Embodiment> Next, a seventh embodiment will be described. According to the present embodiment, in the error section, the user designates the leading character, whereby the designated character can be used as a constraint condition to perform relatively accurate speech recognition. Fig. 20 is a block diagram showing the configuration of the client device 11 〇g of the seventh embodiment. The client device 110g includes a feature amount calculation unit 210, a feature amount compression unit 220, a feature amount storage unit 230, a transmission unit 225, a reception unit 235, an operation unit 236, a result storage unit 237, and user input detection. The unit 23, the error section specifying unit 240a, the error section context specifying unit 20.5a, the error section feature quantity extracting unit 260, the correcting unit 270, the integration unit 280, the acoustic model holding unit 281, the speech model holding unit 282, and the dictionary holding The unit 283 and the display unit 290 are configured. The client device 1 10g is implemented in the same manner as the client device 110 by the hardware shown in FIG. In the client device UOg, the operation unit 236 receives the corrected character in the error section from the user as a constraint condition, and the error interval before and after the context specifying unit 250a specifies the context before and after the error interval, and the operation unit 23 6 In the corrected text, the correction unit 270 performs the re-identification processing to perform the correction processing by using the context of the error interval and the corrected text as the constraint conditions, and is characterized in that the correction processing is performed. In other words, the operation unit 236 accepts an input for specifying an error section from the user, and thereafter accepts the corrected character input in the error section -54-200951940 error interval before and after the context specifying unit 250a, and performs the above The error interval before and after the context designation unit 250 is substantially the same, and after the error interval, the recognized vocabulary is specified (one discrimination), and the correction received after the operation unit 236 is specified. . 
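A constrained re-recognition of this kind, using the surrounding context words together with characters the user has already typed, can be sketched as follows. The decoder interface (recognize_nbest) is an assumed placeholder, not an API from the patent or from any particular speech recognition library.

```python
# Minimal sketch of re-recognition constrained by the context around the
# error section and by a prefix typed by the user (seventh embodiment idea).
from typing import Callable, List, Sequence, Tuple

def correct_error_section(
    features: Sequence,                       # feature frames of the error section (plus context)
    left_context: str,                        # e.g. "kono mokuhyou wo"
    typed_prefix: str,                        # e.g. "ta" typed by the user
    recognize_nbest: Callable[[Sequence], List[Tuple[str, float]]],
) -> str:
    """Re-recognize the error section and keep only hypotheses consistent with
    the constraints; fall back to the best raw hypothesis otherwise."""
    hypotheses = recognize_nbest(features)    # [(text, score), ...], best first
    for text, _score in hypotheses:
        # A hypothesis is acceptable when it starts with the fixed context and
        # the portion after the context starts with the typed characters.
        if text.startswith(left_context) and \
           text[len(left_context):].lstrip().startswith(typed_prefix):
            return text
    return hypotheses[0][0] if hypotheses else (left_context + " " + typed_prefix).strip()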
The correction unit 270 performs re-recognition processing based on the feature amount data extracted in the error section feature amount extraction and the constraint conditions specified in the context section before and after the error section, and performs the correction processing. For example, the above processing will be explained based on the following examples. <sounding content> "B-targets are achieved by 吣 二 二 、 皆 皆 皆 皆 皆 皆 皆 皆 皆 皆 皆 皆 皆 皆 皆 皆 皆 皆 皆 皆 皆 皆 皆 皆 皆 皆 。 。 。 。 。 。 。 。 。 。 。 。 。 。 。 。 。 。 。 。 。 。 。 。 。 。 。 。 。 。 。 。 。 。 。 。 。 The goal requires the cooperation of all of you." <Voice recognition result> "B1 target 奁activated blade 吣丨 吣丨;: 辻, all $9 0 synergy 妒t." (kono mokuhyou wo kasseika no tame ni wa, mina kyouryouku ga hituyou desu ° ) "In order to activate this goal, you need to work together." In this case, the user operates the operation unit 236 to recognize the unit character in the first instance, and the 260 designation unit can perform the necessary san no. Necessary 7 san no error area -55- 200951940 starting point (in the above example, it is "bottom position" of "kono mokuhyou wo"), input the correct text content. The hiragana name that should be entered is “tassei suru tame ni.” The following example uses the input of “death (ta)” as part of the input. As an example to illustrate. In addition, it is assumed that the start and end points of the error interval have been determined or will be decided by the same method as described above. When the user inputs the "dead (ta)" through the operation unit 23, the error interval before and after the context designation unit 25a will be the target of the context! (kono mokuhyou wo) ” , and “t ( ta ) ” as the input text, is regarded as a constraint condition; that is, “kono mokuhyou wo ta” is regarded as identifying the feature quantity data. In this case, the identification result of the voice recognition using the text input content of the Bodhisattva is regarded as a constraint condition, and the user is prompted to present a more accurate recognition result. Further, the correction method is In addition to speech recognition, the key input method can be used. For example, as a key text input method, Japanese kana kanji conversion can be considered. In the kanji kana character conversion, the input text content is compared with a dictionary, and Predict the function of the result of the transformation. For example, once "dead (ta)" is entered, the vocabulary headed by "death (ta)" is sequenced from the database and presented to the user. This function displays a list of candidate candidates for pseudonym Chinese character conversion and speech recognition, based on these lists, the user can The order of the list can be selected according to the order of the scores assigned to the transformation results or the identification results, and the candidates based on the kana-kanji transformation and the speech recognition can be compared. For the candidates that are completely consistent or partially consistent, the scores assigned to each other may be counted and sorted based on the scores. For example, the score of the candidate A1 "tassei" of the fake name transformation is 50, and the result of the voice recognition is obtained. If the score of candidate B1 "tassei suru" is yes, since candidate A1 and candidate B1 are partially consistent, φ can calculate the score for each score, multiply the score based on the calculation. 
In addition, in the case of complete agreement, adjustment processing such as multiplication number is not required. Alternatively, under the stage that the user has selected the candidate A1 "tassei" of the kana-han transformation, " <1 0 ko 奁 ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( Next, the operation of the client device 110g thus constructed will be described. Figure 21 is a flow chart showing the processing of the client device 110g. The voice input through the microphone is extracted by the feature amount calculation 210 (S101). Then, feature quantity data (S1〇2) is stored in the feature storage unit 230. Next, the feature amount compressing unit 220 compresses the feature amount data (S103). The compressed compressed feature data is transmitted to the server device 120 by the transmitting unit 22 5 (S104). Then, the voice recognition is performed on the server device 120, and the identification result is transmitted from the servo device 120, and the received portion 235 is received. The received portion is temporarily stored in the display unit 290 (S105a), and the identification result is displayed on the display unit 290 (S105a). Then, the user specifies the error section based on the recognition result displayed on the display unit 290 (S106d). Then, the user performs the character input required to correct the recognition result of the error section with respect to the operation unit 236. When the character input is received, the operation unit 236 outputs the error interval before and after the context specifying unit 250a, and the error interval before and after the context specifying unit 250a is based on the input character and the specified error interval. Specify the context before and after. Based on the error section including the context, the q error section feature quantity extracting unit 2 60 extracts the feature amount data (S107), and performs the re-speech recognition by the correction unit 270 to generate the text data of the error section ( S108). Then, the character data of the error section and the character data received by the receiving unit 253 are integrated, and the correct character data is displayed on the display unit 290 (S109). Further, the processing of S105a to 108 including S106d is performed in substantially the same manner as the flowchart shown in Fig. 1A. In addition, in the present embodiment, in addition to the processing in the flowchart of Fig. 1, it is necessary to add, and in S309, the character accepted on the operation unit 236 is set as a constraint condition. . In addition, before entering S309, it is necessary to complete the text input acceptance of the restraint condition. As described above, according to this embodiment, the constraint condition is that the character specified by the user is set in addition to the context, and thus the voice recognition can be performed accurately. <Eighth Embodiment> -58- 200951940 Next, an eighth embodiment will be described. According to the present embodiment, the result of re-recognition on the correction unit 2 70 may be an identification result different from the recognition result before re-recognition. Fig. 22 is a block diagram showing the configuration of the client device π 〇h of the eighth embodiment. 
The client device 11 Oh includes a feature amount calculation unit 210, a feature amount compression unit 220, a feature amount storage unit 230, a transmission unit 225, a reception unit 235, an operation unit 236, a result storage unit 237, and a user input detection. The φ section 238, the error section specifying unit 240a, the error section context specifying section 250, the error section feature amount extracting section 260, the correcting section 270, the integration section 280, the acoustic model holding section 281, the speech model holding section 282, and the dictionary holding section 283. The display unit 290 is configured. The client device 11 〇h is implemented in the same manner as the client device 110 by the hardware shown in FIG. The following description will be centered on the difference from the client device 11 in Fig. 2 . Similarly to the correction unit 270 in Fig. 3, the correction unit 270b is a part for performing a re-identification process or the like. Then, the correcting unit 270b performs the recognizing process based on the recognition result stored in the result e holding unit 237 so that the same recognition error does not occur. That is, the correction unit 270b is compared with the recognition result in the error section specified in the error section specifying unit 240a, and is included in the re-identification search process in order to prevent the same recognition result from being obtained. The path 'the identification result in the error interval' is excluded from the candidate, and is processed as such. As the exclusion processing 270b, the feature quantity data of the error section is multiplied, and the predetermined coefficient ' is multiplied so that the probability of the hypothesis in the candidate is minimized, and as a result, the minimum candidate is not selected. . Further, in the above method, -59-200951940 is a candidate (for example, "activation") in which there is a possibility of occurrence of an error in re-identification, and is excluded from the identification result candidate, but is not limited thereto, and may be When the identification result is re-identified, a candidate (for example, "activation") of the identification result with the possibility of error is not displayed. Further, the client device 1 1 Oh performs substantially the same processing as the flowchart shown in FIG. Further, the identification processing of the error section in S108 differs from the identification processing excluding the candidate in order not to display the same identification result. As described above, since the vocabulary of the correction target has an error, the vocabulary which is already the correction target should not be outputted for the re-identified result. Therefore, in the present embodiment, such a correction result can be prevented from being displayed. <Ninth Embodiment> Next, a ninth embodiment will be described. According to this embodiment, the average value is calculated in the error section of the feature amount data extracted by the error section feature quantity extracting unit 260, and the averaged data is subtracted from the feature amount data to perform the recognizing process. This specific configuration will be described. Fig. 23 is a block diagram showing the function of the client device 11〇i of the ninth embodiment. 
The client device 11〇i includes a feature amount calculation unit 210, a feature amount compression unit 220, a feature amount storage unit 230, a transmission unit 225, a reception unit 235, an error section designation unit 240, and an error interval context designation. The unit 250, the error section feature quantity extracting unit 260, the average unit calculating unit 261 (calculation means), the feature normalizing unit 2 62 (correcting means), the correcting unit 270 (correcting means), the integration part 280 200951940, and the acoustic model holding unit 281 The speech model holding unit 2 82, the word unit 283, and the display unit 290 are configured. The client device il〇i, the client device 11 is similarly implemented by the hardware shown in FIG. The difference from the client device 110 in Fig. 2, that is, the average portion 261 and the feature normalization portion 262 will be mainly described. The average 値 calculation unit 261 is for calculating the portion of the error region φ (or the average 前后 before and after the error interval) in the feature amount data extracted by the error estimator 260. More specifically, the average 値 calculation unit 261 accumulates the output 値 (size) of each identification single-frequency in the error section. Then, add the output 値 obtained by dividing it by the number of units it recognizes to calculate . For example, the identification unit in the "kasseika/no/tame" section is the part separated by the slash " /" - the identification unit, ie the identification frame η, is the frequency fn 1~ Fn 1 2 , if the output 値 is assumed to be gn1 to gnl2, the average φ gl=Egnl/n of the frequency fl (n=l to 3 in the above example) is expressed. That is, it is assumed that the frequency f1 (the output 値 is gl 1 to gl 12 ) constituting "kasseika" constitutes "¢9 (no)" f21 to f212 (output 値 is g21 to g212), which constitutes " The average value of the case rate fl of the frequency f3 1 to f312 of 尨吣 (the output 値 is g3 1 to g3 12 ) can be calculated by (gll + g21 + g31) / 3 to calculate the feature normalization unit 262. In the subtraction process, the average 値 of each frequency calculated by the calculation unit 261 is subtracted from the feature amount data of each frequency. Then, the correction unit 270 is kept below the system and the calculation is performed. The difference between the specific and the average is the cumulative error of the average 値. For each composition 値 [fl 1~ frequency tame), the frequency average 値 rate can be calculated by subtracting -61 - 200951940 'Re-identification processing is performed to perform correction processing. In the present embodiment, the characteristics of the sound pickup device such as a microphone required to input the voice to the feature amount calculation unit 210 can be obtained by correcting the feature amount data using the average 値 calculated by the average 値 calculation unit 261. The information after removal. That is, the noise at the time of receiving the microphone can be removed, and the correct voice can be corrected (identification processing). Further, in the above-described embodiment, the error section extracted by the error section feature amount extracting unit 260 is applied, but the feature amount data including the section of the error section having a certain length may be used. Further, the average 値 calculation unit 261 and the feature normalization unit 262 can be applied to the second embodiment to the eighth embodiment, respectively. 
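The per-frequency averaging and subtraction described for the ninth embodiment amounts to a cepstral-mean-normalization style operation over the frames of the error section. A minimal sketch follows; NumPy and the two-dimensional frame layout are assumptions made for this example.

```python
# Minimal sketch of the ninth embodiment's normalization: compute, per
# feature dimension, the mean over the frames of the error section and
# subtract it from the feature data before re-recognition.
import numpy as np

def normalize_error_section(features: np.ndarray) -> np.ndarray:
    """features: array of shape (num_frames, num_dims), e.g. the frames of
    "kasseika / no / tame". Returns the mean-subtracted feature frames."""
    mean_per_dim = features.mean(axis=0)   # g_f = (g_1f + ... + g_nf) / n for each dimension f
    return features - mean_per_dim         # removes a constant channel/microphone offset

frames = np.array([[1.0, 3.0], [2.0, 5.0], [3.0, 7.0]])   # 3 frames, 2 dimensions
print(normalize_error_section(frames))
# [[-1. -2.]
#  [ 0.  0.]
#  [ 1.  2.]]
```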
<First Embodiment> In the client devices 110 to 110i, which are the voice recognition result correction devices described in the first to ninth embodiments, the correction processing is performed by the correction unit 270 (again) Identification processing), but is not limited to this. In other words, the error section specified by the error section specifying unit 240 may be notified to the server device 120, and the server device 120 may perform the re-correction process, and the receiving unit 25 may receive the error correction process. The composition of the revised results. The re-routing processing on the server device 120 is designed as the correction processing in the correction portion 270 of the above-described client device 11A. As a specific example of the notification processing in the client apparatus 110, it is conceivable that the time information of the error section specified by the error section specifying unit 240 or the time information including the vocabulary before and after the error section is specified by the error section. The unit 240 calculates the 'time information to the server device 120 by the transmitting unit 225 as calculated by 200951940. On the server device 12, a speech recognition device different from the initial identification process is performed to prevent the erroneous recognition from being performed again. For example, 'change the acoustic model, speech model, and dictionary to perform identification processing. <First Embodiment> Next, the client device 11〇k of the eleventh embodiment will be described. The client device 1 1 〇k in the embodiment of the present invention recognizes the root segment and uses the root string described in the root segment to perform the correction process. Figure 26 is a block diagram of the function of the client device 11 Ok. The client device 110k includes a feature amount calculation unit 210, a feature amount compressing unit 220, a transmitting unit 225, a feature amount storage unit 230, a receiving unit 235, an error section specifying unit 240, a root section specifying unit 242, and a segmentation unit. The unit 243, the error section feature quantity extracting unit 260, the dictionary adding unit 265, the correcting unit 270, the integration unit 280, the acoustic model holding unit 281, the speech model holding unit 282, the dictionary holding unit 283, and the display unit 290. The first embodiment differs in that the root interval specifying unit 242, the dividing unit 243, and the dictionary adding unit 265 are included. The following is a description of the composition of the difference. The root section specifying unit 242 specifies a section for the section including the stem string from the error section specified by the error section specifying section 240. For the root string, as the attribute information, a "subword" indicating that it is an unknown word is attached, and the root interval specifying unit 242 can specify the root interval based on the attribute information. -63- 200951940 For example, in FIG. 28(a), the identification result recognized by the utterance content on the server device 120 is illustrated. According to FIG. 28(a), a pair of "subwords" is added as "attribute information", and the root interval specifying unit 242 can "seven" (sanyoumusen) based on the attribute information. The identification becomes a root string, and the string portion is designated as a root interval. Further, in FIG. 
28(a), a frame index is added to the identification unit of the recognized recognition result in accordance with the content of the speech. In addition, in FIG. 28(a), the error section specifying unit 240 can designate an error section in accordance with the same processing as described above, and can set "T呔(de wa)" ( The second identification unit) to "妒 ( ga ) " (the eighth identification unit) is designated as the error interval. In Fig. 28 (b), the Chinese pronunciation example is shown as a reference. The division unit 243 The root string included in the root section specified by the root section specifying unit 242 is regarded as a boundary, and the error section specified by the error section specifying unit 240 is divided. The portion shown in Fig. 28 (Q a ) is shown in Fig. 28 (Q a ). Based on the example, based on The root string is also "inch V 3 々厶 seven y (sanyoumusen)", and is divided into interval 1 and interval 2. That is, the second identification unit of "7 (i (dewa)" to the fifth identification unit "inch" (sanyoumusen), that is, l〇〇msec to 500msec in terms of frame index, is divided into interval 1, the fifth identification unit of "inch (sanyoumusen) to 8th The "identification (ga)" of the identification unit, that is, 300 sec to 660 msec, is divided into sections 2. -64 - 200951940 The dictionary addition section 265 is the radical string specified by the stem section specifying section 242' The part added to the dictionary holding unit 283. In the example of Fig. 28 (a), "Sanyoumusen" is added to the dictionary holding unit 2 as a new vocabulary and the dictionary is added to the dictionary. The holding unit 283 adds the pronunciation of the root, and adds the probability of the connection of the root to the other vocabulary in the speech model holding unit 2 82. The connection probability in the speech model holding unit 2 82 is based on the H-level dedicated to the root prepared in advance. (class). Again, the string of the root model 'cause Since it is almost a proper noun, it is possible to use the level of the noun (proper noun), and the error section feature quantity extracting unit 260 is based on the section 1 obtained by the division unit 243 and In the section 2', the feature amount data held by the feature amount storage unit 23 0 is extracted. Then, the correction unit 270 performs re-identification processing on the feature amount data corresponding to each section to execute the correction processing. Specifically, in the case of the Q example of Fig. 28(a), the correction result of the section 1 is "τ 辻 辻 一 一 一 一 一 一 : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : The correction result of the interval 2 is "inch>3々厶7 V Φ product (iyoumusen no seihin wa hyouban ga)" 0 integration unit 280, based on the identification result obtained by the correction by the corrected portion 270 (Interval 1 and Interval 2) are integrated into the root string of the boundary, and are integrated with the recognition result received on the receiving unit 235, and displayed on the display unit 290. As shown in Fig. 28 (a As an example, the result of the integration, the text of the final error interval will become "T丨±电気-65- 200951940 / one force blade inch> 彐夕厶七>0 products (i evaluation (dewa denki me -ka no sanyoumusen no seihin ha hyouban ga) ” Next, the operation of the client device 110k configured as described above will be described. Fig. 27 is a flowchart showing the operation of the client device 110k. 
From S101 to S105, the process proceeds to Fig. 6 The client device 110 is shown in the same process. That is, through the microphone The input voice is extracted by the feature amount calculation unit 210 (S101). Then, the feature amount storage unit 230 stores the feature amount data (S102). Then, the feature amount is compressed. The part 220 compresses the feature quantity data (S1 03). The compressed feature quantity data that has been compressed is sent to the server device 120 by the transmitting unit 225 (S104). Then, the voice recognition is performed on the server device 120. The identification result is transmitted from the server device 120, and is received by the receiving unit 235 (S105). Then, based on the speech recognition result, the error section specifying unit 240 specifies the error section (S106). The specified error interval is used to specify the context. Next, the root interval is specified and determined by the root interval specifying unit 242 (S 7 0 1 ). Further, at this time, the root string located in the root interval is located at the client. a user dictionary provided in the end device 110k (for example, a vocabulary registered by a user in a kana-kanji conversion dictionary, or a name registered in a contact list, a phone book, etc.) In this case, the division unit 243 may divide the error section by using the root section as a boundary (S702). The division processing is performed by the dictionary addition unit 265. The root string that has been designated is held in the dictionary holding portion 283 (S 70 3). -66 - 200951940 Then, the error section feature quantity extracting unit 260 extracts the feature amount data of the error section and the feature amount data of the root section (S 1 07a ), and the error section and the root are corrected by the correction section 270. The feature quantity data of the section is re-identified to perform a correction process (S10 8a ). Then, the text data of the error section and the text data received by the receiving section 253 are integrated, and the character data obtained by the correct recognition is displayed on the display unit 290 (S109). In addition, at the time of integration, the results of the interval 1 and the interval 2 are linked based on the boundary of the word φ. Further, when the root string is changed based on the user dictionary, the correction unit 270 may perform the voice recognition processing by performing the voice recognition processing on the converted character string as a constraint condition. In the present embodiment, the string of the root is described on the premise that the string of the root is located in the identification result of the server, but the string of the root may be generated in the client device 110k. In this case, after the error section designation processing in the processing S106 of Fig. 27, the root string is formed, and then the 词 root interval determination processing is performed. Further, the processing of Fig. 27 described above in the client device 100k can be performed on a server or other device. Even though the method of correcting is performed by identification, other methods such as a method based on the similarity between strings can be used. At this time, the feature amount storage unit 230 and the process of storing the acoustic feature amount data (S1 02), the error section feature amount extracting unit 260, the correcting unit 270, and the acoustic feature are not required to be recognized (S108a). 
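The flow just described (S701 to S703, S107a, S108a, S109) can be sketched roughly as below. The data structures and the recognize() placeholder are assumptions for illustration, not the patent's implementation.

```python
# Minimal sketch of the eleventh embodiment's flow: split the error section
# at the subword (root) span, register the subword string in the dictionary,
# re-recognize each divided section, and join the results on the shared span.
from typing import Callable, List, Sequence, Tuple

def split_at_subword(error_span: Tuple[int, int],
                     subword_span: Tuple[int, int]) -> List[Tuple[int, int]]:
    """error_span and subword_span are (start_frame, end_frame) pairs.
    Section 1 runs from the error start to the subword end, section 2 from
    the subword start to the error end, so both contain the subword span."""
    e_start, e_end = error_span
    s_start, s_end = subword_span
    return [(e_start, s_end), (s_start, e_end)]

def correct_with_subword(frames: Sequence,
                         error_span: Tuple[int, int],
                         subword_span: Tuple[int, int],
                         subword_text: str,
                         dictionary: set,
                         recognize: Callable[[Sequence, set], str]) -> str:
    dictionary.add(subword_text)                                     # S703: register the subword string
    parts = []
    for start, end in split_at_subword(error_span, subword_span):    # S702: divide the error section
        section_frames = frames[start:end]                           # S107a: extract features per section
        parts.append(recognize(section_frames, dictionary))          # S108a: re-recognize each section
    left, right = parts
    # S109: join the two results on the shared subword string
    if subword_text in left and subword_text in right:
        return left[: left.index(subword_text)] + right[right.index(subword_text):]
    return left + right
```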
Even when the word string of the root is in the dictionary holding portion 283, the information in the dictionary holding portion 283 can be utilized. For example, if there is a vocabulary corresponding to "inch y 3 々Λ & & & san 、 san san san san san san san san san san san san san san san san san san san san san san san san san san san san san san san san san san san san san san san san san san san In the previous example, although the interval 1 and interval 2 systems all have a root interval in the interval, this is not necessary, and the root may not be included in each segment. That is, the second vocabulary may also be used. T (dewa) ” to the beginning of the fifth root string, divided into interval 1: The “# ( ga ) ” of the 5th lexical string is terminated to the 8th vocabulary, and is divided into interval 2. Next, there is no need to process the addition of the root string to the dictionary. Next, the effect of the client device 110k of the present embodiment will be described. In the client device 11〇k, the receiving unit 235 will recognize the result. Receiving from the server device 120, the error section specifying unit 240 specifies the error section. Then, the root section specifying unit 242 specifies the root section in the error section. This can be transmitted from the server apparatus 120. Identification knot The correction unit 270 extracts the feature amount data corresponding to the root interval specified by the root segment specifying unit 242 from the feature amount storage unit 230, and extracts it using the attribute information. The feature quantity data is used for re-identification to perform correction of the identification result. Thus, the correction process can be performed on an unknown word such as a root. That is, it can be performed according to an interval of an unknown word called a root interval. Further, in the client apparatus 11〇k of the present embodiment, the division unit 243 divides the identification result into a complex section in accordance with the radical section specified by the root section specifying unit 240, and then corrects the result. The unit 270 performs correction of the identification result for each of the divided sections divided by the 200951940 divided portion 243. Thereby, the identification target can be shortened, and a more accurate identification processing can be performed, and the client device 1 l〇 In k, the dividing unit 243 regards the end point of the root section as the end point of the divided section, and regards the starting point of the root section as the next division of the preceding divided section. The starting point of the interval is used to divide the recognition result in this way. Then, the correction unit 270 performs correction of the recognition result for each divided section divided by the divided portion 243 φ, and regards the root interval as each divided interval. The constraint condition of the correction time. Therefore, the root interval is included in any of the division intervals. Therefore, the root interval is necessarily included in the identification process, whereby the root string can be regarded as a constraint condition. Further, in the client device 11A of the present embodiment, the dictionary adding unit 265 adds the root string in the root section specified by the root section specifying unit 242 to the identification processing. The dictionary holding unit 2 83. 
By borrowing Q, the root string can be accumulated and used effectively in the future identification process, and a more accurate identification process can be performed. <Twelfth Embodiment> In the eleventh embodiment, a method of dividing a root string as a boundary is described. However, in the present embodiment, it is explained that a stem must be used even when re-recognition is not performed. The method of the string. This embodiment is the same device configuration as the eleventh embodiment. Figure 29 is a commemorative diagram of the exploration process during speech recognition. In Figure 29 (a -69-200951940), the exploration process with the root string "inch V 3々Λ七v (sanyoumusen)" is shown. 29(b) is a constrained condition in which the root string is used as a constraint condition to illustrate the exploration process in the complex interval. In general, in the process of speech recognition exploration, the hypothesis of all paths is calculated, and the results in the middle are saved. Finally, the results are generated in order of likelihood. In fact, considering the cost side, it takes advantage of the method of reducing the scope of exploration in the middle to a certain range. In the present embodiment, when the root section specified by the stem section specifying section 242 is located within a predetermined section (for example, between 2 seconds and 3 seconds), the correcting section 270 uses the stem described in the radical section. In the process of searching, the path of the root string has a higher order than the other paths, and finally the method of outputting the identification result of the root string is preferentially outputted for identification processing. For example, the following search path is obtained and held by the correction unit 270. Path 1: The most recent (saikin) (dewa) / porch (kenkan) / T (de) / to be right ^) Gan (machiawase) Path 2: Yesterday (kinou) / 〇 (no) / meeting (kaigi) / ( ± (wa ) / world (sekai) / medium (cyuu) / path 3: recent (saikin) / *ey; (dewa) / 単価 (tanka) / Ι λ ( takai ) / inch V 彐々厶 seven y ( sanyoumusen Path 4: recent (saikin) / force (dewa) / electric power (denkime-ka) / ❸ (no) / inch y 彐 么 七 seven; y (sanyoumusen) where path 3 and path 4 have "inch (sanyoumusen)" 'Therefore, the correction unit 270 performs the processing of making the two paths -70-200951940 higher than the path 1 and the path 2. If the range is reduced here, the path 1 and the path are not left. 2, but leave path 3 and path 4. Then judge the position of "inch 乂 3 々厶 & ( san san san san san san san san san san san san san san san san san san san san san san san san san san san san san san san san san san san san san san san san san彐々厶7y (sanyoumusen) ” The position of the position (300ms to 500ms) can be a certain range. Also, the final identification result can be The candidate for "inch y彐9厶7" (sanyoumusen) is the candidate priority output that does not appear "sanyoumusen". As described above, in the client device 1 l〇k, the correction is made. The section 270 selects the hypothesis that the word I root string described in the root section specified by the root section specifying section 242 is included as an identification search process, and raises the priority order, and selects from the hypothesis. The final identification result 'to perform the correction. Thus, the root string can be used for the identification process. 
[Brief Description of the Drawings]
[Fig. 1] A configuration diagram of a communication system including the client device 110 (110a to 110k), which is the speech recognition result correction device of an embodiment.
[Fig. 2] A block diagram showing the functions of the client device 110.
[Fig. 3] A hardware configuration diagram of the client device 110.
[Fig. 4] A schematic diagram of the various pieces of information contained in a speech recognition result.
[Fig. 5] (a) A schematic diagram of specifying the context before and after the error interval, and (b) a schematic diagram of recognition processing based on the constraint condition.
[Fig. 6] A flowchart showing the operation of the client device 110.
[Fig. 7] A flowchart showing the detailed processing of the correction process for specifying the error interval.
[Fig. 8] A block diagram showing the functions of the client device 110a, which accepts an error interval by user input.
[Fig. 9] A flowchart of the processing of the client device 110a.
[Fig. 10] A flowchart of the detailed processing when the error interval is specified by user input on the client device 110a.
[Fig. 11] A block diagram showing the functions of the client device 110b.
[Fig. 12] A flowchart of the processing of the client device 110b.
[Fig. 13] A flowchart of the detailed processing when the error interval is specified on the client device 110b.
[Fig. 14] A block diagram showing the functions of the client device 110c.
[Fig. 15] A flowchart of the processing of the client device 110c.
[Fig. 16] A block diagram showing the functions of the client device 110d.
[Fig. 17] A flowchart of the processing of the client device 110d.
[Fig. 18] A block diagram showing the functions of the client device 110f.
[Fig. 19] A flowchart of the processing of the client device 110f.
[Fig. 20] A block diagram showing the functions of the client device 110g.
[Fig. 21] A flowchart of the processing of the client device 110g.
[Fig. 22] A block diagram showing the functions of the client device 110h.
[Fig. 23] A block diagram showing the functions of the client device 110i.
[Fig. 24] A schematic diagram of correcting the specified part with the vocabulary information regarded as a constraint condition.
[Fig. 25] A block diagram of a modification of the client device 110.
[Fig. 26] A block diagram showing the functions of the client device 110k.
[Fig. 27] A flowchart showing the operation of the client device 110k.
[Fig. 28] An explanatory diagram of the utterance content, the recognition result, and the divided intervals.
[Fig. 29] A schematic diagram of the search process in speech recognition.
[Description of main component symbols]

11: CPU

12: RAM

13: ROM
14: Input device
15: Output device
16: Communication module
17: Auxiliary storage device
110 (110a to 110k): Client device
120: Server device
210: Feature amount calculation unit
220: Feature amount compression unit
225: Transmission unit
226: First recognition unit
227: Language model holding unit
228: Dictionary holding unit
229: Acoustic model holding unit
230: Feature amount storage unit
235: Reception unit
236: Operation unit
237: Result storage unit
238: User input detection unit
239: Time information calculation unit
240 (240a to 240c): Error interval specifying unit
241: End point judging unit
242: Root interval specifying unit
243: Division unit
250, 250a: Error-interval context specifying unit
251: Vocabulary information analysis unit
260: Error-interval feature amount extraction unit
261: Average value calculation unit
262: Feature normalization unit
265: Dictionary adding unit
270, 270a, 270b: Correction unit
280: Integration unit
281: Acoustic model holding unit
282: Language model holding unit
283: Dictionary holding unit
284: Speech DB holding unit
285: Constraint condition storage unit
290: Display unit
T1: Start time
T2: End time
W1, W2: Word
NW: Network


Claims (1)

VII. Patent application scope:
1. A speech recognition result correction device, characterized by comprising: input means for inputting speech; calculation means for calculating feature amount data based on the speech input by the input means; storage means for storing the feature amount data calculated by the calculation means; acquisition means for acquiring a recognition result of the speech input by the input means; specifying means for specifying, in the recognition result acquired by the acquisition means, an error interval in which a recognition error has occurred; and correction means for extracting, from the feature amount data stored in the storage means, the feature amount data corresponding to the error interval specified by the specifying means, and performing re-recognition using the extracted feature amount data, thereby correcting the recognition result obtained by the acquisition means.
2. The speech recognition result correction device according to claim 1, wherein the acquisition means comprises: transmission means for transmitting the speech input by the input means to a speech recognition device; and reception means for receiving the recognition result recognized by the speech recognition device; and wherein the specifying means specifies, in the recognition result received by the reception means, an error interval in which a recognition error has occurred.
3. The speech recognition result correction device according to claim 1 or 2, wherein the specifying means specifies the error interval by accepting a user operation.
4. The speech recognition result correction device according to any one of claims 1 to 3, wherein the specifying means judges the error interval based on the reliability assigned to the recognition result within the recognition result, and specifies the judged error interval.
5. The speech recognition result correction device according to any one of claims 1 to 3, wherein the specifying means calculates the reliability of the recognition result, judges the error interval based on that reliability, and specifies the judged error interval.
6. The speech recognition result correction device according to any one of claims 1 to 5, further comprising identifying means for identifying a recognition result formed by at least one word in front of the error interval specified by the specifying means, by at least one word behind the error interval, or by both the front word and the rear word; wherein the correction means regards the recognition result identified by the identifying means as a constraint condition, extracts from the storage means, in accordance with the constraint condition, the feature amount data corresponding to an interval that includes the front word and the rear word of the error interval, and performs recognition processing on the extracted feature amount data.
7. The speech recognition result correction device according to any one of claims 1 to 5, further comprising identifying means for identifying a recognition result formed by at least one word in front of the error interval specified by the specifying means, by at least one word behind the error interval, or by both the front word and the rear word; wherein the correction means regards the recognition result identified by the identifying means as a constraint condition, extracts from the storage means, in accordance with the constraint condition, the feature amount data corresponding to the error interval, and performs recognition processing on the extracted feature amount data.
8. The speech recognition result correction device according to any one of claims 1 to 7, further comprising word information identifying means for identifying word information of the words in a recognition result formed by word information (information for identifying a word) of at least one word in front of the error interval specified by the specifying means, by word information of at least one word behind the error interval, or by both the word information of the front word and the word information of the rear word; wherein the correction means regards the word information identified by the word information identifying means as a constraint condition, extracts from the storage means, in accordance with the constraint condition, the feature amount data corresponding to an interval that includes the front word and the rear word of the error interval, and performs recognition processing on the extracted feature amount data.
9. The speech recognition result correction device according to claim 8, wherein the word information includes either or both of part-of-speech information indicating the part of speech of a word and reading information indicating the pronunciation of a word.
10. The speech recognition result correction device according to claim 8 or 9, further comprising unknown-word judging means for judging, based on the word information, whether a word of the recognition result formed by at least one word in front of the error interval specified by the specifying means, by at least one word behind the error interval, or by both the front word and the rear word, is an unknown word; wherein, when the unknown-word judging means judges that the word of the recognition result is an unknown word, the correction means performs the correction processing of the recognition result on the basis of the word information.
11. The speech recognition result correction device according to any one of claims 1 to 10, further comprising connection probability storage means for storing connection probabilities between words; wherein the correction means, on the basis of the fact that correction processing has been performed, creates a connection probability between the word of the error interval and the word preceding and/or following it, and updates the connection probabilities stored in the connection probability storage means using the created connection probability.
12. The speech recognition result correction device according to any one of claims 6 to 11, further comprising constraint condition storage means for storing, as constraint conditions, the word information identified by the word information identifying means or the word identified by the identifying means; wherein the correction means performs the correction processing in accordance with the constraint conditions stored in the constraint condition storage means.
13. The speech recognition result correction device according to any one of claims 1 to 12, further comprising accepting means for accepting text information from a user; wherein the correction means performs the correction processing of the recognition result in the error interval with the text information accepted by the accepting means regarded as a constraint condition.
14. The speech recognition result correction device according to any one of claims 1 to 13, further comprising time information calculation means for calculating the elapsed time of the recognition result based on the recognition result received by the reception means and the feature amount data stored in the storage means; wherein the specifying means specifies the error interval based on the time information calculated by the time information calculation means.
15. The speech recognition result correction device according to any one of claims 1 to 14, further comprising display means for displaying the recognition result corrected by the correction means; wherein the display means does not display the recognition result acquired by the acquisition means.
16. The speech recognition result correction device according to claim 15, wherein, when the recognition result obtained by the correction means through re-recognition and the recognition result acquired by the acquisition means are identical, or when the time information contained in each of these recognition results differs, a recognition error is judged to have occurred and the display means does not display the recognition result.
17. The speech recognition result correction device according to claim 3, wherein the specifying means specifies the start point of the error interval by a user operation, and specifies the end point of the error interval based on the reliability assigned to the recognition result acquired by the acquisition means.
18. The speech recognition result correction device according to claim 3, wherein the specifying means specifies the start point of the error interval by a user operation, and specifies as the end point of the error interval a point separated from that start point by a predetermined number of recognition units.
19. The speech recognition result correction device according to claim 3, wherein the specifying means specifies the start point of the error interval by a user operation, and specifies the end point of the error interval based on a predetermined phonetic symbol in the recognition result acquired by the acquisition means.
20. The speech recognition result correction device according to claim 3, wherein the acquisition means acquires a plurality of recognition candidates as the recognition result when acquiring the recognition result; and the specifying means specifies the start point of the error interval by a user operation and specifies the end point based on the number of recognition candidates acquired by the acquisition means.
21. The speech recognition result correction device according to any one of claims 1 to 20, further comprising average calculation means for calculating an average value of an interval that includes the error interval in the feature amount data calculated by the calculation means; wherein the correction means subtracts the average value calculated by the average calculation means from the extracted feature amount data, and performs re-recognition processing regarding the data obtained by the subtraction as feature amount data.
22. A speech recognition result correction device, characterized by comprising: input means for inputting speech; acquisition means for acquiring a recognition result of the speech input by the input means; specifying means for specifying, in the recognition result acquired by the acquisition means, an error interval in which a recognition error has occurred; notification means for requesting re-recognition processing of the error interval from an external server by notifying the external server of the error interval specified by the specifying means; and reception means for receiving the recognition result of the error interval re-recognized by the external server in response to the request made by the notification means.
23. A speech recognition result correction method, characterized by comprising: an input step of inputting speech; a calculation step of calculating feature amount data based on the speech input in the input step; a storage step of storing the feature amount data calculated in the calculation step; an acquisition step of acquiring a recognition result of the speech input in the input step; a specifying step of specifying, in the recognition result acquired in the acquisition step, an error interval in which a recognition error has occurred; and a correction step of extracting, from the feature amount data stored in the storage step, the feature amount data corresponding to the error interval specified in the specifying step, and performing re-recognition using the extracted feature amount data, thereby correcting the recognition result obtained in the acquisition step.
24. A speech recognition result correction method, characterized by comprising: an input step of inputting speech; an acquisition step of acquiring a recognition result of the speech input in the input step; a specifying step of specifying, in the recognition result acquired in the acquisition step, an error interval in which a recognition error has occurred; a notification step of requesting re-recognition processing of the error interval from an external server by notifying the external server of the error interval specified in the specifying step; and a reception step of receiving the recognition result of the error interval re-recognized by the external server in response to the request made in the notification step.
25. The speech recognition result correction device according to any one of claims 1 to 22, further comprising root interval specifying means for specifying a root interval in the recognition result acquired by the acquisition means; wherein the correction means further extracts, from the storage means, the feature amount data corresponding to the root interval specified by the root interval specifying means within the error interval specified by the specifying means, and performs re-recognition using the extracted feature amount data, thereby correcting the recognition result obtained by the acquisition means.
26. The speech recognition result correction device according to claim 25, further comprising division means for dividing the recognition result acquired by the acquisition means into a plurality of intervals in accordance with the root interval specified by the root interval specifying means; wherein the correction means performs correction of the recognition result for each divided interval produced by the division means.
27. The speech recognition result correction device according to claim 26, wherein the division means divides the recognition result by regarding the end point of the root interval as the end point of one divided interval and regarding the start point of the root interval as the start point of the divided interval following that divided interval.
28. The speech recognition result correction device according to claim 27, wherein the correction means performs correction of the recognition result for each divided interval produced by the division means, and regards the root interval as a constraint condition at the time of correction of each divided interval.
29. The speech recognition result correction device according to claim 25, wherein the correction means holds, as the search process of recognition, hypotheses that contain the root string described in the root interval specified by the root interval specifying means, and selects the final recognition result from among those hypotheses to perform the correction.
30. The speech recognition result correction device according to any one of claims 25 to 29, further comprising dictionary adding means for adding the root string in the root interval specified by the root interval specifying means to a dictionary database used for recognition processing.
31. The speech recognition result correction device according to any one of claims 25 to 30, further comprising a dictionary database generated by a user; wherein the correction means performs the correction processing using a character string obtained by converting the root string in accordance with the dictionary database.
32. A speech recognition result correction system, characterized by comprising: the speech recognition result correction device according to any one of claims 1 to 22 or claims 25 to 31; and a server device that performs speech recognition based on the speech transmitted from the speech recognition result correction device, creates a recognition result, and transmits it to the speech recognition result correction device.
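For a concrete picture of the correction flow claimed above, the following sketch combines claim 1 (re-recognize only the specified error interval using the stored feature amount data) with claim 21 (subtract the average feature vector of an interval that includes the error interval before re-recognition). It is an illustration only, not the claimed implementation: the fixed 10 ms frame period, the padding used to form the containing interval, and the recognize callable are assumptions introduced for the example.

import numpy as np

FRAME_PERIOD = 0.01  # assumed 10 ms frame shift of the stored feature amount data

def to_frame(t: float) -> int:
    return max(0, int(t / FRAME_PERIOD))

def correct_error_interval(features: np.ndarray, start_sec: float, end_sec: float,
                           recognize, pad_sec: float = 0.2):
    # features : stored feature amount data, shape (frames, dims)
    # recognize: placeholder for whatever re-recognition engine is available
    # The mean feature vector is computed over an interval that includes the
    # error interval (padded by pad_sec on each side) and subtracted from the
    # extracted frames before re-recognition, along the lines of claim 21.
    segment = features[to_frame(start_sec):to_frame(end_sec)]
    context = features[to_frame(start_sec - pad_sec):to_frame(end_sec + pad_sec)]
    normalized = segment - context.mean(axis=0)
    return recognize(normalized)

In the client-server arrangement of claims 22 and 24, the re-recognition itself would instead be requested from the external server after the error interval is notified, with the result received back by the client.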
TW098113352A 2008-04-22 2009-04-22 A speech recognition result correction device and a speech recognition result correction method, and a speech recognition result correction system TWI427620B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
JP2008111540 2008-04-22
JP2008198486 2008-07-31
JP2008285550A JP4709887B2 (en) 2008-04-22 2008-11-06 Speech recognition result correction apparatus, speech recognition result correction method, and speech recognition result correction system

Publications (2)

Publication Number Publication Date
TW200951940A true TW200951940A (en) 2009-12-16
TWI427620B TWI427620B (en) 2014-02-21

Family

ID=42070988

Family Applications (1)

Application Number Title Priority Date Filing Date
TW098113352A TWI427620B (en) 2008-04-22 2009-04-22 A speech recognition result correction device and a speech recognition result correction method, and a speech recognition result correction system

Country Status (3)

Country Link
JP (1) JP4709887B2 (en)
CN (1) CN101567189B (en)
TW (1) TWI427620B (en)

Families Citing this family (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5004863B2 (en) * 2008-04-30 2012-08-22 三菱電機株式会社 Voice search apparatus and voice search method
JP5231484B2 (en) * 2010-05-19 2013-07-10 ヤフー株式会社 Voice recognition apparatus, voice recognition method, program, and information processing apparatus for distributing program
JP5160594B2 (en) * 2010-06-17 2013-03-13 株式会社エヌ・ティ・ティ・ドコモ Speech recognition apparatus and speech recognition method
JP5480760B2 (en) * 2010-09-15 2014-04-23 株式会社Nttドコモ Terminal device, voice recognition method and voice recognition program
US20130158999A1 (en) * 2010-11-30 2013-06-20 Mitsubishi Electric Corporation Voice recognition apparatus and navigation system
JP6150268B2 (en) * 2012-08-31 2017-06-21 国立研究開発法人情報通信研究機構 Word registration apparatus and computer program therefor
KR101364774B1 (en) * 2012-12-07 2014-02-20 포항공과대학교 산학협력단 Method for correction error of speech recognition and apparatus
CN103076893B (en) * 2012-12-31 2016-08-17 百度在线网络技术(北京)有限公司 A kind of method and apparatus for realizing phonetic entry
JP2014137430A (en) * 2013-01-16 2014-07-28 Sharp Corp Electronic apparatus and cleaner
TWI508057B (en) * 2013-07-15 2015-11-11 Chunghwa Picture Tubes Ltd Speech recognition system and method
CN104978965B (en) 2014-04-07 2019-04-26 三星电子株式会社 The speech recognition of electronic device and utilization electronic device and server executes method
CN105469801B (en) * 2014-09-11 2019-07-12 阿里巴巴集团控股有限公司 A kind of method and device thereof for repairing input voice
CN105869632A (en) * 2015-01-22 2016-08-17 北京三星通信技术研究有限公司 Speech recognition-based text revision method and device
CN104933408B (en) * 2015-06-09 2019-04-05 深圳先进技术研究院 The method and system of gesture identification
CN105513586A (en) * 2015-12-18 2016-04-20 百度在线网络技术(北京)有限公司 Speech recognition result display method and speech recognition result display device
KR101804765B1 (en) * 2016-01-08 2018-01-10 현대자동차주식회사 Vehicle and control method for the same
JP6675078B2 (en) * 2016-03-15 2020-04-01 パナソニックIpマネジメント株式会社 Misrecognition and correction method, misrecognition and correction device, and misrecognition and correction program
JP7014163B2 (en) 2016-07-19 2022-02-01 ソニーグループ株式会社 Information processing equipment and information processing method
JP6526608B2 (en) * 2016-09-06 2019-06-05 株式会社東芝 Dictionary update device and program
JP6597527B2 (en) * 2016-09-06 2019-10-30 トヨタ自動車株式会社 Speech recognition apparatus and speech recognition method
JP7088645B2 (en) * 2017-09-20 2022-06-21 株式会社野村総合研究所 Data converter
CN107945802A (en) * 2017-10-23 2018-04-20 北京云知声信息技术有限公司 Voice recognition result processing method and processing device
CN108597495B (en) * 2018-03-15 2020-04-14 维沃移动通信有限公司 Method and device for processing voice data
JP7143665B2 (en) * 2018-07-27 2022-09-29 富士通株式会社 Speech recognition device, speech recognition program and speech recognition method
CN109325239A (en) * 2018-11-05 2019-02-12 北京智启蓝墨信息技术有限公司 Student classroom expression mannage method and system
CN110956959B (en) * 2019-11-25 2023-07-25 科大讯飞股份有限公司 Speech recognition error correction method, related device and readable storage medium
CN111192586B (en) * 2020-01-08 2023-07-04 北京小米松果电子有限公司 Speech recognition method and device, electronic equipment and storage medium
CN112382285B (en) 2020-11-03 2023-08-15 北京百度网讯科技有限公司 Voice control method, voice control device, electronic equipment and storage medium
CN112951238A (en) * 2021-03-19 2021-06-11 河南蜂云科技发展有限公司 Scientific and technological court intelligent management method, system and storage medium based on voice processing
JP2023007960A (en) * 2021-07-02 2023-01-19 株式会社アドバンスト・メディア Information processing device, information processing system, information processing method, and program
CN116894442B (en) * 2023-09-11 2023-12-05 临沂大学 Language translation method and system for correcting guide pronunciation

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW219993B (en) * 1992-05-21 1994-02-01 Ind Tech Res Inst Speech recognition system
US6163765A (en) * 1998-03-30 2000-12-19 Motorola, Inc. Subband normalization, transformation, and voiceness to recognize phonemes for text messaging in a radio communication system
JP2000056795A (en) * 1998-08-03 2000-02-25 Fuji Xerox Co Ltd Speech recognition device
JP3111997B2 (en) * 1998-09-04 2000-11-27 三菱電機株式会社 Speech recognition system and word dictionary creation device
US7881936B2 (en) * 1998-12-04 2011-02-01 Tegic Communications, Inc. Multimodal disambiguation of speech recognition
JP3976959B2 (en) * 1999-09-24 2007-09-19 三菱電機株式会社 Speech recognition apparatus, speech recognition method, and speech recognition program recording medium
EP1407447A1 (en) * 2001-07-06 2004-04-14 Koninklijke Philips Electronics N.V. Fast search in speech recognition
JP4797307B2 (en) * 2001-09-21 2011-10-19 日本電気株式会社 Speech recognition apparatus and speech recognition method
JP4171323B2 (en) * 2003-02-27 2008-10-22 日本電信電話株式会社 Recognition error correction method, apparatus, and program
JP4347716B2 (en) * 2004-02-18 2009-10-21 株式会社エヌ・ティ・ティ・ドコモ Speech recognition server, speech input system, and speech input method
JP4736478B2 (en) * 2005-03-07 2011-07-27 日本電気株式会社 Voice transcription support device, method and program thereof

Also Published As

Publication number Publication date
JP4709887B2 (en) 2011-06-29
JP2010055044A (en) 2010-03-11
CN101567189B (en) 2012-04-25
TWI427620B (en) 2014-02-21
CN101567189A (en) 2009-10-28

Similar Documents

Publication Publication Date Title
TWI427620B (en) A speech recognition result correction device and a speech recognition result correction method, and a speech recognition result correction system
JP4816409B2 (en) Recognition dictionary system and updating method thereof
US7650283B2 (en) Dialogue supporting apparatus
EP2560158B1 (en) Operating system and method of operating
CN106663424B (en) Intention understanding device and method
US8275618B2 (en) Mobile dictation correction user interface
US9589562B2 (en) Pronunciation learning through correction logs
US6681206B1 (en) Method for generating morphemes
US20060149551A1 (en) Mobile dictation correction user interface
US7805304B2 (en) Speech recognition apparatus for determining final word from recognition candidate word sequence corresponding to voice data
US6192337B1 (en) Apparatus and methods for rejecting confusible words during training associated with a speech recognition system
US20130289993A1 (en) Speak and touch auto correction interface
US20150088506A1 (en) Speech Recognition Server Integration Device and Speech Recognition Server Integration Method
JP5824829B2 (en) Speech recognition apparatus, speech recognition method, and speech recognition program
US20070038453A1 (en) Speech recognition system
CN106713111B (en) Processing method for adding friends, terminal and server
US20080065371A1 (en) Conversation System and Conversation Software
WO2003088080A1 (en) Method and system for detecting and extracting named entities from spontaneous communications
Rose et al. Integration of utterance verification with statistical language modeling and spoken language understanding
JP2010048890A (en) Client device, recognition result feedback method, recognition result feedback program, server device, method and program of updating model of voice recognition, voice recognition system, voice recognition method, voice recognition program
EP1887562A1 (en) Speech recognition by statistical language model using square-root smoothing
JP6233867B2 (en) Dictionary registration system for speech recognition, speech recognition system, speech recognition service system, method and program
JP5238395B2 (en) Language model creation apparatus and language model creation method
Tetariy et al. Cross-language phoneme mapping for phonetic search keyword spotting in continuous speech of under-resourced languages.
CN113822029A (en) Customer service assistance method, device and system

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees