JP6222970B2

JP6222970B2 - Speech recognition apparatus and speech recognition result determination method

Info

Publication number: JP6222970B2
Application number: JP2013084738A
Authority: JP
Inventors: 伸弘新開; 克利大川
Original assignee: Advanced Media Inc
Current assignee: Advanced Media Inc
Priority date: 2013-04-15
Filing date: 2013-04-15
Publication date: 2017-11-01
Anticipated expiration: 2033-04-15
Also published as: JP2014206677A

Description

The present invention relates to a speech recognition apparatus that performs speech recognition processing, and to a speech recognition result determination method used in the apparatus.

Entering text by voice using speech recognition technology (hereinafter "speech text input") is widely practiced. In recent years, as its applications have expanded, faster and more accurate speech text input has been demanded.

The technique described in Patent Document 1 (hereinafter "the prior art") divides a sentence into words and, using a two-stage language model and acoustic model, sequentially determines speech recognition results word by word from the beginning of the sentence (from the front). Compared with determining speech recognition results sentence by sentence using a one-stage language model and acoustic model, the prior art makes it possible to perform highly accurate speech text input at higher speed.

Patent Document 1: Japanese Patent Laid-Open No. 2003-140685

However, the prior art has the problem that it is difficult to achieve highly accurate, high-speed speech text input for relatively long words and phrases.

The reason is as follows. A long word or phrase takes time not only from the start to the end of the utterance but also from the start to the end of the speech recognition processing. Therefore, when a relatively long word or phrase is to be input, a relatively long time elapses between the start of the utterance and the confirmation of the speech recognition result.

An object of the present invention is to achieve highly accurate, high-speed speech text input even for relatively long words and phrases.

A speech recognition apparatus according to a first aspect of the present invention includes: a speech recognition database that stores a dictionary describing a plurality of text arrays; a speech input unit that inputs speech; a speech recognition processing unit that performs speech recognition on the portion of the speech that has already been input and takes the result as a provisional speech recognition result; and a confirmation processing unit that, when the provisional speech recognition result is an exclusive text array having, at its front, an exclusive section that is unique among the plurality of text arrays, takes the exclusive text array as the confirmed speech recognition result for the speech. The speech recognition processing unit repeats the speech recognition at a predetermined cycle. The confirmation processing unit takes the exclusive text array as the confirmed speech recognition result on the condition that the provisional speech recognition result is the exclusive text array and forward-matches (that is, matches from the front, as a prefix) all of a predetermined number of other provisional speech recognition results obtained most recently in succession. The predetermined number is 2.
A speech recognition apparatus according to a second aspect of the present invention includes: a speech recognition database that stores a dictionary describing a plurality of text arrays; a speech input unit that inputs speech; a speech recognition processing unit that performs speech recognition on the portion of the speech that has already been input and takes the result as a provisional speech recognition result; and a confirmation processing unit that, when the provisional speech recognition result is an exclusive text array having, at its front, an exclusive section that is unique among the plurality of text arrays, takes the exclusive text array as the confirmed speech recognition result for the speech. The speech recognition processing unit repeats the speech recognition at a predetermined cycle. The confirmation processing unit takes the exclusive text array as the confirmed speech recognition result on the condition that the provisional speech recognition result is the exclusive text array and forward-matches all of a predetermined number of other provisional speech recognition results obtained most recently in succession. The text arrays are syllable text arrays, and the confirmation processing unit further excludes from the judgment of the condition any text array that does not forward-match the exclusive text array over the predetermined number or more of syllables.
A speech recognition apparatus according to a third aspect of the present invention includes: a speech recognition database that stores a dictionary describing a plurality of text arrays; a speech input unit that inputs speech; a speech recognition processing unit that performs speech recognition on the portion of the speech that has already been input and takes the result as a provisional speech recognition result; and a confirmation processing unit that, when the provisional speech recognition result is an exclusive text array having, at its front, an exclusive section that is unique among the plurality of text arrays, takes the exclusive text array as the confirmed speech recognition result for the speech. The speech recognition processing unit repeats the speech recognition at a predetermined cycle. The confirmation processing unit takes the exclusive text array as the confirmed speech recognition result on the condition that the provisional speech recognition result is the exclusive text array and forward-matches all of a predetermined number of other provisional speech recognition results obtained most recently in succession. The exclusive text array is a syllable text array into which a marker indicating the end position of the exclusive section has been inserted. Among the plurality of text arrays, each text array other than the exclusive text array is a syllable text array into which a marker indicating the end position of that text array has been inserted. When a provisional speech recognition result contains no marker, the confirmation processing unit excludes that provisional speech recognition result from the judgment of the condition.
A speech recognition apparatus according to a fourth aspect of the present invention includes: a speech recognition database that stores a dictionary describing a plurality of text arrays; a speech input unit that inputs speech; a speech recognition processing unit that performs speech recognition on the portion of the speech that has already been input and takes the result as a provisional speech recognition result; and a confirmation processing unit that, when the provisional speech recognition result is an exclusive text array having, at its front, an exclusive section that is unique among the plurality of text arrays, takes the exclusive text array as the confirmed speech recognition result for the speech. The speech recognition processing unit repeats the speech recognition at a predetermined cycle. The confirmation processing unit takes the exclusive text array as the confirmed speech recognition result on the condition that the provisional speech recognition result is the exclusive text array and forward-matches all of a predetermined number of other provisional speech recognition results obtained most recently in succession. The speech recognition apparatus further includes a marker insertion unit that inserts a marker into the exclusive text array by sorting the plurality of text arrays described in the dictionary in forward-match order and determining the forward-match range for each pair of adjacent text arrays.

A speech recognition result determination method according to a fifth aspect of the present invention is a method for use in a speech recognition apparatus that includes a speech recognition database that stores a dictionary describing a plurality of text arrays, a speech input unit that inputs speech, and a speech recognition processing unit that performs speech recognition on the portion of the speech that has already been input and takes the result as a provisional speech recognition result. The method includes: a step of judging whether the provisional speech recognition result is an exclusive text array having, at its front, an exclusive section that is unique among the plurality of text arrays; and a step of taking, when the provisional speech recognition result is the exclusive text array, the exclusive text array as the confirmed speech recognition result for the speech. The speech recognition processing unit repeats the speech recognition at a predetermined cycle. In the confirming step, the exclusive text array is taken as the confirmed speech recognition result on the condition that the provisional speech recognition result is the exclusive text array and forward-matches (that is, matches from the front, as a prefix) all of a predetermined number of other provisional speech recognition results obtained most recently in succession. The predetermined number is 2.
A speech recognition result determination method according to a sixth aspect of the present invention is a method for use in a speech recognition apparatus that includes a speech recognition database that stores a dictionary describing a plurality of text arrays, a speech input unit that inputs speech, and a speech recognition processing unit that performs speech recognition on the portion of the speech that has already been input and takes the result as a provisional speech recognition result. The method includes: a step of judging whether the provisional speech recognition result is an exclusive text array having, at its front, an exclusive section that is unique among the plurality of text arrays; and a step of taking, when the provisional speech recognition result is the exclusive text array, the exclusive text array as the confirmed speech recognition result for the speech. The speech recognition processing unit repeats the speech recognition at a predetermined cycle. In the confirming step, the exclusive text array is taken as the confirmed speech recognition result on the condition that the provisional speech recognition result is the exclusive text array and forward-matches all of a predetermined number of other provisional speech recognition results obtained most recently in succession. The text arrays are syllable text arrays, and the confirming step excludes from the judgment of the condition any text array that does not forward-match the exclusive text array over the predetermined number or more of syllables.
A speech recognition result determination method according to a seventh aspect of the present invention is a method for use in a speech recognition apparatus that includes a speech recognition database that stores a dictionary describing a plurality of text arrays, a speech input unit that inputs speech, and a speech recognition processing unit that performs speech recognition on the portion of the speech that has already been input and takes the result as a provisional speech recognition result. The method includes: a step of judging whether the provisional speech recognition result is an exclusive text array having, at its front, an exclusive section that is unique among the plurality of text arrays; and a step of taking, when the provisional speech recognition result is the exclusive text array, the exclusive text array as the confirmed speech recognition result for the speech. The speech recognition processing unit repeats the speech recognition at a predetermined cycle. In the confirming step, the exclusive text array is taken as the confirmed speech recognition result on the condition that the provisional speech recognition result is the exclusive text array and forward-matches all of a predetermined number of other provisional speech recognition results obtained most recently in succession. The exclusive text array is a syllable text array into which a marker indicating the end position of the exclusive section has been inserted. Among the plurality of text arrays, each text array other than the exclusive text array is a syllable text array into which a marker indicating the end position of that text array has been inserted. When a provisional speech recognition result contains no marker, the confirming step excludes that provisional speech recognition result from the judgment of the condition.
A speech recognition result determination method according to an eighth aspect of the present invention is a method for use in a speech recognition apparatus that includes a speech recognition database that stores a dictionary describing a plurality of text arrays, a speech input unit that inputs speech, and a speech recognition processing unit that performs speech recognition on the portion of the speech that has already been input and takes the result as a provisional speech recognition result. The method includes: a step of judging whether the provisional speech recognition result is an exclusive text array having, at its front, an exclusive section that is unique among the plurality of text arrays; and a step of taking, when the provisional speech recognition result is the exclusive text array, the exclusive text array as the confirmed speech recognition result for the speech. The speech recognition processing unit repeats the speech recognition at a predetermined cycle. In the confirming step, the exclusive text array is taken as the confirmed speech recognition result on the condition that the provisional speech recognition result is the exclusive text array and forward-matches all of a predetermined number of other provisional speech recognition results obtained most recently in succession. The method further includes a step of inserting a marker into the exclusive text array by sorting the plurality of text arrays described in the dictionary in forward-match order and determining the forward-match range for each pair of adjacent text arrays.

According to the present invention, highly accurate, high-speed speech text input can be achieved even for relatively long words and phrases.

A block diagram showing an example of the configuration of a speech recognition apparatus according to an embodiment of the present invention.
A flowchart showing an example of the marker insertion operation in the present embodiment.
A diagram showing an example of the registered word group before the marker insertion operation in the present embodiment.
A diagram showing an example of the syllable matrix in the present embodiment.
A diagram showing an example of the matching character string list in the present embodiment.
A diagram showing an example of the registered word group after the marker insertion operation in the present embodiment.
A flowchart showing an example of the speech recognition operation in the present embodiment.
A diagram explaining an example of the time required to obtain a confirmed result in the present embodiment.
A diagram showing an example of how a result is confirmed in the present embodiment.
A diagram showing another example of how a result is confirmed in the present embodiment.

Hereinafter, an embodiment of the present invention will be described in detail with reference to the drawings.

First, the configuration of the speech recognition apparatus according to the present embodiment will be described.

FIG. 1 is a block diagram showing an example of the configuration of a speech recognition apparatus according to an embodiment of the present invention. The speech recognition apparatus according to the present embodiment is, for example, a mobile phone.

In FIG. 1, a speech recognition apparatus 100 includes a speech recognition database (DB) 110, a marker insertion unit 120, a speech input unit 130, a speech recognition processing unit 140, a display unit 150, an operation input unit 160, a confirmation processing unit 170, and a confirmation result use unit 180.

The speech recognition database 110 stores in advance an acoustic model 111, a language model 112, and a dictionary 113, which are the information used for speech recognition processing. The acoustic model 111 is a data representation of the probabilistic correspondence between speech feature quantities and phonetic symbols. The dictionary 113 describes a plurality of text arrays as the set of candidate results of the speech recognition processing. The language model 112 is a data representation of the appearance probability and connection probability of each text array described in the dictionary 113.

In the present embodiment, each text array described in the dictionary 113 is a syllable text array in which a katakana word is written in katakana (hereinafter "word").

The plurality of words described in the dictionary 113 (hereinafter "registered word group") includes words that have, at the front, an exclusive section that is unique among the registered word group (exclusive text arrays, hereinafter "exclusive words"). In addition, among the registered words that have no exclusive section (hereinafter "non-exclusive words"), there are words that forward-match one of the exclusive words, that is, share its leading syllables, over a predetermined number or more of syllables (hereinafter "forward-match words"). In the present embodiment, this predetermined number is 3.
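The three kinds of registered word defined above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: each katakana character is treated as one syllable, the word list is invented for the example, and the names `exclusive_section`, `classify`, and `FORWARD_MATCH_SYLLABLES` are hypothetical rather than taken from the patent.

```python
import os.path

# The embodiment sets this threshold to 3 syllables.
FORWARD_MATCH_SYLLABLES = 3


def exclusive_section(word, words):
    """Return the shortest leading part of `word` that no other registered
    word starts with, or None if no such part exists (non-exclusive word)."""
    for k in range(1, len(word) + 1):
        prefix = word[:k]
        if not any(other != word and other.startswith(prefix) for other in words):
            return prefix
    return None


def classify(words):
    """Split a registered word group into exclusive words (with their
    exclusive sections), non-exclusive words, and forward-match words."""
    sections = {w: exclusive_section(w, words) for w in words}
    exclusive = [w for w in words if sections[w] is not None]
    non_exclusive = [w for w in words if sections[w] is None]
    # os.path.commonprefix compares character by character, so it yields
    # the shared leading part of the two strings.
    forward_match = [
        w for w in non_exclusive
        if any(len(os.path.commonprefix([w, e])) >= FORWARD_MATCH_SYLLABLES
               for e in exclusive)
    ]
    return sections, non_exclusive, forward_match


# Illustrative registered word group (assumption: one character = one syllable).
WORDS = ["アカ", "アカシ", "アカシア", "アカサカ", "カメラ"]
sections, non_exclusive, forward_match = classify(WORDS)
```

In this invented word group, "アカ" and "アカシ" are non-exclusive because each is the beginning of a longer registered word, and only "アカシ" shares three or more leading syllables with an exclusive word.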

For each exclusive word described in the dictionary 113, the marker insertion unit 120 inserts a marker at the end position of the word's exclusive section. For each non-exclusive word described in the dictionary 113, the marker insertion unit 120 inserts a marker at the end position of the word. This marker insertion processing is performed before the speech recognition processing starts.

That is, in its initial state the dictionary 113 describes words into which no marker has been inserted. The marker insertion unit 120 then brings the dictionary 113 into a state in which it describes exclusive words into which markers have been inserted. Each inserted marker indicates either the end position of the exclusive section of an exclusive word or the end position of a non-exclusive word.
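A minimal Python sketch of marker insertion follows, loosely modeled on the fourth aspect (sort the words, then compare each adjacent pair). The marker character `/`, the one-character-per-syllable simplification, and the names `insert_markers` and `common_prefix_len` are assumptions made for illustration. The procedure in the patent's flowchart, which uses a syllable matrix and a matching character string list, may differ.

```python
MARKER = "/"  # hypothetical marker character; the patent does not fix one here


def common_prefix_len(a, b):
    """Number of leading syllables (characters, by assumption) shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def insert_markers(words):
    """Return {plain word: word with marker}.

    After sorting, the longest prefix a word shares with any other word is
    the one it shares with a neighbor, so only adjacent pairs are compared.
    """
    ordered = sorted(set(words))
    marked = {}
    for i, word in enumerate(ordered):
        shared_prev = common_prefix_len(word, ordered[i - 1]) if i > 0 else 0
        shared_next = (common_prefix_len(word, ordered[i + 1])
                       if i + 1 < len(ordered) else 0)
        shared = max(shared_prev, shared_next)
        if shared >= len(word):
            # The whole word is the start of another word: non-exclusive,
            # so the marker goes at the end of the word.
            marked[word] = word + MARKER
        else:
            # Exclusive word: the exclusive section ends one syllable past
            # the longest shared prefix.
            marked[word] = word[:shared + 1] + MARKER + word[shared + 1:]
    return marked


marked_words = insert_markers(["アカ", "アカシア", "アカサカ", "カメラ"])
```

Sorting first keeps the comparison linear in the number of words after the sort, which is presumably why the fourth aspect describes the step in terms of adjacent pairs.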

The speech input unit 130 takes in ambient sound, converts it into a speech signal, and outputs the signal to the speech recognition processing unit 140. When the ambient sound contains the user's utterance, the speech signal contains the signal of that utterance. The speech input unit 130 is, for example, a microphone provided in the mobile phone.

The speech recognition processing unit 140 repeatedly performs speech recognition processing, at a predetermined cycle, on the speech signal input from the speech input unit 130. This processing performs speech recognition on the portion of the speech data that has already been input (the speech portion) and takes either a text array composed of one or more syllables or a registered word of the dictionary 113 as the provisional speech recognition result for the speech data (the speech). The speech recognition processing unit 140 includes an acoustic analysis unit 141 and a recognition decoder unit 142.

The acoustic analysis unit 141 analyzes the speech signal and extracts the feature quantities of the speech contained in the speech data (hereinafter "speech feature quantities"). Specifically, the acoustic analysis unit 141 divides the speech signal into frames, performs predetermined processing including Fourier analysis on each frame, and extracts cepstrum parameters and the like. From the analysis result, the acoustic analysis unit 141 then detects the speech sections that contain utterances and generates time-series data consisting only of the speech feature quantities of those speech sections.
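The framing, Fourier analysis, and cepstrum extraction described above might look roughly like the following pure-Python sketch. The frame size, the Hamming window, the real cepstrum as the feature, and the energy threshold used to pick speech sections are all assumptions chosen for illustration. The patent names only "Fourier analysis" and "cepstrum parameters" and does not specify how speech sections are detected.

```python
import cmath
import math


def split_frames(signal, size, hop):
    """Cut the signal into fixed-length frames (a trailing partial frame is dropped)."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]


def dft(x, inverse=False):
    """Naive discrete Fourier transform; adequate for short illustrative frames."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def real_cepstrum(frame):
    """Real cepstrum: inverse DFT of the log magnitude spectrum of the windowed frame."""
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    log_magnitude = [math.log(abs(c) + 1e-10) for c in dft(windowed)]
    return [c.real for c in dft(log_magnitude, inverse=True)]


def speech_features(signal, size=16, hop=16, energy_threshold=0.01):
    """Keep cepstra only for frames whose mean energy exceeds the threshold
    (a simple stand-in for speech-section detection)."""
    features = []
    for frame in split_frames(signal, size, hop):
        energy = sum(s * s for s in frame) / size
        if energy > energy_threshold:
            features.append(real_cepstrum(frame))
    return features


# Silence, a short tone, then silence again: only the tone frames yield features.
signal = [0.0] * 32 + [math.sin(2 * math.pi * t / 16) for t in range(64)] + [0.0] * 32
features = speech_features(signal)
```

A production front end would use an FFT and typically mel-frequency cepstral coefficients. The naive transform here only keeps the example self-contained.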

Based on the time-series data of speech feature quantities generated by the acoustic analysis unit 141, the recognition decoder unit 142 refers to the acoustic model 111, the dictionary 113, and the language model 112 of the speech recognition database 110 and determines a provisional speech recognition result.

More specifically, the recognition decoder unit 142 first converts the time-series data of speech feature quantities into a text array composed of one or more syllables. If the obtained text array does not exist in the dictionary 113, the recognition decoder unit 142 determines that text array as the provisional speech recognition result. If the obtained text array exists in the dictionary 113, it determines the corresponding word (the word with the marker inserted) as the provisional speech recognition result. The recognition decoder unit 142 then outputs the determined provisional speech recognition result (hereinafter "provisional result") to the display unit 150 and the confirmation processing unit 170.

That is, while the user is speaking into the microphone, for example, the speech recognition processing unit 140 outputs provisional results of the speech text input at the predetermined cycle. For example, when the word the user is trying to input by voice (hereinafter "desired word") is relatively long, several provisional results are obtained before the utterance of that word is complete. Each provisional result is one of three kinds: a text array that is not a registered word, an exclusive word with a marker inserted at the end position of its exclusive section, or a non-exclusive word with a marker inserted at the end position of the word.
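The decoder's last step, choosing between the raw syllable text and the marked dictionary word, can be sketched in Python as below. The dictionary contents, the `/` marker, the explicit exclusive flag stored with each entry, and the function name `provisional_result` are hypothetical. The acoustic and language model decoding that produces the syllable text is outside the scope of this sketch.

```python
# Hypothetical marked dictionary: plain syllable text -> (marked form, is_exclusive).
MARKED_DICTIONARY = {
    "アカ": ("アカ/", False),       # non-exclusive word: marker at the end of the word
    "アカサカ": ("アカサ/カ", True),  # exclusive word: marker ends the exclusive section
    "カメラ": ("カ/メラ", True),
}


def provisional_result(syllable_text):
    """Return (provisional result, kind) for one recognition cycle.

    Unregistered text is passed through without a marker; registered words
    are returned in their marked form.
    """
    entry = MARKED_DICTIONARY.get(syllable_text)
    if entry is None:
        return syllable_text, "unregistered"
    marked, is_exclusive = entry
    return marked, ("exclusive" if is_exclusive else "non-exclusive")


# Provisional results over successive cycles as the utterance grows.
results = [provisional_result(text) for text in ["アカ", "アカサ", "アカサカ"]]
```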

For each provisional result, the speech recognition processing unit 140 calculates a confidence score indicating how likely the provisional result is to be correct, and attaches the calculated score to the provisional result output to the confirmation processing unit 170. The confidence score is calculated, for example, from the probabilities of phonetic symbols given the speech feature quantities and from the appearance and connection probabilities of words given the phonetic symbols, using information such as the probability of the best path (first candidate) and its probability difference from the other paths (second and later candidates).
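The patent does not give a formula for the confidence score, so the following Python sketch shows only one hypothetical way to combine the two quantities it mentions, the best-path probability and its difference from the other paths. The function name `confidence` and both returned measures are assumptions, not the patent's method.

```python
def confidence(path_probabilities):
    """Return (share, margin) for a list of candidate path probabilities.

    share  : best-path probability normalized by the total over all candidates
    margin : difference between the best and the second-best path probability
    """
    ranked = sorted(path_probabilities, reverse=True)
    best = ranked[0]
    second = ranked[1] if len(ranked) > 1 else 0.0
    total = sum(ranked)
    share = best / total if total > 0 else 0.0
    margin = best - second
    return share, margin


# Three candidate paths; the first candidate clearly dominates the others.
share, margin = confidence([0.3, 0.6, 0.1])
```

A small margin indicates that a competing candidate is nearly as likely as the best one, which is the situation in which confirming early would be riskiest.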

The display unit 150 displays the provisional result input from the speech recognition processing unit 140. Each time a new provisional result is input, the display unit 150 updates the displayed content with it. The display unit 150 is, for example, a liquid crystal display provided in the mobile phone.

The operation input unit 160 accepts a determination operation on the provisional result displayed on the display unit 150. When the determination operation is performed, the operation input unit 160 notifies the confirmation processing unit 170 accordingly. The operation input unit 160 is, for example, a touch panel provided in the mobile phone.

That is, when the latest provisional result is the desired word, the user can perform the determination operation on that provisional result.

When the determination operation is performed on the operation input unit 160, the confirmation processing unit 170 takes the provisional result on which the operation was performed as the confirmed speech recognition result (hereinafter "confirmed result") for the speech input by the speech input unit 130.

また、確定処理部１７０は、暫定結果が排他単語であるとき、操作入力部１６０において上記決定操作が行われていなくても、その排他単語を確定結果とする。すなわち、確定処理部１７０は、暫定結果が排他区間を有するテキスト配列であるとき、決定操作が行われる前に、その暫定結果を確定結果として決定する。 In addition, when the provisional result is an exclusive word, the confirmation processing unit 170 sets the exclusive word as the confirmation result even if the determination operation is not performed in the operation input unit 160. That is, when the provisional result is a text array having an exclusive section, the confirmation processing unit 170 determines the provisional result as the confirmation result before the determination operation is performed.

但し、本実施の形態において、確定処理部１７０は、直近に連続して得られた所定数の他の暫定結果の全てと前方一致となることを条件として、その排他単語を確定結果とする。本実施の形態において、この所定数は、２とする。すなわち、確定処理部１７０は、３回連続して前方一致となる暫定結果が得られたとき、その３回目の暫定結果を、確定結果とする。これは、音声の不明瞭さ等に起因して誤認識が発生し得ることを、考慮したものである。 In the present embodiment, however, the confirmation processing unit 170 takes the exclusive word as the confirmation result only on the condition that it forward-matches all of a predetermined number of other provisional results obtained consecutively immediately before it. In the present embodiment, this predetermined number is 2. That is, when forward-matching provisional results are obtained three times in a row, the confirmation processing unit 170 takes the third provisional result as the confirmation result. This accounts for the possibility of misrecognition caused by, for example, unclear speech.

更に、確定処理部１７０は、暫定結果が非排他単語であるときも、直近に連続して得られた所定数の他の暫定結果の全てと前方一致となったとき、その３回目の暫定結果を、確定結果とする。これは、単語を発話し終えたときを考慮したものである。 Furthermore, even when the provisional result is a non-exclusive word, the confirmation processing unit 170 determines that the third provisional result is obtained when it coincides with all of the predetermined number of other provisional results obtained in succession. Is the final result. This takes into account when you have finished speaking a word.

すなわち、確定処理部１７０は、確からしい暫定結果として登録単語が得られたとき、決定操作を待たずに、その暫定結果を確定結果として決定する。そして、確定処理部１７０は、決定した確定結果を、確定結果使用部１８０へ出力する。 That is, when the registered word is obtained as a probable provisional result, the confirmation processing unit 170 determines the provisional result as the confirmation result without waiting for the determination operation. Then, the confirmation processing unit 170 outputs the determined confirmation result to the confirmation result use unit 180.

また、確定処理部１７０は、前方一致単語以外の単語、つまり、排他単語と前方一致とならない単語および２以下の音節でしか排他単語と前方一致とならない単語については、上記条件の判断の対象外とする。 The confirmation processing unit 170 excludes from the evaluation of the above condition any word other than a forward-matching word, that is, any word that does not forward-match an exclusive word and any word that forward-matches an exclusive word in only two or fewer syllables.

暫定結果が排他単語であるか否かは、例えば、上述のマーカの有無により判断される。また、暫定結果が前方一致単語であるか否かは、例えば、上述のマーカの有無により判断される。 Whether or not the provisional result is an exclusive word is determined, for example, based on the presence or absence of the marker. Further, whether or not the provisional result is a forward matching word is determined, for example, based on the presence or absence of the marker.

確定結果使用部１８０は、入力された確定結果を使用して、所定の処理を行う。確定結果使用部１８０は、例えば、上記携帯電話機に搭載された電子メールアプリである。この場合、上記所定の処理は、例えば、入力された確定結果を、電子メールの宛先に入力しつつ、その入力内容を表示部１５０に表示させる処理である。 The confirmation result using unit 180 performs a predetermined process using the input confirmation result. The confirmation result using unit 180 is, for example, an e-mail application mounted on the mobile phone. In this case, the predetermined process is, for example, a process of displaying the input content on the display unit 150 while inputting the input confirmation result to an e-mail destination.

また、音声認識装置１００は、図示しないが、例えば、ＣＰＵ（central processing unit）、制御プログラムを格納したＲＯＭ（read only memory）等の記憶媒体、およびＲＡＭ（random access memory）等の作業用メモリ等を有する。この場合、上記した各部の機能は、ＣＰＵが制御プログラムを実行することにより実現される。 Although not shown, the speech recognition apparatus 100 includes, for example, a CPU (central processing unit), a storage medium such as a ROM (read only memory) storing a control program, and a working memory such as a RAM (random access memory). In this case, the functions of the units described above are realized by the CPU executing the control program.

このような構成を有する音声認識装置１００は、確からしい暫定結果が得られた段階で、決定操作を待たずに、その暫定結果を確定結果とすることができる。 The speech recognition apparatus 100 having such a configuration can use the provisional result as a final result without waiting for a determination operation when a probable provisional result is obtained.

比較的長い単語は、排他区間を有することが多く、更に、排他区間の末尾から単語の末尾までの距離（発話に要する時間、音声認識処理に要する時間）が長いことが多い。また、所望の単語が排他単語である場合、排他区間の音声部分に対する音声認識処理が完了した時点で、所望の単語が暫定結果として決定されることが多い。 A relatively long word often has an exclusive section, and furthermore, the distance from the end of the exclusive section to the end of the word (time required for speech, time required for speech recognition processing) is often long. In addition, when the desired word is an exclusive word, the desired word is often determined as a provisional result at the time when the speech recognition processing for the speech portion in the exclusive section is completed.

したがって、音声認識装置１００は、所望の単語が比較的長い単語である場合、単語全体の音声に対する音声認識処理が完了する前に、正しい確定結果を得ることができる。すなわち、音声認識装置１００は、比較的長い単語に対しても、高精度かつ高速な音声テキスト入力を実現することができる。 Accordingly, when the desired word is a relatively long word, the speech recognition apparatus 100 can obtain a correct determination result before the speech recognition processing for the speech of the entire word is completed. That is, the speech recognition apparatus 100 can realize high-accuracy and high-speed speech text input even for relatively long words.

更に、上記構成を有する音声認識装置１００は、直近に連続して得られた２つの他の暫定結果の全てが当該排他単語についての前方一致単語ではない場合、当該排他単語を確定結果としないようにすることができる。 Furthermore, the speech recognition apparatus 100 having the above configuration does not make the exclusive word a final result when all of the two other provisional results obtained in succession are not forward matching words for the exclusive word. Can be.

単語全体の音声に対する音声認識処理が完了する前に確定結果を決定すると、発話の状態および入力音声の状態によっては、誤った確定結果を得るおそれがある。一方で、直近に連続して得られた２つの他の暫定結果の全てが最新の暫定結果についての前方一致単語である場合には、その最新の暫定結果が正しい可能性が非常に高い。 If the confirmation result is determined before the speech recognition process for the whole word speech is completed, an erroneous confirmation result may be obtained depending on the utterance state and the input speech state. On the other hand, if all of the two other provisional results obtained in succession most recently are forward matching words for the latest provisional result, the possibility that the latest provisional result is correct is very high.

したがって、音声認識装置１００は、誤認識が発生する可能性を、低減することができる。すなわち、音声認識装置１００は、更に認識精度を向上させた状態で、高速な音声テキスト入力を実現することができる。 Therefore, the speech recognition apparatus 100 can reduce the possibility of erroneous recognition. That is, the speech recognition apparatus 100 can realize high-speed speech text input with the recognition accuracy further improved.

また、音声認識装置１００は、非排他単語が暫定結果として得られた場合についても、直近に連続して得られた２つの他の暫定結果の全てと前方一致となる場合、その暫定結果を確定結果とすることができる。これにより、音声認識装置１００は、単語を言い終えて所定の時間（同一の暫定結果が３回得られるのに要する時間）が経過した段階で、決定操作を待たずに、その暫定結果を確定結果とすることができる。 Also when a non-exclusive word is obtained as a provisional result, the speech recognition apparatus 100 can take that provisional result as the confirmation result if it forward-matches both of the two other provisional results obtained consecutively immediately before it. As a result, once a predetermined time (the time required to obtain the same provisional result three times) has elapsed after the user finishes saying the word, the speech recognition apparatus 100 can take the provisional result as the confirmation result without waiting for a determination operation.

以上で、音声認識装置１００の構成についての説明を終了する。 This concludes the description of the configuration of the speech recognition apparatus 100.

次に、音声認識装置１００の動作について説明する。音声認識装置１００は、まず、初期設定動作としてマーカ挿入動作を行い、その後、音声認識動作を行う。 Next, the operation of the speech recognition apparatus 100 will be described. The speech recognition apparatus 100 first performs a marker insertion operation as an initial setting operation, and then performs a speech recognition operation.

マーカ挿入動作は、辞書１１３に格納された登録単語群に対して、マーカを挿入する動作である。音声認識動作は、マーカの挿入が行われた辞書１１３を用いて、高精度かつ高速な音声テキスト入力を実現する動作である。 The marker insertion operation is an operation for inserting a marker into a registered word group stored in the dictionary 113. The speech recognition operation is an operation for realizing high-accuracy and high-speed speech text input using the dictionary 113 in which the marker is inserted.

まず、マーカ挿入動作について説明する。 First, the marker insertion operation will be described.

図２は、音声認識装置１００のマーカ挿入動作の一例を示すフローチャートである。 FIG. 2 is a flowchart illustrating an example of the marker insertion operation of the speech recognition apparatus 100.

まず、ステップＳ１０１０において、マーカ挿入部１２０は、登録単語群の各単語を、音節で分割する。そして、マーカ挿入部１２０は、音節で分割された各単語を、５０音順でソートして、仮想の記憶領域である音節行列に格納する。 First, in step S1010, the marker insertion unit 120 divides each word of the registered word group into syllables. The marker insertion unit 120 then sorts the syllable-divided words in Japanese syllabary (gojūon) order and stores them in a syllable matrix, which is a virtual storage area.

図３は、マーカ挿入動作前の登録単語群の一例を示す図である。 FIG. 3 is a diagram illustrating an example of a registered word group before the marker insertion operation.

図３に示すように、マーカ挿入動作前の登録単語群２１０は、複数の単語の集合である。各単語は、カタカナの音節テキスト配列である。 As shown in FIG. 3, the registered word group 210 before the marker insertion operation is a set of a plurality of words. Each word is a katakana syllable text array.

図４は、登録単語群が格納された音節行列の一例を示す図である。 FIG. 4 is a diagram illustrating an example of a syllable matrix in which registered word groups are stored.

図４に示すように、音節行列２２０は、登録単語群を、音節の位置を揃えた状態で、５０音順に並べたリストである。例えば、音節行列において、「イグザタブレット」という単語が格納された行の「タ」という音節が格納された列と、次の「イグザフォン」という単語が格納された行の「フォン」という音節が格納された列とは、同一の列に属している。 As shown in FIG. 4, the syllable matrix 220 is a list in which the registered word group is arranged in gojūon order with the syllable positions aligned. For example, in the syllable matrix, the syllable “タ” in the row storing the word “イグザタブレット” and the syllable “フォン” in the next row, which stores the word “イグザフォン”, are in the same column.
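Step S1010 and the syllable matrix can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the embodiment: the helper names `split_syllables` and `build_syllable_matrix` are hypothetical, the rule for which kana attach to the preceding kana is assumed (it is chosen so that “フォン” becomes one unit, as in the example above), and sorting by Unicode code point only approximates gojūon order.

```python
# Kana that attach to the preceding kana to form one syllable unit.
# Assumption: small kana, the sokuon, the long-vowel mark and the moraic
# nasal are merged, so that "フォン" is a single unit.
ATTACHING = set("ァィゥェォャュョヮッーン")


def split_syllables(word):
    """Split a katakana word into syllable units (assumed rule)."""
    units = []
    for ch in word:
        if units and ch in ATTACHING:
            units[-1] += ch      # extend the previous unit
        else:
            units.append(ch)     # start a new unit
    return units


def build_syllable_matrix(words):
    """Sort the words and store each one as a row of syllable units.

    Code-point order is used as a stand-in for gojūon order.
    """
    return [split_syllables(w) for w in sorted(words)]


matrix = build_syllable_matrix(
    ["ウニ", "イグザフォン", "イカ", "イグザ", "イグザタブレット"]
)
```

With these rows, “タ” of “イグザタブレット” and “フォン” of “イグザフォン” both sit at column index 3, which corresponds to the column alignment described for FIG. 4.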

そして、図２のステップＳ１０２０において、マーカ挿入部１２０は、音節行列から、行（単語）を１つ選択する。ここでは、マーカ挿入部１２０は、音節行列の上から下へと、順に選択していくものとする。例えば、マーカ挿入部１２０は、「アドバンス」という単語が格納された行を選択する。 In step S1020 of FIG. 2, the marker insertion unit 120 selects one row (word) from the syllable matrix. Here, it is assumed that the marker insertion unit 120 sequentially selects from the top to the bottom of the syllable matrix. For example, the marker insertion unit 120 selects a row in which the word “advance” is stored.

そして、ステップＳ１０３０において、マーカ挿入部１２０は、選択中の行を、次の行と列（音節）毎に比較し、少なくとも１つ以上の列で前方一致となっているか否かを判断する。前方一致となっている範囲とは、２つの配列の間で、それぞれの配列の前方を基準として一致する範囲のうち、最大の範囲をいうものとする。 Then, in step S1030, the marker insertion unit 120 compares the selected row with the next row column by column (syllable by syllable) and determines whether they forward-match in at least one column. The forward-match range of two arrays is the longest range over which the arrays agree when compared from their beginnings.

マーカ挿入部１２０は、選択中の行と次の行との間で前方一致となっている場合（Ｓ１０３０：ＹＥＳ）、後述のステップＳ１０５０へ進む。また、マーカ挿入部１２０は、選択中の行と次の行との間で前方一致となっていない場合（Ｓ１０３０：ＮＯ）、ステップＳ１０４０へ進む。 If the marker insertion unit 120 has a forward match between the selected row and the next row (S1030: YES), the marker insertion unit 120 proceeds to step S1050 described later. In addition, when the marker insertion unit 120 does not match the selected line with the next line (S1030: NO), the process proceeds to step S1040.

例えば、「イグザフォントリプルエックス」という単語が選択されている場合、次の行の単語「ウニ」との間では、前方一致となっていない。したがって、このような場合、処理は、ステップＳ１０４０へ進む。 For example, when the word “Ixaphone Triple X” is selected, there is no forward match with the word “Uni” on the next line. Therefore, in such a case, the process proceeds to step S1040.
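The forward-match comparison of step S1030 amounts to measuring the longest common prefix of two rows. A minimal sketch follows; the function name `forward_match_len` is hypothetical, and rows are assumed to be lists of syllable units.

```python
def forward_match_len(row_a, row_b):
    """Number of leading columns (syllables) in which two rows agree."""
    n = 0
    for a, b in zip(row_a, row_b):
        if a != b:
            break
        n += 1
    return n


# Rows as they would appear in the syllable matrix.
tablet = ["イ", "グ", "ザ", "タ", "ブ", "レッ", "ト"]
phone = ["イ", "グ", "ザ", "フォン"]
uni = ["ウ", "ニ"]

shared = forward_match_len(tablet, phone)   # columns "イ", "グ", "ザ"
none = forward_match_len(phone, uni)        # no forward match
```

Step S1030 answers YES when this length is at least 1, and step S1090 answers YES when it is at least the predetermined number (3).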

そして、ステップＳ１０４０において、マーカ挿入部１２０は、前回の比較でも前方一致していた場合（Ｓ１０４０：ＹＥＳ）、後述のステップＳ１０５０へ進む。また、マーカ挿入部１２０は、前回の比較で前方となっていない場合（Ｓ１０４０：ＮＯ）、後述のステップＳ１０６０へ進む。 Then, in step S1040, if there was also a forward match in the previous comparison (S1040: YES), the marker insertion unit 120 proceeds to step S1050 described later. If there was no forward match in the previous comparison (S1040: NO), the marker insertion unit 120 proceeds to step S1060 described later.

前回の比較でも前方一致していた場合とは、つまり、選択中の行の単語が、前後の行の両方の単語と前方一致となっている場合である。例えば、「イグザタブレット」という単語が格納された行が選択された場合、前の行の単語「イグザ」、および、次の行の単語「イグザフォン」との間では、「イ」、「グ」、「ザ」が格納された３つの列で前方一致となっている。したがって、この場合、処理は、ステップＳ１０５０へ進む。 The case where there was also a forward match in the previous comparison is the case where the word in the selected row forward-matches the words in both the preceding and the following rows. For example, when the row storing the word “イグザタブレット” is selected, it forward-matches both the word “イグザ” in the preceding row and the word “イグザフォン” in the next row in the three columns storing “イ”, “グ”, and “ザ”. Therefore, in this case, the process proceeds to step S1050.

ステップＳ１０５０において、マーカ挿入部１２０は、選択中の行に格納された単語の次の行の単語との前方一致部分を、一致文字列リストに追加する。一致文字列リストは、前後２つの単語との間で前方一致となる単語を構成する文字列のうち、次の単語と前方一致となる部分（つまり、共通部分）をリスト化したものとなる。 In step S1050, the marker insertion unit 120 adds to the matched character string list the portion of the word in the selected row that forward-matches the word in the next row. The matched character string list thus lists, for each word that forward-matches both its preceding and following words, the portion of its character string that forward-matches the next word (that is, the common portion).

図５は、一致文字列リストの一例を示す図である。 FIG. 5 is a diagram illustrating an example of the matching character string list.

図５に示すように、一致文字列リスト２３０は、前後２つの単語との間で前方一致となる単語を構成する文字列のうち、次の単語と前方一致となる部分のみを記述する。したがって、例えば、図３に示す登録単語群２１０のうち、「アドバンス」に対応する文字列は、一致文字列リスト２３０には記述されていない。 As shown in FIG. 5, the matched character string list 230 describes only a portion that matches the next word in the character string that forms a forward matched word between the two words before and after. Therefore, for example, in the registered word group 210 shown in FIG. 3, the character string corresponding to “advance” is not described in the matching character string list 230.

そして、ステップＳ１０６０において、マーカ挿入部１２０は、選択中の行が音節行列の最後の行（単語）であるか否かを判断する。 In step S1060, the marker insertion unit 120 determines whether the selected row is the last row (word) of the syllable matrix.

マーカ挿入部１２０は、選択中の行が音節行列の最後の行ではない場合（Ｓ１０６０：ＮＯ）、ステップＳ１０２０へ戻り、次の行に対する処理へ移る。また、マーカ挿入部１２０は、音節行列の全ての行について処理が完了すると（Ｓ１０６０：ＹＥＳ）、ステップＳ１０７０へ進む。 If the selected row is not the last row of the syllable matrix (S1060: NO), the marker insertion unit 120 returns to step S1020 and moves to the process for the next row. When the marker insertion unit 120 completes the process for all the rows of the syllable matrix (S1060: YES), the marker insertion unit 120 proceeds to step S1070.
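The first pass over the syllable matrix (steps S1020 to S1060) can be sketched as below. The names `forward_match_len` and `build_match_list` are hypothetical. Two points are assumptions of this sketch rather than statements of the description: the last row is compared with an empty row, and when a row reaches step S1050 only through step S1040 (no match with the next row), nothing is added because its common portion with the next row is empty.

```python
def forward_match_len(row_a, row_b):
    """Number of leading columns (syllables) in which two rows agree."""
    n = 0
    for a, b in zip(row_a, row_b):
        if a != b:
            break
        n += 1
    return n


def build_match_list(matrix):
    """Collect the forward-matching portions of adjacent rows."""
    match_list = []
    prev_matched = False
    for i, row in enumerate(matrix):                  # S1020: top to bottom
        nxt = matrix[i + 1] if i + 1 < len(matrix) else []
        n = forward_match_len(row, nxt)               # S1030
        if n >= 1 or prev_matched:                    # S1030 YES or S1040 YES
            prefix = tuple(row[:n])                   # S1050
            if prefix and prefix not in match_list:
                match_list.append(prefix)
        prev_matched = n >= 1
    return match_list


matrix = [
    ["ア", "ド", "バン", "ス"],
    ["イ", "カ"],
    ["イ", "グ", "ザ"],
    ["イ", "グ", "ザ", "タ", "ブ", "レッ", "ト"],
    ["イ", "グ", "ザ", "フォン"],
    ["ウ", "ニ"],
]
match_list = build_match_list(matrix)
```

Under these assumptions no entry of the resulting list is a prefix of “アドバンス”, which agrees with the statement that no character string corresponding to “アドバンス” appears in the matched character string list.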

そして、図２のステップＳ１０７０において、マーカ挿入部１２０は、再び、音節行列から、行（単語）を１つ選択する。ここでは、マーカ挿入部１２０は、音節行列の上から下へと、順に選択していくものとする。例えば、マーカ挿入部１２０は、「アドバンス」という単語が格納された行を選択する。 In step S1070 in FIG. 2, the marker insertion unit 120 again selects one row (word) from the syllable matrix. Here, it is assumed that the marker insertion unit 120 sequentially selects from the top to the bottom of the syllable matrix. For example, the marker insertion unit 120 selects a row in which the word “advance” is stored.

そして、ステップＳ１０８０において、マーカ挿入部１２０は、選択中の行の単語の前方部分が、一致文字列リスト（図５参照）に含まれているか否かを判断する。この判断は、つまり、単語が、排他単語および前方一致文字列のいずれかであるか否かの判断である。 In step S1080, the marker insertion unit 120 determines whether or not the forward portion of the word in the selected line is included in the matched character string list (see FIG. 5). In other words, this determination is a determination as to whether or not the word is one of an exclusive word and a front matching character string.

マーカ挿入部１２０は、選択中の行の単語が一致文字列リストに含まれている場合（Ｓ１０８０：ＹＥＳ）、後述のステップＳ１０９０へ進む。また、マーカ挿入部１２０は、選択中の行の単語が一致文字列リストに含まれていない場合（Ｓ１０８０：ＮＯ）、後述のステップＳ１１１０へ進む。 When the word on the selected line is included in the matched character string list (S1080: YES), the marker insertion unit 120 proceeds to step S1090 described later. In addition, when the word in the selected line is not included in the matched character string list (S1080: NO), the marker insertion unit 120 proceeds to step S1110 described later.

例えば、「アドバンス」という単語が選択されている場合、この単語の前方部分は、図５に示す一致文字列リスト２３０には含まれていない。したがって、この場合、処理は、ステップＳ１１１０へ進む。また、例えば、「イグザ」という単語が選択されている場合、この単語の前方部分は、図５に示す一致文字列リスト２３０に含まれている。したがって、この場合、処理は、ステップＳ１０９０へ進む。 For example, when the word “advance” is selected, the front part of this word is not included in the matching character string list 230 shown in FIG. Therefore, in this case, the process proceeds to step S1110. Further, for example, when the word “Igusa” is selected, the front part of this word is included in the matching character string list 230 shown in FIG. Therefore, in this case, the process proceeds to step S1090.

ステップＳ１０９０において、マーカ挿入部１２０は、選択中の行と次の行とを列（音節）毎に比較し、所定数（３）以上の列で前方一致となっているか否かを判断する。 In step S1090, the marker insertion unit 120 compares the selected row with the next row for each column (syllable), and determines whether or not there is a forward match in a predetermined number (3) or more columns.

マーカ挿入部１２０は、所定数（３）以上の列で前方一致となっている場合（Ｓ１０９０：ＹＥＳ）、ステップＳ１１００へ進む。また、マーカ挿入部１２０は、所定数（３）以上の列で前方一致となっていない場合（Ｓ１０９０：ＮＯ）、後述のステップＳ１１１０へ進む。 The marker insertion unit 120 proceeds to step S1100 when the number of columns equal to or greater than the predetermined number (3) is a forward match (S1090: YES). In addition, when the marker insertion unit 120 does not match the front in the predetermined number (3) or more (S1090: NO), the marker insertion unit 120 proceeds to step S1110 described later.

例えば、「イグザタブレット」という単語が格納された行が選択されている場合を想定する。この行は、「イグザ」の範囲で、次の行の「イグザフォン」という単語が格納された行と一致する。言い換えると、少なくとも、「イグザ」は、「イグザフォン」に包含されている。したがって、この場合、処理は、ステップＳ１１００へ進む。 For example, assume that the row storing the word “イグザタブレット” is selected. This row matches the next row, which stores the word “イグザフォン”, over the range “イグザ”. In other words, at least “イグザ” is contained in “イグザフォン”. Therefore, in this case, the process proceeds to step S1100.

また、例えば、「イカ」という単語が格納された行が選択されている場合を想定する。この行は、「イ」の範囲において、次の行の「イグザ」という単語が格納された行と一致するが、音素数が所定数（３）に満たない。このため、処理は、ステップＳ１１１０へ進む。 As another example, assume that the row storing the word “イカ” is selected. This row matches the next row, which stores the word “イグザ”, over the range “イ”, but the number of matching phonemes is less than the predetermined number (3). The process therefore proceeds to step S1110.

ステップＳ１１００において、マーカ挿入部１２０は、辞書１１３に記述された登録単語群のうち、選択中の単語に対応するものに対して、マーカを挿入して、ステップＳ１１２０へ進む。より具体的には、マーカ挿入部１２０は、選択中の行が次の行と一致しない列（音節）のうち、最も前方に位置する列の直前の位置に、マーカを挿入する。 In step S1100, the marker insertion unit 120 inserts a marker into the registered word group described in the dictionary 113 corresponding to the selected word, and the process proceeds to step S1120. More specifically, the marker insertion unit 120 inserts a marker at a position immediately before the foremost column among the columns (syllables) in which the selected row does not match the next row.

一方、ステップＳ１１１０において、マーカ挿入部１２０は、辞書１１３に記述された登録単語群のうち、選択中の単語に対応するものに対して、マーカを挿入して、ステップＳ１１２０へ進む。より具体的には、マーカ挿入部１２０は、単語の末尾位置に、マーカを挿入する。 On the other hand, in step S1110, the marker insertion unit 120 inserts a marker into the registered word group described in the dictionary 113 corresponding to the currently selected word, and proceeds to step S1120. More specifically, the marker insertion unit 120 inserts a marker at the end position of the word.

ステップＳ１１２０において、マーカ挿入部１２０は、選択中の行が音節行列の最後の行（単語）であるか否かを判断する。 In step S1120, the marker insertion unit 120 determines whether the currently selected row is the last row (word) of the syllable matrix.

マーカ挿入部１２０は、選択中の行が音節行列の最後の行ではない場合（Ｓ１１２０：ＮＯ）、ステップＳ１０７０へ戻り、次の行に対する処理へ移る。そして、音声認識装置１００は、音節行列の全ての行について処理が完了すると（Ｓ１１２０：ＹＥＳ）、マーカ挿入動作を終了する。 If the selected row is not the last row of the syllable matrix (S1120: NO), the marker insertion unit 120 returns to step S1070 and moves on to processing the next row. When processing has been completed for all rows of the syllable matrix (S1120: YES), the speech recognition apparatus 100 ends the marker insertion operation.

このようなマーカ挿入動作により、音声認識装置１００は、辞書１１３に記述された登録単語群のうち、排他単語を、排他区間の末尾位置にマーカが挿入された状態のものにし、非排他単語を、単語の末尾位置にマーカが挿入された状態のものにすることができる。すなわち、登録単語については全てマーカが挿入された状態となる。 Through this marker insertion operation, the speech recognition apparatus 100 can put each exclusive word in the registered word group described in the dictionary 113 into a state in which a marker is inserted at the end position of its exclusive section, and each non-exclusive word into a state in which a marker is inserted at the end position of the word. That is, every registered word ends up with a marker inserted.
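The second pass (steps S1070 to S1120) can be sketched as follows. The function names, the ASCII “@” delimiter, and the additional information “ixaphone” are hypothetical. Step S1100 is described as inserting the marker immediately before the first non-matching column, whereas the FIG. 6 example places it between “タ” and “ブ”; this sketch follows the example and places the marker after the first non-matching syllable, which is an assumption.

```python
MIN_MATCH = 3  # predetermined number of columns in step S1090


def forward_match_len(row_a, row_b):
    """Number of leading columns (syllables) in which two rows agree."""
    n = 0
    for a, b in zip(row_a, row_b):
        if a != b:
            break
        n += 1
    return n


def insert_markers(matrix, match_list, info):
    """Return every word with a marker '@' + info + '@' inserted."""
    marked = []
    for i, row in enumerate(matrix):                       # S1070
        nxt = matrix[i + 1] if i + 1 < len(matrix) else []
        # S1080: does the front of the word appear in the match list?
        in_list = any(tuple(row[:len(p)]) == p for p in match_list)
        n = forward_match_len(row, nxt)
        if in_list and n >= MIN_MATCH:                     # S1090 YES -> S1100
            pos = min(n + 1, len(row))   # after the first differing syllable
        else:                                              # S1110
            pos = len(row)               # end of the word
        marker = "@" + info["".join(row)] + "@"
        marked.append("".join(row[:pos]) + marker + "".join(row[pos:]))
    return marked


matrix = [
    ["イ", "カ"],
    ["イ", "グ", "ザ"],
    ["イ", "グ", "ザ", "タ", "ブ", "レッ", "ト"],
    ["イ", "グ", "ザ", "フォン"],
]
match_list = [("イ",), ("イ", "グ", "ザ")]
info = {"イカ": "烏賊", "イグザ": "ixa",
        "イグザタブレット": "ixatablet", "イグザフォン": "ixaphone"}
marked = insert_markers(matrix, match_list, info)
```

In this sketch “イカ” and “イグザ” receive the marker at the end of the word, while “イグザタブレット” receives it after “タ”, the point at which it can be told apart from “イグザフォン”.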

なお、音声認識装置１００は、マーカとして、例えば、「＠」＋読みその他の付加情報＋「＠」という文字列を用いる。この付加情報は、例えば、確定結果使用部１８０で用いられる。 Note that the speech recognition apparatus 100 uses as a marker, for example, a character string of the form “@” + additional information such as the reading + “@”. This additional information is used by, for example, the confirmation result using unit 180.

図６は、マーカ挿入動作後の登録単語群の一例を示す図である。 FIG. 6 is a diagram illustrating an example of a registered word group after the marker insertion operation.

図６に示すように、マーカ挿入動作後の登録単語群２４０は、マーカが挿入された状態となる。例えば、「イグザタブレット」という単語には、「タ」と「ブ」との間に、「＠ｉｘａｔａｂｌｅｔ＠」というマーカが挿入されている。 As shown in FIG. 6, markers have been inserted into the registered word group 240 after the marker insertion operation. For example, in the word “イグザタブレット”, the marker “@ixatablet@” is inserted between “タ” and “ブ”.
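A marked dictionary entry can be separated into the text shown to the user and the additional information carried by the marker. The sketch below assumes the ASCII “@” delimiter and at most one marker per entry; the function name `parse_marked` is hypothetical.

```python
import re

# One marker: '@', any run of non-'@' characters, '@'.
MARKER = re.compile(r"@([^@]*)@")


def parse_marked(text):
    """Return (text without the marker, additional information or None)."""
    m = MARKER.search(text)
    if m is None:
        return text, None
    surface = text[:m.start()] + text[m.end():]
    return surface, m.group(1)


surface, info = parse_marked("イグザタ@ixatablet@ブレット")
```

A component that displays provisional results would use only the first element, while a component that consumes confirmation results could also use the second.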

以上で、マーカ挿入動作についての説明を終える。 This is the end of the description of the marker insertion operation.

次に、音声認識動作について説明する。 Next, the voice recognition operation will be described.

図７は、音声認識動作の一例を示すフローチャートである。ここでは、音声入力部１３０は、継続的に、音声入力を行い、音声データを音声認識処理部１４０へ出力しているものとする。また、音声認識装置１００は、上述の音声区間が開始される毎に、および、音声区間中に確定結果が得られる毎に、以下に説明する音声認識動作を行うものとする。 FIG. 7 is a flowchart showing an example of the voice recognition operation. Here, it is assumed that the voice input unit 130 continuously performs voice input and outputs voice data to the voice recognition processing unit 140. In addition, the speech recognition apparatus 100 performs a speech recognition operation described below every time the above-described speech section is started and every time a determination result is obtained during the speech section.

まず、ステップＳ２０１０において、音声認識処理部１４０は、今回の音声認識動作が開始されてから既に入力されている音声データの部分（以下「既入力データ部分」という）に対して、上述の音声認識処理を行う。すなわち、音声認識処理部１４０は、音声認識の暫定結果を決定し、その過程において、当該暫定結果の確信度を算出する。そして、音声認識処理部１４０は、決定した暫定結果を表示部１５０へ出力し、表示部１５０に表示させる。 First, in step S2010, the speech recognition processing unit 140 performs the above-described speech recognition on a portion of speech data that has already been input since the start of the current speech recognition operation (hereinafter referred to as “already input data portion”). Process. That is, the speech recognition processing unit 140 determines a provisional result of speech recognition, and calculates the certainty factor of the provisional result in the process. Then, the voice recognition processing unit 140 outputs the determined provisional result to the display unit 150 and causes the display unit 150 to display it.

なお、表示部１５０には、マーカが除去された状態で、単語が表示されるものとする。マーカの除去は、例えば、表示部１５０が行う。また、音声認識処理部１４０は、決定した暫定結果を、その確信度と併せて、確定処理部１７０へ出力する。 Note that the display unit 150 displays the word with the marker removed. The marker is removed by, for example, the display unit 150. The speech recognition processing unit 140 also outputs the determined provisional result, together with its certainty factor, to the confirmation processing unit 170.

音声認識処理部１４０は、後述のステップＳ２１００からの処理の戻りにより、ステップＳ２０１０の処理を、所定の周期（例えば、１００ｍｓｅｃ）で行う。したがって、発話が行われている間、暫定結果は、繰り返し得られる。また、確定結果が得られていない既入力データ部分は、時間の経過と共に長くなる。したがって、得られる暫定結果は、時間の経過と共に変化し得る。 The speech recognition processing unit 140 performs the process of step S2010 at a predetermined cycle (for example, 100 msec) by returning the process from step S2100 described later. Therefore, the provisional result is obtained repeatedly while the utterance is performed. In addition, the already-input data portion for which the final result has not been obtained becomes longer as time elapses. Thus, the provisional results obtained can change over time.

そして、ステップＳ２０２０において、確定処理部１７０は、入力された確信度が第１の所定値以上であるか否かを判断する。この所定値は、例えば、音声テキスト入力に求められる精度および速度に基づいて、実験等により決定された値である。 In step S2020, the confirmation processing unit 170 determines whether the input certainty factor is equal to or greater than a first predetermined value. This predetermined value is a value determined by experiments or the like based on the accuracy and speed required for speech text input, for example.

確定処理部１７０は、確信度が所定値以上である場合（Ｓ２０２０：ＹＥＳ）、ステップＳ２０３０へ進む。また、確定処理部１７０は、確信度が所定値未満である場合（Ｓ２０２０：ＮＯ）、後述のステップＳ２０５０へ進む。 If the certainty factor is greater than or equal to the predetermined value (S2020: YES), the confirmation processing unit 170 proceeds to step S2030. Further, when the certainty factor is less than the predetermined value (S2020: NO), the confirmation processing unit 170 proceeds to step S2050 described later.

ステップＳ２０３０において、確定処理部１７０は、暫定結果が、マーカを含むものであるか否かを判断する。すなわち、確定処理部１７０は、暫定結果が、登録単語であるか否かを判断する。 In step S2030, the confirmation processing unit 170 determines whether the provisional result includes a marker. That is, the confirmation processing unit 170 determines whether or not the provisional result is a registered word.

確定処理部１７０は、暫定結果がマーカを含むものである場合（Ｓ２０３０：ＹＥＳ）、後述のステップＳ２０４０へ進む。また、確定処理部１７０は、暫定結果がマーカを含まないものである場合（Ｓ２０３０：ＮＯ）、ステップＳ２０５０へ進む。 If the provisional result includes a marker (S2030: YES), the confirmation processing unit 170 proceeds to step S2040 described later. If the provisional result does not include a marker (S2030: NO), the confirmation processing unit 170 proceeds to step S2050.

例えば、入力音声の品質が低い場合や、暫定結果が「イ」である場合、処理は、後述のステップＳ２０５０へ進む。また、例えば、入力音声の品質が高く場合や、暫定結果が「イグザ＠ｉｘａ＠」や「イグザタ＠ｉｘａｔａｂｌｅｔ＠ブレット」である場合、処理はステップＳ２０４０へ進む。 For example, when the quality of the input speech is low or the provisional result is “イ”, the process proceeds to step S2050 described later. When, for example, the quality of the input speech is high or the provisional result is “イグザ@ixa@” or “イグザタ@ixatablet@ブレット”, the process proceeds to step S2040.

ステップＳ２０４０において、確定処理部１７０は、今回の暫定結果が、前回記録と前方一致となっているか否かを判断する。ここで、前回記録とは、過去に後述のステップＳ２０９０において記録され、その記録が維持されている、過去の暫定結果のうち、直近のものである。なお、確定処理部１７０は、音声区間の初回の暫定結果については、ステップＳ２０２０〜Ｓ２０８０を実行せず、ステップＳ２０９０に進んでもよい。 In step S2040, the confirmation processing unit 170 determines whether the current provisional result forward-matches the previous record. Here, the previous record is the most recent of the past provisional results that were recorded in step S2090, described later, and whose record is still maintained. Note that, for the first provisional result of a speech section, the confirmation processing unit 170 may skip steps S2020 to S2080 and proceed to step S2090.

確定処理部１７０は、今回の暫定結果が前回記録と前方一致となっている場合（Ｓ２０４０：ＹＥＳ）、ステップＳ２０６０へ進む。また、確定処理部１７０は、今回の暫定結果が前回記録と前方一致となっていない場合（Ｓ２０４０：ＮＯ）、後述のステップＳ２０７０へ進む。 The confirmation processing unit 170 proceeds to step S2060 when the provisional result of this time coincides with the previous recording (S2040: YES). If the current provisional result does not coincide with the previous record (S2040: NO), the confirmation processing unit 170 proceeds to step S2070 described later.

例えば、前回記録が「イグザ＠ｉｘａ＠」であり、今回の暫定結果が「イグザタ＠ｔａｂｌｅｔ＠ブレット」である場合、処理は、ステップＳ２０６０へ進む。 For example, when the previous record is “イグザ@ixa@” and the current provisional result is “イグザタ@tablet@ブレット”, the process proceeds to step S2060.

ステップＳ２０６０において、確定処理部１７０は、前方一致パラメータをインクリメントする。前方一致パラメータは、所定数（３）以上の音節で前方一致となる暫定結果が連続して得られた回数（以下「連続前方一致回数」という）を示すパラメータであり、初期値は０である。 In step S2060, the confirmation processing unit 170 increments the forward-match parameter. The forward-match parameter indicates the number of times that provisional results forward-matching in at least a predetermined number (3) of syllables have been obtained consecutively (hereinafter referred to as the “consecutive forward-match count”); its initial value is 0.

例えば、前々回の暫定結果が「イカ＠烏賊＠」であり、前回の暫定結果が「イグザ＠ｉｘａ＠」であり、今回の暫定結果が「イグザタ＠ｔａｂｌｅｔ＠ブレット」である場合を想定する。この場合、前方一致パラメータは、「１」となる。 For example, assume that the provisional result before the previous one is “イカ@烏賊@”, the previous provisional result is “イグザ@ixa@”, and the current provisional result is “イグザタ@tablet@ブレット”. In this case, the forward-match parameter becomes 1.

そして、ステップＳ２０８０において、確定処理部１７０は、前方一致パラメータが所定値（２）に到達したか否かを判断する。すなわち、確定処理部１７０は、最新の暫定結果が、直近に連続して得られた２以上の他の暫定結果の全てと前方一致となっているという条件（以下「カウント条件」という）が満たされるか否かを判断する。 Then, in step S2080, the confirmation processing unit 170 determines whether the forward-match parameter has reached a predetermined value (2). That is, the confirmation processing unit 170 determines whether the condition that the latest provisional result forward-matches all of the two or more other provisional results obtained consecutively immediately before it (hereinafter referred to as the “count condition”) is satisfied.

確定処理部１７０は、前方一致パラメータが所定値（２）に到達した場合（Ｓ２０８０：ＹＥＳ）、後述のステップＳ２１１０へ進む。また、確定処理部１７０は、前方一致パラメータが所定値（２）に到達していない場合（Ｓ２０８０：ＮＯ）、ステップＳ２０９０へ進む。 If the forward matching parameter reaches the predetermined value (2) (S2080: YES), the confirmation processing unit 170 proceeds to step S2110 described later. If the forward matching parameter has not reached the predetermined value (2) (S2080: NO), the confirmation processing unit 170 proceeds to step S2090.

ステップＳ２０９０において、確定処理部１７０は、今回の暫定結果を記録する。 In step S2090, the confirmation processing unit 170 records the current provisional result.

On the other hand, in step S2050, if a previous record exists, the confirmation processing unit 170 clears it and proceeds to step S2100, described later. That is, when the certainty factor of the provisional result is low, or when the provisional result is not a registered word, the confirmation processing unit 170 excludes both the previous and the current provisional results from the determination of the count condition described above.

In step S2070, the confirmation processing unit 170 initializes the forward-match parameter and proceeds to step S2100. That is, when the current provisional result is not a forward match (prefix match) with the previous record, the confirmation processing unit 170 shifts the starting point for counting the number of consecutive forward matches to the next provisional result.

Then, in step S2100, the confirmation processing unit 170 determines whether a determination operation has been performed on the current provisional result. That is, the confirmation processing unit 170 determines whether the operation input unit 160 has notified it that a determination operation was performed.

If no determination operation has been performed (S2100: NO), the confirmation processing unit 170 returns to step S2010 and moves on to processing the already-input data portion, which now includes the newly input data portion. If a determination operation has been performed (S2100: YES), the confirmation processing unit 170 proceeds to step S2110.

For example, when the number of consecutive forward matches reaches two, the process proceeds to step S2110 even if no determination operation has been performed. Reaching this count indicates a sufficiently high likelihood that recognition of the exclusive section of an exclusive word, or recognition of the entire word, has been completed.

Conversely, when a determination operation is performed, the process proceeds to step S2110 even if the number of consecutive forward matches has not reached two.

In step S2110, the confirmation processing unit 170 adopts the current provisional result as the confirmed result and outputs the confirmed result to the confirmed result use unit 180. The speech recognition apparatus 100 then ends the speech recognition operation.
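The two routes into step S2110 described above can be sketched as a small loop. This is only an illustrative sketch: the function name `run_recognition`, the `count_condition_met` callback, and the representation of each recognition cycle as a `(provisional_result, decision_performed)` pair are assumptions made for the example and do not appear in the specification.

```python
from typing import Callable, Iterable, Optional, Tuple


def run_recognition(
    cycles: Iterable[Tuple[str, bool]],
    count_condition_met: Callable[[str], bool],
) -> Optional[str]:
    """Return the confirmed result, or None if the input ends first.

    cycles yields one (provisional_result, decision_performed) pair per
    recognition cycle (hypothetical representation). count_condition_met
    stands in for the consecutive-forward-match count check.
    """
    for provisional, decision_performed in cycles:
        # Route 1: the count condition is satisfied without user action.
        # Route 2: the user performed a determination operation (S2100: YES).
        if count_condition_met(provisional) or decision_performed:
            return provisional  # S2110: the provisional result is confirmed
        # S2100: NO, so continue with the next, longer input portion (S2010).
    return None


# The user decides on the second cycle; the count condition is never met.
confirmed = run_recognition(
    [("a", False), ("ab", True), ("abc", False)],
    count_condition_met=lambda text: False,
)
```

Either route ends the loop with the provisional result of the current cycle, which matches the description of step S2110.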

The confirmed result use unit 180 may use the confirmed result with the marker removed, or may use the confirmed result with the marker still included. The marker may be removed either by the confirmation processing unit 170 or by the confirmed result use unit 180.

Through this speech recognition operation, the speech recognition apparatus 100 can obtain provisional results periodically during an utterance, using the dictionary 113 in which words with inserted markers are registered. In addition, when the likelihood that the exclusive section of an exclusive word has been fully uttered, or that the entire word has been fully uttered, is sufficiently high, the speech recognition apparatus 100 can adopt the provisional word obtained at that point as the confirmed result, even if no determination operation has been performed.

This concludes the description of the speech recognition operation.

Next, the behavior of the apparatus until a confirmed result is obtained, and the effects of the present invention, are described using specific examples of utterance content and provisional results.

FIG. 8 is a diagram illustrating an example of the time required from the start of an utterance until the confirmed result is obtained when the speech of a relatively long word is input.

As shown in the upper part of FIG. 8, suppose that the word "イグザフォントリプルエックス" ("ixaphonetripleX") is uttered, with the utterance starting at time t0 and ending at time t2. That is, uttering the word "イグザフォントリプルエックス" takes time t2. Suppose also that, because of processing such as buffering of the speech data, time t1 elapses between the start of speech input and the point at which the provisional result for the corresponding portion is obtained.

When the confirmed result is obtained from the entire speech data of the uttered word "イグザフォントリプルエックス", the accuracy of the confirmed result is high. In that case, however, the confirmed result is obtained only at time t1 + t2, that is, after time t1 + t2 has elapsed from time t0. Conventionally, therefore, the speaker waits until after time t1 + t2 for the provisional result to be displayed and only then performs the confirmation operation.

Here, suppose that "イグザフォントリプルエックス" is, as shown in FIG. 6, the exclusive word "イグザフォント＠ｉｘａｐｈｏｎｅｔｒｉｐｌｅＸ＠リプルエックス". In this case, as shown in the lower part of FIG. 8, the speech recognition apparatus 100 can adopt the provisional result as the confirmed result at time te, when the provisional result for the already-input data portion up to the position of "ト" has been obtained. The exclusive section "イグザフォント" does not match any other word. The speech recognition apparatus 100 can therefore confirm the speech recognition result with relatively high accuracy at time te, which is earlier than time t1 + t2.
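As an illustration of the entry format used in these examples, the following sketch splits a marked entry into the part before the marker, the text enclosed by the two "＠" signs, and the part after the marker. The names `Entry`, `parse_entry`, `is_exclusive`, and `full_reading` are hypothetical, and treating the enclosed text as the written form, and a non-empty remainder as the sign of an exclusive word, are assumptions inferred from the example entries rather than rules stated in this passage.

```python
from typing import NamedTuple, Optional

MARK = "＠"  # full-width at sign, as it appears in the example entries


class Entry(NamedTuple):
    head: str     # reading before the marker (exclusive section, if exclusive)
    payload: str  # text between the two marks (assumed to be the written form)
    tail: str     # reading after the marker; empty when the marker ends the word


def parse_entry(text: str) -> Optional[Entry]:
    """Split a marked entry; return None when no marker pair is present."""
    parts = text.split(MARK)
    if len(parts) != 3:
        return None
    return Entry(parts[0], parts[1], parts[2])


def is_exclusive(entry: Optional[Entry]) -> bool:
    """Assumption: an exclusive word still has reading left after the marker."""
    return entry is not None and entry.tail != ""


def full_reading(entry: Entry) -> str:
    """Reading of the whole word with the marker removed."""
    return entry.head + entry.tail


long_word = parse_entry("イグザフォント＠ｉｘａｐｈｏｎｅｔｒｉｐｌｅＸ＠リプルエックス")
short_word = parse_entry("イカ＠烏賊＠")
unmarked = parse_entry("イ")
```

Under these assumptions, `long_word` is exclusive with the head "イグザフォント", `short_word` is a registered but non-exclusive word, and `unmarked` is `None` because it carries no marker.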

However, speech recognition accuracy when only part of the speech data is used is lower than when the entire speech data is used. For this reason, as described above, the speech recognition apparatus 100 confirms a provisional result only on condition that the provisional result is an exclusive word and is a forward match with all of a predetermined number of other provisional results obtained consecutively most recently.

FIG. 9 and FIG. 10 are diagrams each showing an example of the provisional results obtained at each time and an example of how the result is confirmed. FIG. 9 is an example in which the quality of the input speech is good, and FIG. 10 is an example in which the quality of the input speech is not good. In both cases, the desired word is "イグザフォントリプルエックス".

As shown in FIG. 9, suppose that the provisional results "イカ＠烏賊＠" (ika, "squid"), "イグザ＠ｉｘａ＠", "イグザフォン＠ｉｘａｐｈｏｎｅ＠", and "イグザフォント＠ｉｘａｐｈｏｎｅｔｒｉｐｌｅＸ＠リプルエックス" are obtained at times t11 to t14, respectively. When the quality of the input speech is good, words that are forward matches of the desired word often appear as provisional results in this way.

In "イグザ＠ｉｘａ＠", "イグザフォン＠ｉｘａｐｈｏｎｅ＠", and "イグザフォント＠ｉｘａｐｈｏｎｅｔｒｉｐｌｅＸ＠リプルエックス", the "イグザ" portion matches, as indicated by the underline in FIG. 9. These provisional results are also consecutive, and "イグザフォント＠ｉｘａｐｈｏｎｅｔｒｉｐｌｅＸ＠リプルエックス" is an exclusive word.

Therefore, the speech recognition result is confirmed at time t14, when "イグザフォント＠ｉｘａｐｈｏｎｅｔｒｉｐｌｅＸ＠リプルエックス" is obtained. This is earlier than time te, at which the confirmed result based on the entire speech data of the uttered word "イグザフォントリプルエックス" is obtained.

As shown in FIG. 10, suppose that the provisional results "イカ＠烏賊＠", "イグザ＠ｉｘａ＠", "イ", "イカ＠烏賊＠", "イグザフォンス＠ｉｘａｐｈｏｎｅｓｐｏｒｔｓ＠ポーツ", and "イグザフォント＠ｉｘａｐｈｏｎｅｔｒｉｐｌｅＸ＠リプルエックス" are obtained at times t11 to t16, respectively. Suppose also that the certainty factor of the provisional result "イカ＠烏賊＠" at time t14 is low. When the quality of the input speech is not good, words that are not forward matches of the desired word often appear as provisional results, and the certainty factor is often low.

"イグザ＠ｉｘａ＠" is a forward-match word of "イグザフォント＠ｉｘａｐｈｏｎｅｔｒｉｐｌｅＸ＠リプルエックス", whereas "イ" is not. However, the record of "イ" is cleared because the certainty factor of the following "イカ＠烏賊＠" is low, and that "イカ＠烏賊＠" is itself not recorded. These results are therefore excluded from the above condition.

Consequently, the speech recognition result is confirmed at time t16, when "イグザフォント＠ｉｘａｐｈｏｎｅｔｒｉｐｌｅＸ＠リプルエックス" is obtained as the provisional result following "イグザフォンス＠ｉｘａｐｈｏｎｅｓｐｏｒｔｓ＠ポーツ". This is earlier than time te, at which the confirmed result based on the entire speech data of the uttered word "イグザフォントリプルエックス" is obtained.
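The two traces can be replayed with a small sketch of the confirmation condition as it is worded in the summary and claims: the current result is an exclusive word and is a forward match with each of the last two eligible provisional results. This is a simplified reading, not the step-by-step flowchart. The function names, the certainty values, the certainty threshold of 0.5, the minimum shared prefix of three characters, and the use of katakana characters as a stand-in for syllables are all assumptions made for the example.

```python
MARK = "＠"      # full-width at sign used as the marker in the examples
MIN_PREFIX = 3   # assumed forward-match threshold, counted in characters
REQUIRED = 2     # number of preceding results that must match
MIN_CONF = 0.5   # assumed certainty-factor threshold


def head_of(text):
    """Reading before the marker."""
    return text.split(MARK)[0]


def is_exclusive(text):
    """Assumption: exclusive words keep some reading after the marker."""
    parts = text.split(MARK)
    return len(parts) == 3 and parts[2] != ""


def common_prefix_len(a, b):
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def confirm_index(results):
    """Return the index of the result that gets confirmed, or None."""
    history = []  # heads of eligible provisional results, oldest first
    for i, (text, certainty) in enumerate(results):
        # Low-certainty results and results without a marker are excluded.
        if certainty < MIN_CONF or MARK not in text:
            continue
        head = head_of(text)
        recent = history[-REQUIRED:]
        if (is_exclusive(text) and len(recent) == REQUIRED
                and all(common_prefix_len(head, h) >= MIN_PREFIX for h in recent)):
            return i
        history.append(head)
    return None


TARGET = "イグザフォント＠ｉｘａｐｈｏｎｅｔｒｉｐｌｅＸ＠リプルエックス"
FIG9 = [("イカ＠烏賊＠", 0.9), ("イグザ＠ｉｘａ＠", 0.9),
        ("イグザフォン＠ｉｘａｐｈｏｎｅ＠", 0.9), (TARGET, 0.9)]
FIG10 = [("イカ＠烏賊＠", 0.9), ("イグザ＠ｉｘａ＠", 0.9), ("イ", 0.9),
         ("イカ＠烏賊＠", 0.2),  # low certainty at t14 (value is hypothetical)
         ("イグザフォンス＠ｉｘａｐｈｏｎｅｓｐｏｒｔｓ＠ポーツ", 0.9), (TARGET, 0.9)]

print("FIG. 9 confirmed at t%d" % (11 + confirm_index(FIG9)))    # prints t14
print("FIG. 10 confirmed at t%d" % (11 + confirm_index(FIG10)))  # prints t16
```

With these assumed thresholds, the sketch confirms at t14 for the FIG. 9 sequence and at t16 for the FIG. 10 sequence. In FIG. 10 the exclusive word obtained at t15 is not confirmed, because the earlier "イカ" shares only one character with it.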

In other words, when the speech recognition apparatus 100 is able to narrow the speech recognition result down to a single candidate before the speaker has finished uttering the whole of the desired word, it can confirm that speech recognition result at that point. The speech recognition apparatus 100 can thus realize speech recognition with a fast response.

However, when the quality of the input speech is very poor, the provisional results fluctuate widely, and the above condition may not be satisfied even at time te. In such a case, the speech recognition result is confirmed by, for example, a determination operation or a correction operation by the user.

In this way, the speech recognition apparatus 100 confirms the result early when the fluctuation of the provisional results is small, and when the fluctuation is large, it confirms the result only after the reliability of the provisional recognition result has become sufficiently high. The speech recognition apparatus 100 can therefore achieve a good balance between speed and accuracy in speech text input, particularly for relatively long words.

This concludes the description of the operation and effects based on the specific examples.

As described above, the speech recognition apparatus 100 of the present embodiment adopts an exclusive word as the confirmed result on condition that the provisional result is that exclusive word and is a forward match with all of a predetermined number of other provisional results obtained consecutively most recently. The speech recognition apparatus 100 can thereby realize highly accurate, high-speed speech text input even for relatively long phrases.

In the embodiment described above, registered words are identified using markers, but the method of identifying these words is not limited to this. For example, the speech recognition apparatus may identify registered words using word grouping or other additional information.

The functional units shown in FIG. 1 do not necessarily have to be provided integrally in a single apparatus. For example, the speech recognition database and the marker insertion unit may be placed on a server on the Internet, and the other functional units may be placed in a mobile phone.

The threshold (predetermined value) for judging a forward match with an exclusive word and the threshold (predetermined value) for judging the number of consecutive forward matches are not limited to the examples above. These values are determined, for example, by experiment, based on the accuracy and speed required of speech text input.

The format and content of the text arrays described in the dictionary as candidates for the speech recognition result are not limited to the examples above. For example, a text array may be a kana-kanji character string. A text array may also be a word or a sentence (phrase).

The apparatus to which the present invention is applied is not limited to the examples above. The present invention can be applied to various electronic devices such as tablet terminals and personal computers.

The present invention is useful as a speech recognition apparatus and a speech recognition result determination method capable of realizing highly accurate, high-speed speech text input even for relatively long phrases.

DESCRIPTION OF SYMBOLS
100 Speech recognition apparatus
110 Speech recognition database
111 Acoustic model
112 Language model
113 Dictionary
120 Marker insertion unit
130 Speech input unit
140 Speech recognition processing unit
141 Acoustic analysis unit
142 Recognition decoder unit
150 Display unit
160 Operation input unit
170 Confirmation processing unit
180 Confirmed result use unit

Claims

A speech recognition apparatus comprising:
a speech recognition database that stores a dictionary describing a plurality of text arrays;
a speech input unit that inputs speech;
a speech recognition processing unit that performs speech recognition on the portion of the speech that has already been input and sets the result of the speech recognition as a provisional speech recognition result; and
a confirmation processing unit that, when the provisional speech recognition result is an exclusive text array having, at its front, an exclusive section that is unique among the plurality of text arrays, sets the exclusive text array as a confirmed speech recognition result for the speech,
wherein the speech recognition processing unit repeatedly performs the speech recognition at a predetermined cycle,
the confirmation processing unit sets the exclusive text array as the confirmed speech recognition result on condition that the provisional speech recognition result is the exclusive text array and is a forward match with all of a predetermined number of other provisional speech recognition results obtained consecutively most recently, and
the predetermined number is two.

A speech recognition apparatus comprising:
a speech recognition database that stores a dictionary describing a plurality of text arrays;
a speech input unit that inputs speech;
a speech recognition processing unit that performs speech recognition on the portion of the speech that has already been input and sets the result of the speech recognition as a provisional speech recognition result; and
a confirmation processing unit that, when the provisional speech recognition result is an exclusive text array having, at its front, an exclusive section that is unique among the plurality of text arrays, sets the exclusive text array as a confirmed speech recognition result for the speech,
wherein the speech recognition processing unit repeatedly performs the speech recognition at a predetermined cycle,
the confirmation processing unit sets the exclusive text array as the confirmed speech recognition result on condition that the provisional speech recognition result is the exclusive text array and is a forward match with all of a predetermined number of other provisional speech recognition results obtained consecutively most recently,
the text array is a syllable text array, and
the confirmation processing unit further excludes, from the determination of the condition, any text array that is not a forward match with the exclusive text array over at least a predetermined number of syllables.

A speech recognition apparatus comprising:
a speech recognition database that stores a dictionary describing a plurality of text arrays;
a speech input unit that inputs speech;
a speech recognition processing unit that performs speech recognition on the portion of the speech that has already been input and sets the result of the speech recognition as a provisional speech recognition result; and
a confirmation processing unit that, when the provisional speech recognition result is an exclusive text array having, at its front, an exclusive section that is unique among the plurality of text arrays, sets the exclusive text array as a confirmed speech recognition result for the speech,
wherein the speech recognition processing unit repeatedly performs the speech recognition at a predetermined cycle,
the confirmation processing unit sets the exclusive text array as the confirmed speech recognition result on condition that the provisional speech recognition result is the exclusive text array and is a forward match with all of a predetermined number of other provisional speech recognition results obtained consecutively most recently,
the exclusive text array is a syllable text array in which a marker indicating the end position of the exclusive section is inserted,
each text array other than the exclusive text array among the plurality of text arrays is a syllable text array in which a marker indicating the end position of that text array is inserted, and
the confirmation processing unit excludes the provisional speech recognition result from the determination of the condition when the marker is not inserted in the provisional speech recognition result.

A speech recognition apparatus comprising:
a speech recognition database that stores a dictionary describing a plurality of text arrays;
a speech input unit that inputs speech;
a speech recognition processing unit that performs speech recognition on the portion of the speech that has already been input and sets the result of the speech recognition as a provisional speech recognition result; and
a confirmation processing unit that, when the provisional speech recognition result is an exclusive text array having, at its front, an exclusive section that is unique among the plurality of text arrays, sets the exclusive text array as a confirmed speech recognition result for the speech,
wherein the speech recognition processing unit repeatedly performs the speech recognition at a predetermined cycle,
the confirmation processing unit sets the exclusive text array as the confirmed speech recognition result on condition that the provisional speech recognition result is the exclusive text array and is a forward match with all of a predetermined number of other provisional speech recognition results obtained consecutively most recently, and
the speech recognition apparatus further comprises a marker insertion unit that inserts a marker into the exclusive text array by sorting the plurality of text arrays described in the dictionary in forward-match order and determining a forward-match range for each pair of adjacent text arrays.

The speech recognition apparatus according to claim 1, further comprising:
a display unit that displays the provisional speech recognition result; and
an operation input unit that receives a determination operation on the provisional speech recognition result displayed on the display unit,
wherein the confirmation processing unit, when the determination operation is performed, sets the provisional speech recognition result on which the determination operation was performed as the confirmed speech recognition result, and, when the provisional speech recognition result is the exclusive text array, sets the exclusive text array as the confirmed speech recognition result even if the determination operation has not been performed.

The speech recognition apparatus according to claim 1,
wherein the speech recognition processing unit calculates, for each provisional speech recognition result, a certainty factor indicating the certainty of that provisional speech recognition result, and
the confirmation processing unit excludes, from the determination of the condition, any provisional speech recognition result whose certainty factor is less than a predetermined value.

A speech recognition result determination method for a speech recognition apparatus having a speech recognition database that stores a dictionary describing a plurality of text arrays, a speech input unit that inputs speech, and a speech recognition processing unit that performs speech recognition on the portion of the speech that has already been input and sets the result of the speech recognition as a provisional speech recognition result, the method comprising:
a step of determining whether the provisional speech recognition result is an exclusive text array having, at its front, an exclusive section that is unique among the plurality of text arrays; and
a step of setting, when the provisional speech recognition result is the exclusive text array, the exclusive text array as a confirmed speech recognition result for the speech,
wherein the speech recognition processing unit repeatedly performs the speech recognition at a predetermined cycle,
in the step of setting the confirmed speech recognition result, the exclusive text array is set as the confirmed speech recognition result on condition that the provisional speech recognition result is the exclusive text array and is a forward match with a predetermined number of other provisional speech recognition results obtained consecutively most recently, and
the predetermined number is two.

A speech recognition result determination method for a speech recognition apparatus having a speech recognition database that stores a dictionary describing a plurality of text arrays, a speech input unit that inputs speech, and a speech recognition processing unit that performs speech recognition on the portion of the speech that has already been input and sets the result of the speech recognition as a provisional speech recognition result, the method comprising:
a step of determining whether the provisional speech recognition result is an exclusive text array having, at its front, an exclusive section that is unique among the plurality of text arrays; and
a step of setting, when the provisional speech recognition result is the exclusive text array, the exclusive text array as a confirmed speech recognition result for the speech,
wherein the speech recognition processing unit repeatedly performs the speech recognition at a predetermined cycle,
in the step of setting the confirmed speech recognition result, the exclusive text array is set as the confirmed speech recognition result on condition that the provisional speech recognition result is the exclusive text array and is a forward match with a predetermined number of other provisional speech recognition results obtained consecutively most recently,
the text array is a syllable text array, and
in the step of setting the confirmed speech recognition result, any text array that is not a forward match with the exclusive text array over at least a predetermined number of syllables is excluded from the determination of the condition.

A speech recognition result determination method for a speech recognition apparatus having a speech recognition database that stores a dictionary describing a plurality of text arrays, a speech input unit that inputs speech, and a speech recognition processing unit that performs speech recognition on the portion of the speech that has already been input and sets the result of the speech recognition as a provisional speech recognition result, the method comprising:
a step of determining whether the provisional speech recognition result is an exclusive text array having, at its front, an exclusive section that is unique among the plurality of text arrays; and
a step of setting, when the provisional speech recognition result is the exclusive text array, the exclusive text array as a confirmed speech recognition result for the speech,
wherein the speech recognition processing unit repeatedly performs the speech recognition at a predetermined cycle,
in the step of setting the confirmed speech recognition result, the exclusive text array is set as the confirmed speech recognition result on condition that the provisional speech recognition result is the exclusive text array and is a forward match with a predetermined number of other provisional speech recognition results obtained consecutively most recently,
the exclusive text array is a syllable text array in which a marker indicating the end position of the exclusive section is inserted,
each text array other than the exclusive text array among the plurality of text arrays is a syllable text array in which a marker indicating the end position of that text array is inserted, and
in the step of setting the confirmed speech recognition result, the provisional speech recognition result is excluded from the determination of the condition when the marker is not inserted in the provisional speech recognition result.

A speech recognition result determination method for a speech recognition apparatus having a speech recognition database that stores a dictionary describing a plurality of text arrays, a speech input unit that inputs speech, and a speech recognition processing unit that performs speech recognition on the portion of the speech that has already been input and sets the result of the speech recognition as a provisional speech recognition result, the method comprising:
a step of determining whether the provisional speech recognition result is an exclusive text array having, at its front, an exclusive section that is unique among the plurality of text arrays; and
a step of setting, when the provisional speech recognition result is the exclusive text array, the exclusive text array as a confirmed speech recognition result for the speech,
wherein the speech recognition processing unit repeatedly performs the speech recognition at a predetermined cycle,
in the step of setting the confirmed speech recognition result, the exclusive text array is set as the confirmed speech recognition result on condition that the provisional speech recognition result is the exclusive text array and is a forward match with a predetermined number of other provisional speech recognition results obtained consecutively most recently, and
the speech recognition result determination method further comprises a step of inserting a marker into the exclusive text array by sorting the plurality of text arrays described in the dictionary in forward-match order and determining a forward-match range for each pair of adjacent text arrays.