JP5682578B2

JP5682578B2 - Speech recognition result correction support system, speech recognition result correction support method, and speech recognition result correction support program

Info

Publication number: JP5682578B2
Application number: JP2012015415A
Authority: JP
Inventors: 貴博吉村
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2012-01-27
Filing date: 2012-01-27
Publication date: 2015-03-11
Anticipated expiration: 2032-01-27
Also published as: JP2013156349A

Description

The present invention relates to a speech recognition result correction support system, a speech recognition result correction support method, and a speech recognition result correction support program in which a misrecognized portion is corrected by a second utterance from the user.

In recent years, business use of smartphones has been expanding. Typical uses include presentations at customer sites by field sales staff and the preparation of daily reports while away from the office. However, because of factors such as the small keyboard, text entry on a smartphone, such as writing a daily report, is far less efficient than entry on a personal computer. For this reason, there is growing interest in using speech recognition for smartphone input: if users can enter text by speech recognition, they can expect a substantial reduction in work compared with keyboard input.

As a device used for speech recognition, Patent Document 3, for example, discloses a speech recognition device.

However, the accuracy of speech recognition is not 100%, and recognition errors can occur. In that case the user can correct the erroneous text from the keyboard, so the more errors there are, the more correction work is required and the lower the efficiency becomes. To address this problem, general techniques for correcting misrecognition efficiently have been developed.

Examples of such general techniques are disclosed in Patent Document 1 and Patent Document 2. In these techniques, when there is misrecognized text, the user makes a correction utterance (second utterance) for the misrecognized text, and the misrecognized text is replaced using the recognized text of the second utterance.

However, the inventions described in Patent Document 1 and Patent Document 2 have the problem that even text correctly recognized in the first utterance is wrongly converted if it is misrecognized in the second utterance. One conceivable countermeasure is to have the user accurately select only the misrecognized text and then make the second utterance.

Japanese Patent Laid-Open No. 2001-092493
Japanese Patent Laid-Open No. 2003-316386
Japanese Patent Laid-Open No. 2002-099296

However, when the user selects misrecognized text, accurate selection is difficult on a small screen such as that of a smartphone, and a selection range that is too narrow causes the speech recognition device to misrecognize frequently. When converting speech to text, a typical speech recognition system generates text word by word, or generates the most suitable text by taking the dependency relations between words into account. Therefore, if, for example, the user selects only the "イ" in "クライド" (Clyde) and makes a second utterance in order to correct the word to "クラウド" (cloud), the utterance may be recognized as something like "胃" (stomach), so misrecognition actually becomes more frequent.

An object of the present invention is to provide a speech recognition result correction support system that can reduce recognition errors in the second utterance made to correct misrecognized text in speech recognition.

A speech recognition result correction support system according to the present invention performs speech recognition on a first utterance made by a user and, when the recognition of the first utterance contains an error, has the user make a second utterance for correction. The system comprises: speech recognition means for recognizing the user's utterance; speech recognition text display means for displaying the recognized utterance as recognized text; determination means for acquiring, as misrecognized text, text that the user selects from the recognized text, and for determining, based on predetermined rules and the misrecognized text, the optimum text range within the recognized text for which the second utterance is to be made; correction utterance presentation means for displaying the optimum text range and prompting the user to make the second utterance; and recognized text replacement means for recognizing the second utterance made by the user and replacing, within the recognized text of the first utterance, only the text judged to correspond to the misrecognized text selected by the user with the corresponding recognized text of the second utterance that is not present in the optimum text range of the recognized text of the first utterance. The correction utterance presentation means presents the replaced recognized text to the user.

A speech recognition result correction support method according to the present invention performs speech recognition on a first utterance made by a user and, when the recognition of the first utterance contains an error, has the user make a second utterance for correction. The method comprises: recognizing the user's utterance; displaying the recognized utterance as recognized text; acquiring, as misrecognized text, text that the user selects from the recognized text; determining, based on predetermined rules and the misrecognized text, the optimum text range within the recognized text for which the second utterance is to be made; displaying the optimum text range and prompting the user to make the second utterance; recognizing the second utterance made by the user; replacing, within the recognized text of the first utterance, only the text judged to correspond to the misrecognized text selected by the user with the corresponding recognized text of the second utterance that is not present in the optimum text range of the recognized text of the first utterance; and presenting the replaced recognized text to the user.

A speech recognition result correction support program according to the present invention causes a computer to execute processing that performs speech recognition on a first utterance made by a user and, when the recognition of the first utterance contains an error, has the user make a second utterance for correction. The program causes the computer to execute: speech recognition processing for recognizing the user's utterance; speech recognition text display processing for displaying the recognized utterance as recognized text; determination processing for acquiring, as misrecognized text, text that the user selects from the recognized text, and for determining, based on predetermined rules and the misrecognized text, the optimum text range within the recognized text for which the second utterance is to be made; correction utterance presentation processing for displaying the optimum text range and prompting the user to make the second utterance; and recognized text replacement processing for recognizing the second utterance made by the user and replacing, within the recognized text of the first utterance, only the text judged to correspond to the misrecognized text selected by the user with the corresponding recognized text of the second utterance that is not present in the optimum text range of the recognized text of the first utterance. In the correction utterance presentation processing, the replaced recognized text is presented to the user.

The speech recognition result correction support system according to the present invention can reduce recognition errors in the second utterance made to correct misrecognized text in speech recognition.

FIG. 1 is a block diagram showing the configuration of an embodiment of the speech recognition result correction support system according to the present invention.
FIG. 2 is a flowchart showing the operation of the speech recognition function.
FIG. 3 is a flowchart showing the operation of the speech recognition text display function.
FIG. 4 is an explanatory diagram showing the recognized text display screen.
FIG. 5 is a flowchart showing the operation of the correction utterance presentation function.
FIG. 6 is a flowchart showing the operation of the selected-portion word determination function.
FIG. 7 is a flowchart showing the operation of the selected-portion continuous phrase determination function.
FIG. 8 is a flowchart showing the operation of the recognized text replacement function.
FIG. 9 is an explanatory diagram showing the speech recognition text table stored in the speech recognition result DB.
FIG. 10 is an explanatory diagram showing the speech recognition word part-of-speech table stored in the speech recognition result DB.
FIG. 11 is an explanatory diagram showing the phrase determination table stored in the phrase determination rule DB.
FIG. 12 is a block diagram showing the main part of the speech recognition result correction support system according to the present invention.

FIG. 1 is a block diagram showing the configuration of an embodiment of the speech recognition result correction support system according to the present invention. As shown in FIG. 1, the system of this embodiment includes a speech recognition result database (DB) 10 that stores speech recognition result data, and a phrase determination rule database (DB) 20 that stores rules for determining words identical to, and phrases contiguous with, the misrecognized text.

The system of this embodiment also includes a speech recognition function 11 that performs speech recognition, a speech recognition text display function 12 that displays the text resulting from speech recognition (recognized text), and a correction utterance presentation function 13 that, for the selected misrecognized text, presents the optimum text range for the second utterance and prompts the user to make it. It further includes a selected-portion word determination function 14 that determines, within the recognized text, the words containing the selected misrecognized text, and a selected-portion continuous phrase determination function 15 that determines the phrase contiguous with the selected misrecognized text.

The system of this embodiment also includes a recognized text replacement function 16 that, using the recognized text of the second utterance, replaces only the text judged to correspond to the misrecognized text of the first utterance selected by the user, and presents the result to the user. Each of the above functions can be implemented by a CPU that executes processing according to a program, and each is connected to the speech recognition result DB 10 and the phrase determination rule DB 20.

Next, the operation is described in detail. The following description uses as an example a configuration in which the speech recognition result correction support system is installed on a mobile device, but the configuration is not limited to this. For example, the system may be installed on a server that communicates with the mobile device over a network, with the microphone audio of the mobile device sent to the server over the network and the recognized text provided to the mobile device over the network.

First, the procedure for recognizing speech that the user speaks into the mobile device is described. FIG. 2 is a flowchart showing the operation of the speech recognition function 11. FIG. 4 is an explanatory diagram showing the recognized text display screen. FIG. 9 is an explanatory diagram showing the speech recognition text table stored in the speech recognition result DB. FIG. 10 is an explanatory diagram showing the speech recognition word part-of-speech table stored in the speech recognition result DB.

First, the user records audio from the microphone of the mobile device (step S1001). Recording starts when, for example, the voice input button 134 on the screen shown in FIG. 4 is pressed, and ends when the button is pressed again. The screen shown in FIG. 4 is displayed on the mobile device and is generated by the speech recognition text display function 12. This embodiment assumes a smartphone as the mobile device, but a mobile phone, a PDA, or the like may be used instead. The recording method is also not limited to the one described above.

Next, the speech recognition function 11 performs speech recognition on the audio recorded in step S1001 (step S1002). To perform speech recognition, a known speech recognition device such as the one described in Patent Document 3 is installed in the speech recognition result correction support system of this embodiment. The speech recognition function 11 then registers the recognized text obtained in step S1002 in a speech recognition text table such as the one shown in FIG. 9 (step S1003). When recognized text is obtained, the speech recognition function 11 assigns a recognized text ID and registers the recognized text together with that ID.

In the example shown in FIG. 9, the user said "コンサルからの情報では、競合は当社提案価格の半額以下であり、依然厳しい状況が予想される" ("According to information from the consultant, the competitor's price is less than half of our proposed price, and the situation is expected to remain difficult"), and this was recognized as "コンばりであらの情報だが共通は当初提案価格の晩学以外であり依然厳しい状況が予算されうる". The recognized text ID "34" was assigned, and the recognized content is registered together with that ID as recognized text 111.

Next, the speech recognition function 11 registers the words that make up the recognized text, together with their part-of-speech information, in a speech recognition word part-of-speech table such as the one shown in FIG. 10 (step S1004). For example, as in speech recognition result 121 shown in FIG. 10, one row is registered per word, containing the recognized text ID, the word ID, the word itself, its part of speech, and the positions within the recognized text of the word's first and last characters.
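The two tables populated in steps S1003 and S1004 can be pictured with a minimal Python sketch. The class and field names (`RecognitionResultDB`, `WordRow`, `register_text`, `register_words`) are hypothetical, the patent does not prescribe a storage format, and the part of speech given for "で" is an assumption because the text does not state it. Word segmentation and tagging are assumed to come from the recognizer.

```python
from dataclasses import dataclass, field


@dataclass
class WordRow:
    """One row of the word part-of-speech table (cf. FIG. 10)."""
    text_id: int
    word_id: int
    word: str
    pos: str
    start: int  # 1-based position of the word's first character
    end: int    # 1-based position of the word's last character


@dataclass
class RecognitionResultDB:
    """Hypothetical in-memory stand-in for the speech recognition result DB."""
    next_text_id: int = 1
    texts: dict = field(default_factory=dict)
    words: list = field(default_factory=list)

    def register_text(self, text: str) -> int:
        # Step S1003: assign a recognized text ID and store the text.
        text_id = self.next_text_id
        self.next_text_id += 1
        self.texts[text_id] = text
        return text_id

    def register_words(self, text_id: int, tagged_words) -> None:
        # Step S1004: store each word with its part of speech and positions.
        position = 1
        for word_id, (word, pos) in enumerate(tagged_words, start=1):
            end = position + len(word) - 1
            self.words.append(WordRow(text_id, word_id, word, pos, position, end))
            position = end + 1


# Start at 34 only to mirror the ID used in the example of FIG. 9.
db = RecognitionResultDB(next_text_id=34)
text_id = db.register_text("コンばりであらの情報")
db.register_words(text_id, [
    ("コン", "noun"), ("ばり", "noun"),
    ("で", "particle"),  # part of speech assumed for illustration
    ("あら", "noun"), ("の", "particle"), ("情報", "noun"),
])
for row in db.words:
    print(row.word_id, row.word, row.pos, row.start, row.end)
```

Only the first six words of the example text are registered here, since those are the only ones whose positions and parts of speech the description makes use of.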

Next, the procedure for displaying the recognized text to the user is described. FIG. 3 is a flowchart showing the operation of the speech recognition text display function 12. First, the speech recognition text display function 12 acquires the recognized text from the speech recognition text table (step S2001). The following description assumes that the text with recognized text ID "34" is acquired in step S2001. The acquired recognized text is then displayed on the screen, as in screen 131 shown in FIG. 4 (step S2002). To correct the recognized text, the user selects the misrecognized text, as in screen 131 shown in FIG. 4, and confirms it as the text to be corrected.

Next, the procedure is described by which, once the user has confirmed the text to be corrected, the optimum text range for the second utterance is determined and the user is prompted to make a second utterance that includes the determined text. FIG. 5 is a flowchart showing the operation of the correction utterance presentation function 13. FIG. 6 is a flowchart showing the operation of the selected-portion word determination function 14. FIG. 7 is a flowchart showing the operation of the selected-portion continuous phrase determination function 15. FIG. 11 is an explanatory diagram showing the phrase determination table stored in the phrase determination rule DB 20.

First, the correction utterance presentation function 13 passes the text that the user judged to be misrecognized and selected (the misrecognition selection text) to the selected-portion word determination function 14. It then acquires the words containing the misrecognized text (the misrecognized words) as determined by the selected-portion word determination function 14 (step S3001 shown in FIG. 5).

The operation of the selected-portion word determination function 14 is described with reference to FIG. 6. The following description uses as an example the case where, in the recognized text "コンばりであらの情報だが共通は当初提案価格の晩学以外であり依然厳しい状況が予算されうる" displayed on screen 131 shown in FIG. 4, the user judges the portion "ばりであ" to be misrecognized and selects it. Given the misrecognition selection text "ばりであ", the selected-portion word determination function 14 acquires the positions of the first and last characters of "ばりであ" within the recognized text (step S4001).

In this example, the selected-portion word determination function 14 acquires "3" as the start character position and "6" as the end character position. Next, from the rows of the speech recognition word part-of-speech table shown in FIG. 10 that match the recognized text ID, it acquires the word ID of the row whose "start" column holds the largest value not exceeding the start character position (step S4002). In this example, "ばり" (word ID "2"), whose "start" column is "3", is acquired.

Next, from the rows of the speech recognition word part-of-speech table shown in FIG. 10 that match the recognized text ID, the selected-portion word determination function 14 acquires the word ID of the row whose "end" column holds the smallest value not less than the end character position (step S4003). In this example, "あら" (word ID "4"), whose "end" column is "7", is acquired. The function then acquires every word whose word ID is at least the ID acquired in step S4002 and at most the ID acquired in step S4003, and concatenates them (step S4004). In this example, "ばり", "で", and "あら" (word IDs "2", "3", and "4") are acquired, and the concatenated result "ばりであら" is determined to be the misrecognized words.
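Steps S4001 to S4004 amount to widening the user's character selection to whole-word boundaries. The following Python sketch illustrates this under stated assumptions: the names `Word`, `selection_span`, and `misrecognized_words` are hypothetical, only the first six words of the example are listed, and `str.find` stands in for the offsets that a real selection UI would supply directly.

```python
from typing import NamedTuple


class Word(NamedTuple):
    word_id: int
    word: str
    start: int  # 1-based position of the first character
    end: int    # 1-based position of the last character


# Excerpt of the word table for recognized text ID 34 (first six words only).
WORDS = [
    Word(1, "コン", 1, 2), Word(2, "ばり", 3, 4), Word(3, "で", 5, 5),
    Word(4, "あら", 6, 7), Word(5, "の", 8, 8), Word(6, "情報", 9, 10),
]


def selection_span(text: str, selected: str) -> tuple:
    """S4001: 1-based positions of the first and last selected characters."""
    index = text.find(selected)  # assumption: first occurrence is the selection
    if index < 0:
        raise ValueError("selected text not found in recognized text")
    return index + 1, index + len(selected)


def misrecognized_words(words, sel_start: int, sel_end: int) -> str:
    # S4002: word whose start is the largest value not exceeding sel_start.
    first = max((w for w in words if w.start <= sel_start), key=lambda w: w.start)
    # S4003: word whose end is the smallest value not less than sel_end.
    last = min((w for w in words if w.end >= sel_end), key=lambda w: w.end)
    # S4004: concatenate every word between the two IDs, inclusive.
    return "".join(w.word for w in words if first.word_id <= w.word_id <= last.word_id)


text = "コンばりであらの情報だが共通は当初提案価格の晩学以外であり依然厳しい状況が予算されうる"
start, end = selection_span(text, "ばりであ")
print(start, end, misrecognized_words(WORDS, start, end))
```

With the example selection, the span is (3, 6) and the selection is widened from "ばりであ" to "ばりであら", matching the walk-through above.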

Next, the correction utterance presentation function 13 passes the misrecognized words determined by the selected-portion word determination function 14 to the selected-portion continuous phrase determination function 15, and acquires the misrecognized continuous phrase from it (step S3002 shown in FIG. 5).

The operation of the selected-portion continuous phrase determination function 15 is described below with reference to FIG. 7. The function receives the text "ばりであら" from the selected-portion word determination function 14 and determines whether the text has reached the maximum number of words (step S5001). The maximum number of words is an integer of 1 or more that can be set arbitrarily in the system; in this example it is assumed to be set to "6". Because "ばりであら" consists of three words, processing proceeds to step S5002, where the part-of-speech information of the first word of the text is acquired from the speech recognition word part-of-speech table shown in FIG. 10 (step S5002). The part of speech of the first word "ばり" is "noun", so "noun" is acquired.

Next, from the phrase determination table shown in FIG. 11, the selected-portion continuous phrase determination function 15 acquires the connecting part of speech from the rows whose "target part of speech" column matches the part of speech of the first word and whose "order" column is "before" (step S5003). In this example, row 141 in FIG. 11 applies, so "noun" is acquired. The function then acquires, from the speech recognition word part-of-speech table, the part of speech of the word immediately before the first word of the text (step S5004). In this example, the word immediately before the first word is "コン", so its part of speech is "noun".

Next, the selected-portion continuous phrase determination function 15 determines whether the part of speech of the word immediately before the first word, acquired in step S5004, matches the connecting part of speech acquired in step S5003 (step S5005). Because they match, the word acquired in step S5004 is concatenated to the beginning of the text (step S5006). As a result, the text becomes "コンばりであら".

Next, the selected-portion continuous phrase determination function 15 determines whether the text has reached the maximum number of words (step S5007). Because the text has four words, processing proceeds to step S5008, where the part-of-speech information of the last word of the text is acquired from the speech recognition word part-of-speech table. As shown in FIG. 10, the part of speech of the last word "あら" is "noun". Next, the connecting parts of speech are acquired from the rows of the phrase determination table whose "target part of speech" column matches the part of speech of the last word and whose "order" column is "after" (step S5009). In this example, rows 142, 143, and 144 of the phrase determination table shown in FIG. 11 apply, so "noun", "particle", and "auxiliary verb" are acquired.

Next, the selected-portion continuous phrase determination function 15 acquires, from the speech recognition word part-of-speech table, the part of speech of the word immediately after the last word of the text (step S5010). The word immediately after the last word is "の", and its part of speech is "particle". The function then determines whether the part of speech acquired in step S5010 matches any of the connecting parts of speech acquired in step S5009 (step S5011). Because it matches, the word acquired in step S5010 is concatenated to the end of the text (step S5012). As a result, the text becomes "コンばりであらの".

Next, the selected-portion continuous phrase determination function 15 determines whether the text has reached the maximum number of words (step S5013). Because the text has five words, the part-of-speech information of the last word of the text is again acquired from the speech recognition word part-of-speech table (step S5008). The part of speech of the last word "の" is "particle". The function then acquires, from the phrase determination table, the connecting parts of speech from the rows whose "target part of speech" column matches the part of speech of the last word and whose "order" column is "after" (step S5009). In this example, rows 145 and 146 in FIG. 11 apply, so "noun" and "verb" are acquired.

Next, the selected-portion continuous phrase determination function 15 acquires, from the speech recognition word part-of-speech table, the part of speech of the word immediately after the last word of the text (step S5010). The word immediately after the last word is "情報", and its part of speech is "noun". The function then determines whether the part of speech acquired in step S5010 matches any of the connecting parts of speech acquired in step S5009 (step S5011). Because it matches, the word acquired in step S5010 is concatenated to the end of the text (step S5012). As a result, the text becomes "コンばりであらの情報".

Next, the selection unit continuous phrase determination function 15 determines whether the text has reached the maximum number of words (step S5013). The text now has six words, which equals the maximum word count of 6, so “コンばりであらの情報” is determined to be the misrecognized continuous phrase, that is, the optimal text range for the second utterance.
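The forward extension in steps S5008 to S5013 can be illustrated with a minimal Python sketch. It is not the implementation of the embodiment: the names `PHRASE_RULES`, `MAX_WORDS`, and `extend_forward` are hypothetical, the rule table contains only the two rows cited from FIG. 11, and stopping when the next word cannot connect is an assumption, since that branch is not described in this passage.

```python
# Hypothetical rule table modeled on rows 145 and 146 of FIG. 11:
# (target part of speech, order) -> connecting parts of speech.
PHRASE_RULES = {("particle", "after"): {"noun", "verb"}}

MAX_WORDS = 6  # maximum word count used in the walkthrough


def extend_forward(words, pos, start, end,
                   rules=PHRASE_RULES, max_words=MAX_WORDS):
    """Extend the range words[start:end] toward the end of the text.

    words: recognized words; pos: their parts of speech (same length).
    Returns the new exclusive end index of the range.
    """
    while end - start < max_words and end < len(words):   # S5013
        last_pos = pos[end - 1]                            # S5008
        allowed = rules.get((last_pos, "after"), set())    # S5009
        next_pos = pos[end]                                # S5010
        if next_pos not in allowed:                        # S5011
            break  # assumption: stop when the next word cannot connect
        end += 1                                           # S5012
    return end


words = ["コン", "ばり", "で", "あら", "の", "情報"]
# Only the parts of speech of "の" and "情報" are stated in the text;
# None marks values the walkthrough does not give.
pos = [None, None, None, None, "particle", "noun"]

end = extend_forward(words, pos, start=0, end=5)
phrase = "".join(words[0:end])
print(phrase)
```

Starting from the five-word text “コンばりであらの”, the loop appends “情報” once and then stops because the six-word limit is reached.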

The corrected utterance presentation function 13 highlights the misrecognized continuous phrase and displays on the screen a message prompting the user to utter the highlighted portion (step S3003 in FIG. 5; screen 132 in FIG. 4).

Next, a procedure is described for using the recognized text of the second utterance to replace only the text judged to correspond to the misrecognized text of the first utterance selected by the user, and for presenting the corrected result to the user. FIG. 8 is a flowchart showing the operation of the recognized text replacement function 16. In this example, it is assumed that the user utters “コンサルからの情報” (“information from the consultant”) as the second utterance and that “コンサルからの情報” is obtained as the recognized text.

First, the recognized text replacement function 16 passes the second utterance to the speech recognition function 11 and acquires the recognized text (step S6001). Next, it acquires the misrecognized continuous phrase of the first utterance, “コンばりであらの情報” (step S6002). Next, it sets the order variable n to 1 (step S6003). The order variable n holds an integer of 1 or more and is used within the recognized text replacement function 16. Next, the function determines whether an n-th word exists in the misrecognized continuous phrase of the first utterance (step S6004).

Since the first word “コン” exists, the recognized text replacement function 16 acquires that word (step S6005). Next, it determines whether the n-th word is text that the user selected as misrecognized (step S6006). Since “コン” was not selected as misrecognized text, 1 is added to the order variable n (step S6009), making n equal to 2. Next, since a second word exists in the misrecognized continuous phrase (step S6004), the function acquires that word, “ばり” (step S6005), and determines whether it is text selected as misrecognized (step S6006).

Since the text “ばり” was selected as misrecognized, the recognized text replacement function 16 determines whether the second word of the recognition result of the second utterance, “サル”, is absent from the misrecognized continuous phrase of the first utterance and is also not yet in the replacement word variable. The replacement word variable stores those words of the recognized text of the second utterance that are judged to correspond to the misrecognized text of the first utterance. Since “サル” is neither in the misrecognized continuous phrase of the first utterance nor in the replacement word variable, it is added to the replacement word variable (step S6007).

Next, the recognized text replacement function 16 makes the same determination for the third word of the recognition result of the second utterance, “から”. Since “から” is neither in the misrecognized continuous phrase of the first utterance nor in the replacement word variable, it is added to the replacement word variable (step S6008). The replacement word variable is now “サルから”. Next, 1 is added to the order variable n (step S6009), making n equal to 3.

Next, since a third word exists in the misrecognized continuous phrase (step S6004), the recognized text replacement function 16 acquires that word, “で” (step S6005), and determines whether it is text selected as misrecognized (step S6006). Since “で” was selected as misrecognized text, the function checks whether the third word of the recognition result of the second utterance, “から”, is absent from both the misrecognized continuous phrase of the first utterance and the replacement word variable. “から” is already in the replacement word variable, so it is not added.

Next, the recognized text replacement function 16 makes the same determination for the fourth word of the recognition result of the second utterance, “の”. Since “の” is present in the misrecognized continuous phrase of the first utterance, it is not added (step S6008). The replacement word variable remains “サルから”. Next, the function adds 1 to the order variable n (step S6009), making n equal to 4.

Next, since a fourth word exists in the misrecognized continuous phrase (step S6004), the recognized text replacement function 16 acquires that word, “あら” (step S6005), and determines whether it is text selected as misrecognized (step S6006). Since “あら” was selected as misrecognized text, the function checks whether the fourth word of the recognition result of the second utterance, “の”, is absent from both the misrecognized continuous phrase of the first utterance and the replacement word variable. “の” is present in the misrecognized continuous phrase of the first utterance, so it is not added.

Note that, in the determination of step S6006 above, the misrecognized text selected by the user is “ばりであ”, which contains only part of “あら”. However, “あら” is included in the misrecognized word “ばりであら” determined by the selection unit word determination function 14, so it is treated as misrecognized text in the determination of step S6006.

Next, the recognized text replacement function 16 makes the same determination for the fifth word of the recognition result of the second utterance, “情報”. Since “情報” is present in the misrecognized continuous phrase of the first utterance, it is not added. The replacement word variable remains “サルから”. Next, 1 is added to the order variable n (step S6009), making n equal to 5.

Next, since a fifth word exists in the misrecognized continuous phrase (step S6004), the recognized text replacement function 16 acquires that word, “の” (step S6005), and determines whether it is text selected as misrecognized (step S6006). Since “の” was not selected as misrecognized text, 1 is added to the order variable n (step S6009), making n equal to 6.

Next, since a sixth word exists in the misrecognized continuous phrase (step S6004), the recognized text replacement function 16 acquires that word, “情報” (step S6005), and determines whether it is text selected as misrecognized (step S6006). Since “情報” was not selected as misrecognized text, 1 is added to the order variable n (step S6009), making n equal to 7.

Next, since no seventh word exists in the misrecognized continuous phrase of the first utterance (step S6004), the recognized text replacement function 16 replaces the misrecognized word “ばりであら” of the first utterance with the contents of the replacement word variable, “サルから”, and displays the result on the screen (step S6010; screen 133 in FIG. 4).
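The loop of steps S6003 to S6010 can be summarized in the following Python sketch, which reproduces the walkthrough above. The function names and the representation of the user's selection as a set of 0-based word indices are assumptions made for illustration. Skipping second-utterance positions that do not exist is likewise an assumption, since that case does not arise in the example.

```python
def collect_replacement_words(phrase_words, selected, second_words):
    """Gather second-utterance words that stand in for the misrecognized part.

    phrase_words: words of the misrecognized continuous phrase (first utterance)
    selected: 0-based indices of phrase words treated as misrecognized
    second_words: words of the recognized text of the second utterance
    """
    replacement = []                      # the "replacement word variable"
    n = 1                                 # S6003 (n is 1-based, as in the text)
    while n <= len(phrase_words):         # S6004
        if (n - 1) in selected:           # S6005, S6006
            # Examine the n-th and (n+1)-th words of the second utterance.
            for k in (n, n + 1):
                if k > len(second_words):
                    continue  # assumption: ignore positions past the end
                candidate = second_words[k - 1]
                if candidate not in phrase_words and candidate not in replacement:
                    replacement.append(candidate)   # S6007, S6008
        n += 1                            # S6009
    return replacement


def replace_misrecognized(phrase_words, selected, second_words):
    """Replace the selected span with the collected words (S6010)."""
    replacement = collect_replacement_words(phrase_words, selected, second_words)
    first, last = min(selected), max(selected)
    return phrase_words[:first] + replacement + phrase_words[last + 1:]


phrase_words = ["コン", "ばり", "で", "あら", "の", "情報"]
selected = {1, 2, 3}                      # "ばり", "で", "あら"
second_words = ["コン", "サル", "から", "の", "情報"]

corrected = "".join(replace_misrecognized(phrase_words, selected, second_words))
print(corrected)
```

Because only words absent from the first-utterance phrase are collected, the correctly recognized context (“コン”, “の”, “情報”) is left untouched and only “ばりであら” is replaced.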

According to the speech recognition result correction support system of this embodiment, when the user selects a misrecognized portion in order to correct misrecognized text from speech recognition, the system presents the optimal text range for the corrective utterance. This reduces recognition errors in the second utterance and thus lightens the correction work on terminals, such as smartphones, where selecting text is laborious.

Further, according to the speech recognition result correction support system of this embodiment, only the misrecognized text portion is automatically corrected using the recognized text of the corrective utterance. The correction is therefore not affected by misrecognition during the corrective utterance, recognition errors in the second utterance can be reduced, and the correction work is lightened.

FIG. 12 is a block diagram showing the main part of the speech recognition result correction support system according to the present invention. As shown in FIG. 12, the system performs speech recognition on a first utterance made by a user and, when the recognition of the first utterance contains an error, has the user make a second utterance for correction. The system includes speech recognition means 1 for recognizing the user's utterance. It further includes speech recognition text display means 2 for displaying the recognized utterance as recognized text; determination means 3 for acquiring, as misrecognized text, the text that the user selects from the recognized text and for determining, based on a predetermined rule and the misrecognized text, the optimal text range within the recognized text for the second utterance; and corrected utterance presentation means 4 for displaying the optimal text range and prompting the user to make the second utterance.

The above embodiment also discloses speech recognition result correction support systems as described in (1) to (4) below.

(1) A speech recognition result correction support system that performs speech recognition on a first utterance made by a user and, when the recognition of the first utterance contains an error, has the user make a second utterance for correction. The system includes: speech recognition means (for example, the speech recognition function 11) for recognizing the user's utterance; speech recognition text display means (for example, the speech recognition text display function 12) for displaying the recognized utterance as recognized text; determination means (for example, the selection unit word determination function 14 and the selection unit continuous phrase determination function 15) for acquiring, as misrecognized text, the text that the user selects from the recognized text and for determining, based on a predetermined rule and the misrecognized text, the optimal text range within the recognized text for the second utterance; and corrected utterance presentation means (for example, the corrected utterance presentation function 13) for displaying the optimal text range and prompting the user to make the second utterance.

(2) The speech recognition result correction support system may include recognized text replacement means (for example, the recognized text replacement function 16) that recognizes the second utterance made by the user and replaces, within the recognized text of the first utterance, the text judged to correspond to the misrecognized text selected by the user with the corresponding recognized text of the second utterance. The corrected utterance presentation means may be configured to present the replaced recognized text to the user.

(3) In the speech recognition result correction support system, the determination means may include selection unit word determination means (for example, the selection unit word determination function 14) that acquires the start character position and the end character position of the text selected as misrecognized and determines, as the misrecognized word, the concatenation of the words in the recognized text from the word containing the character at the start position through the word containing the character at the end position. The determination means may be configured to determine the optimal text range for the second utterance using the misrecognized word.

(4) In the speech recognition result correction support system, the determination means may include selection unit continuous phrase determination means (for example, the selection unit continuous phrase determination function 15) that concatenates the misrecognized word with those words of the recognized text that can be judged to form a continuous phrase with it, and determines the range of the concatenated text to be the optimal text range for the second utterance.
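The word-boundary expansion described in item (3) can be sketched as follows in Python. This is only an illustration under stated assumptions: the function name `misrecognized_word_range` is hypothetical, character positions are taken as 0-based inclusive offsets, and the recognized text is limited to the six words of the example phrase rather than the full first utterance.

```python
def misrecognized_word_range(words, sel_start, sel_end):
    """Return (first, last) word indices covering the selected characters.

    words: recognized text split into words, in order
    sel_start, sel_end: 0-based inclusive character offsets of the selection
    """
    first = last = None
    offset = 0
    for i, word in enumerate(words):
        w_start, w_end = offset, offset + len(word) - 1
        if w_start <= sel_start <= w_end:
            first = i                     # word containing the start character
        if w_start <= sel_end <= w_end:
            last = i                      # word containing the end character
        offset += len(word)
    if first is None or last is None:
        raise ValueError("selection lies outside the recognized text")
    return first, last


words = ["コン", "ばり", "で", "あら", "の", "情報"]
text = "".join(words)

# The user selects "ばりであ", which ends in the middle of the word "あら".
sel_start = text.index("ばりであ")
sel_end = sel_start + len("ばりであ") - 1

first, last = misrecognized_word_range(words, sel_start, sel_end)
misrecognized_word = "".join(words[first:last + 1])
print(misrecognized_word)
```

Expanding to whole words is what allows “あら” to be treated as misrecognized text in step S6006 even though the user selected only part of it.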

The present invention can be applied to uses such as creating daily reports on a smartphone.

DESCRIPTION OF SYMBOLS
1 Speech recognition means
2 Speech recognition text display means
3 Determination means
4 Corrected utterance presentation means
10 Speech recognition result DB
11 Speech recognition function
12 Speech recognition text display function
13 Corrected utterance presentation function
14 Selection unit word determination function
15 Selection unit continuous phrase determination function
16 Recognized text replacement function
20 Phrase determination rule DB

Claims

A speech recognition result correction support system that performs speech recognition on a first utterance made by a user and, when the speech recognition of the first utterance contains an error, has the user make a second utterance for correction, the system comprising:
speech recognition means for recognizing the user's utterance;
speech recognition text display means for displaying the recognized utterance as recognized text;
determination means for acquiring, as misrecognized text, text selected by the user from the recognized text, and for determining, based on a predetermined rule and the misrecognized text, an optimal text range within the recognized text for the second utterance;
corrected utterance presentation means for displaying the optimal text range and prompting the user to make the second utterance; and
recognized text replacement means for recognizing the second utterance made by the user and replacing only the text that, within the recognized text of the first utterance, is determined to correspond to the misrecognized text selected by the user, with text that is part of the recognized text of the second utterance corresponding to that text and that does not exist in the optimal text range of the recognized text of the first utterance,
wherein the corrected utterance presentation means presents the replaced recognized text to the user.

The speech recognition result correction support system according to claim 1, wherein the determination means
includes selection unit word determination means for acquiring the start character position and the end character position of the text selected as misrecognized, and for determining, as a misrecognized word, the concatenation of the words in the recognized text from the word containing the character at the start position through the word containing the character at the end position, and
determines the optimal text range for the second utterance using the misrecognized word.

The speech recognition result correction support system according to claim 2, wherein the determination means
includes selection unit continuous phrase determination means for concatenating the misrecognized word with those words of the recognized text that can be judged to form a continuous phrase with the misrecognized word, and for determining the range of the concatenated text to be the optimal text range for the second utterance.

A speech recognition result correction support method that performs speech recognition on a first utterance made by a user and, when the speech recognition of the first utterance contains an error, has the user make a second utterance for correction, the method comprising:
recognizing the user's utterance;
displaying the recognized utterance as recognized text;
acquiring, as misrecognized text, text selected by the user from the recognized text, and determining, based on a predetermined rule and the misrecognized text, an optimal text range within the recognized text for the second utterance;
displaying the optimal text range and prompting the user to make the second utterance;
recognizing the second utterance made by the user and replacing only the text that, within the recognized text of the first utterance, is determined to correspond to the misrecognized text selected by the user, with text that is part of the recognized text of the second utterance corresponding to that text and that does not exist in the optimal text range of the recognized text of the first utterance; and
presenting the replaced recognized text to the user.

A speech recognition result correction support program for causing a computer to perform speech recognition on a first utterance made by a user and, when the speech recognition of the first utterance contains an error, to have the user make a second utterance for correction,
the program causing the computer to execute:
a speech recognition process of recognizing the user's utterance;
a speech recognition text display process of displaying the recognized utterance as recognized text;
a determination process of acquiring, as misrecognized text, text selected by the user from the recognized text, and determining, based on a predetermined rule and the misrecognized text, an optimal text range within the recognized text for the second utterance;
a corrected utterance presentation process of displaying the optimal text range and prompting the user to make the second utterance; and
a recognized text replacement process of recognizing the second utterance made by the user and replacing only the text that, within the recognized text of the first utterance, is determined to correspond to the misrecognized text selected by the user, with text that is part of the recognized text of the second utterance corresponding to that text and that does not exist in the optimal text range of the recognized text of the first utterance,
wherein the replaced recognized text is presented to the user in the corrected utterance presentation process.