JPH11352988A

JPH11352988A - Voice recognition device

Info

Publication number: JPH11352988A
Application number: JP16106898A
Authority: JP
Inventors: Yamato Kanda; 大和神田
Original assignee: Olympus Optical Co Ltd
Current assignee: Olympus Corp
Priority date: 1998-06-09
Filing date: 1998-06-09
Publication date: 1999-12-24

Abstract

PROBLEM TO BE SOLVED: To provide a voice recognition device capable of efficiently finding erroneously recognized characters. SOLUTION: The voice recognition device is provided with the following parts:

- a voice recognition processing part 2, which recognizes the voice data input from a voice data input part 1 with reference to a dictionary 3 and converts it into text data;
- an erroneously recognized character estimation part 4, which uses the likelihood of each character obtained from the voice recognition processing part 2 to determine the difference in likelihood between the character finally determined by recognition and the character that was the second candidate in that determination, and estimates a character whose difference is at or below a prescribed threshold as a character with a high probability of erroneous recognition;
- a parameter input part 5 for setting the threshold;
- a display processing part 6, which changes the color or form of the erroneously recognized characters estimated by the erroneously recognized character estimation part 4 so that they are displayed differently from the other characters; and
- a display part 8, which displays the processing result of the display processing part 6.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a speech recognition device. More particularly, it relates to a speech recognition device that recognizes speech data, converts it into text data, and displays the text.

[0002]

2. Description of the Related Art

One long-standing goal of speech recognition research has been a so-called voice word processor. A related goal is a dictation system, which accepts speech data dictated by the user, automatically creates a document from it, and displays that document on a screen or similar device. Research and development toward such systems is currently being actively pursued.

[0003] Speech recognition technology has advanced in recent years. Devices are now developed and commercially available in which a microphone is connected to a personal computer. Speech input through the microphone is converted into a document on the computer and displayed on its screen.

[0004] Recording devices such as tape recorders have long been used to create documents. The author first dictates the content onto the recording device. Later, a secretary or typist plays back the dictation and transcribes it with a typewriter, word processor, or similar document creation device. This has become one of the common and effective uses of such recording devices.

[0005] For this kind of dictation with a recording device, there has long been strong demand for a technique that automatically converts the recorded content into a document.

[0006] To meet this demand, the following arrangement has been proposed. Speech is first recorded on a recording device. The recording is then played back continuously into a speech recognition device, which recognizes the continuous speech and converts it into a document. When playback ends, the resulting document is displayed on a screen.

[0007] With such automatic speech recognition, it is difficult to convert all of the input speech accurately into the intended document. The converted text therefore contains misrecognized characters, and the user must review and correct the converted document.

[0008] Japanese Patent Publication No. 63-33174 describes one technique intended to make this correction work easier. It discloses a voice-input Japanese document processing device that receives speech recorded on a recording device, recognizes it, and processes it into a document. The device includes:

- means for detecting the end of a phrase or sentence, which is the unit in which speech is input through playback on the recording device;
- means for outputting a pause signal to the recording device when the detecting means detects the end of a phrase or sentence; and
- a recording device control section, which pauses playback in response to the pause signal and resumes playback when the user operates a confirmation key for restarting.

[0009]

Problem to Be Solved by the Invention

With conventional speech recognition devices such as those described above, however, the user must check every character one by one when reviewing and correcting the converted document. This is laborious and burdensome. When the recording is long and the document is long, finding all of the misrecognized characters takes a great deal of time, and work efficiency is poor.

[0010] The present invention was made in view of the above circumstances. Its object is to provide a speech recognition device that can efficiently locate misrecognized characters.

[0011]

Means for Solving the Problem

To achieve the above object, a speech recognition device according to a first aspect of the invention comprises:

- a speech recognition processing unit, which recognizes input speech data and converts it into text data;
- a misrecognized character estimation unit, which uses the likelihood of each character produced during recognition by the speech recognition processing unit to estimate which characters have a high probability of misrecognition;
- a display processing unit, which prepares the text data for display so that the estimated misrecognized characters can be quickly distinguished by eye from the other characters; and
- a display unit, which displays the result produced by the display processing unit.

[0012] A speech recognition device according to a second aspect is the device of the first aspect, with one addition. The misrecognized character estimation unit compares the likelihood of each recognized character with a predetermined threshold. Characters whose likelihood is at or below the threshold are estimated to have a high probability of misrecognition.
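The second aspect's test can be illustrated with a minimal Python sketch. It assumes that the recognizer exposes each finally determined character together with its overall likelihood. The `RecognizedChar` type and the `flag_low_likelihood` function are hypothetical names, and the likelihood values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class RecognizedChar:
    """Hypothetical record: one finally determined character and its overall likelihood."""
    text: str
    likelihood: float


def flag_low_likelihood(chars: list[RecognizedChar], threshold: float) -> list[int]:
    """Return the positions of characters whose likelihood is at or below the threshold.

    Under the second aspect, these are the characters estimated to have a
    high probability of misrecognition.
    """
    return [i for i, c in enumerate(chars) if c.likelihood <= threshold]


# Illustrative values only; not taken from the patent.
sentence = [
    RecognizedChar("音", 0.92),
    RecognizedChar("声", 0.31),
    RecognizedChar("認", 0.50),
    RecognizedChar("識", 0.88),
]
print(flag_low_likelihood(sentence, threshold=0.5))  # [1, 2]
```

Because the comparison is "at or below", a character whose likelihood equals the threshold (position 2 above) is flagged.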

[0013] A speech recognition device according to a third aspect is the device of the first aspect, with a different test. The misrecognized character estimation unit computes the difference between two likelihoods: that of the recognized character, and that of the character which was the next candidate when the recognition was determined. Characters for which this difference is at or below a predetermined threshold are estimated to have a high probability of misrecognition.
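The third aspect's margin test can be sketched the same way, under a few assumptions. Each character position is given as a list of the overall likelihoods of its candidates. `flag_small_margin` is a hypothetical name. Positions with only one candidate are not flagged, because the source does not say how they should be handled.

```python
def flag_small_margin(candidates_per_char: list[list[float]], threshold: float) -> list[int]:
    """Flag positions where the best and second-best overall likelihoods are close.

    candidates_per_char[i] holds the overall likelihoods of every candidate
    considered for the i-th character, in any order. A position is flagged
    when (best - second best) <= threshold, following the third aspect.
    (Hypothetical helper, not from the patent.)
    """
    flagged = []
    for i, scores in enumerate(candidates_per_char):
        if len(scores) < 2:
            continue  # no runner-up, so the margin is undefined (assumption: do not flag)
        best, second = sorted(scores, reverse=True)[:2]
        if best - second <= threshold:
            flagged.append(i)
    return flagged


print(flag_small_margin([[0.9, 0.1], [0.40, 0.38, 0.05], [0.7]], threshold=0.05))  # [1]
```

The absolute test of the second aspect looks only at the winner's own likelihood. The margin test instead catches a character whose likelihood is high but whose runner-up is nearly as high.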

[0014] The speech recognition device of the first aspect therefore operates as follows:

1. The speech recognition processing unit recognizes the input speech data and converts it into text data.
2. The misrecognized character estimation unit uses the likelihood of each character produced during recognition to estimate which characters have a high probability of misrecognition.
3. The display processing unit prepares the text data so that the estimated misrecognized characters can be quickly distinguished by eye from the other characters.
4. The display unit displays the result.

[0015] In the speech recognition device of the second aspect, the misrecognized character estimation unit compares the likelihood of each recognized character with a predetermined threshold. It estimates characters whose likelihood is at or below the threshold as characters with a high probability of misrecognition.

[0016] In the speech recognition device of the third aspect, the misrecognized character estimation unit computes the difference between the likelihood of each recognized character and the likelihood of the character that was the next candidate when the recognition was determined. It estimates characters for which this difference is at or below a predetermined threshold as characters with a high probability of misrecognition.

[0017]

Embodiments of the Invention

An embodiment of the present invention is described below with reference to the drawings. FIGS. 1 to 4 show this embodiment:

- FIG. 1 is a block diagram of the speech recognition device.
- FIG. 2 is a flowchart of the processing in the speech recognition processing unit.
- FIG. 3 is a flowchart of the processing in the misrecognized character estimation unit.
- FIG. 4 is a flowchart of the processing in the display processing unit.

[0018] This speech recognition device is configured, for example, as a personal computer that performs its processing under a speech recognition program. It comprises the following components:

- a speech data input unit 1, which receives the speech or speech data to be recognized and transcribed, and converts it into digital data;
- a speech recognition processing unit 2, which receives the output of the speech data input unit 1, performs speech recognition, and converts the result into text data;
- a dictionary 3, which stores the language model and other data that the speech recognition processing unit 2 refers to during recognition;
- a misrecognized character estimation unit 4, which estimates characters with a high probability of misrecognition from the likelihood of each character produced during recognition by the speech recognition processing unit 2;
- a parameter input unit 5, for setting the threshold that the misrecognized character estimation unit 4 uses in its estimation;
- a memory 9, serving as working memory for the misrecognized character estimation unit 4;
- a display processing unit 6, which receives the results of the speech recognition processing unit 2 and the misrecognized character estimation unit 4, and prepares the text data so that characters estimated to be misrecognized can be quickly distinguished by eye from the other characters;
- a memory 10, serving as working memory for the display processing unit 6;
- a display unit 8, which receives the output of the display processing unit 6 and displays the recognized text data; and
- a display operation input unit 7, which the user operates to correct the document while viewing the display unit 8, and whose correction input is output to the display processing unit 6.

[0019] The speech data input unit 1 comprises, for example, a microphone that converts sound into an electrical signal and an A/D converter that discretizes the microphone signal into digital data. Alternatively, it may be an input device that accepts a removable recording medium on which speech data has been recorded.

[0020] The dictionary 3 stores two kinds of data:

- standard patterns of model speech feature values, built from a training speech database; and
- a statistical language model of the order in which characters occur, built from a training text database and including reading data.

Because the dictionary has a learning function, it is stored on a readable and writable recording medium such as a hard disk.
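The source does not specify the form of the statistical language model. As one common possibility, the sketch below assumes a character bigram model with add-k smoothing, trained on a list of sentences. The class name, the smoothing scheme, and the `<s>` start marker are assumptions. The reading data mentioned above is not modeled.

```python
from collections import Counter


class CharBigramModel:
    """Character bigram model with add-k smoothing.

    This is an assumed stand-in for the statistical language model in
    dictionary 3; the patent does not say which model is used.
    """

    BOS = "<s>"  # hypothetical sentence-start marker

    def __init__(self, corpus: list[str], k: float = 1.0):
        self.k = k
        self.vocab = {ch for line in corpus for ch in line}
        self.bigrams: Counter = Counter()   # counts of (previous, current) pairs
        self.contexts: Counter = Counter()  # counts of each previous symbol
        for line in corpus:
            chars = [self.BOS] + list(line)
            pairs = list(zip(chars, chars[1:]))
            self.bigrams.update(pairs)
            self.contexts.update(prev for prev, _ in pairs)

    def prob(self, prev: str, ch: str) -> float:
        """Smoothed P(ch | prev) over the training vocabulary."""
        v = len(self.vocab)
        return (self.bigrams[(prev, ch)] + self.k) / (self.contexts[prev] + self.k * v)


model = CharBigramModel(["ab", "ac"])
print(model.prob("a", "b"))  # 0.4
```

In this sketch, `prob` plays the role of the occurrence probability obtained for each character candidate in step S6. Given the preceding character, the smoothed probabilities sum to 1 over the training vocabulary.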

[0021] The speech recognition processing unit 2, the misrecognized character estimation unit 4, and the display processing unit 6 are implemented, for example, by a CPU that executes predetermined processing programs in sequence.

[0022] The parameter input unit 5 lets the user set a threshold. The misrecognized character estimation unit 4 uses this threshold when judging the size of the difference in overall likelihood, as described later. The parameter input unit 5 consists of, for example, a mouse and a keyboard.

[0023] The display operation input unit 7 is the input means for correcting the document while viewing the display unit 8. It consists of, for example, a mouse and a keyboard. In an actual device, it may also serve as the parameter input unit 5.

[0024] The display unit 8 is, for example, a CRT display or an LCD display.

[0025] The memory 9 and the memory 10 consist of RAM or the like. FIG. 1 shows them as separate units, but they may be a single memory.

[0026] Next, the operation of this embodiment is described.

[0027] The speech data input unit 1 digitizes the speech data by sampling it in a predetermined manner and transmits the result to the speech recognition processing unit 2.

[0028] The speech recognition processing performed in the speech recognition processing unit 2 is described with reference to FIG. 2.

[0029] When the speech recognition processing unit 2 receives speech data from the speech data input unit 1 (step S1), it first analyzes the data. It then extracts speech feature values such as linear prediction coefficients and cepstral coefficients (step S2).
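Step S2 can be illustrated with the classic LPC-cepstrum pipeline. This is a sketch, not the device's actual front end. The source gives no frame length, windowing, or prediction order, and the function names are hypothetical. The code works in three stages:

1. Compute the autocorrelation of one frame.
2. Derive the linear prediction coefficients with the Levinson-Durbin recursion.
3. Convert those coefficients to cepstral coefficients with the standard LPC-to-cepstrum recursion.

```python
def autocorrelation(frame: list[float], max_lag: int) -> list[float]:
    """Biased autocorrelation r[0..max_lag] of one analysis frame (hypothetical helper)."""
    n = len(frame)
    return [sum(frame[t] * frame[t + lag] for t in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r: list[float], order: int) -> list[float]:
    """LPC coefficients a[1..order] with x[n] ~ sum_k a[k] * x[n-k] (Levinson-Durbin).

    Assumes a non-degenerate frame; a silent frame (r[0] == 0) returns zeros.
    """
    if r[0] == 0.0:
        return [0.0] * order
    a = [0.0] * (order + 1)  # a[0] is unused so that a[k] matches the math
    err = r[0]               # prediction error power
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err        # reflection coefficient for order i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]


def lpc_to_cepstrum(a: list[float], n_ceps: int) -> list[float]:
    """Cepstral coefficients c[1..n_ceps] from LPC coefficients.

    Uses c[n] = a[n] + sum_{k} (k/n) * c[k] * a[n-k], where a[m] = 0 for m > p.
    The gain term c[0] is omitted.
    """
    p = len(a)
    c = [0.0] * (n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


# An AR(1) process with coefficient 0.5 has r = [1, 0.5, 0.25] (normalized).
coeffs = levinson_durbin([1.0, 0.5, 0.25], order=2)  # [0.5, 0.0]
print(lpc_to_cepstrum(coeffs, 3))  # c[n] = 0.5**n / n
```

Real front ends usually window each frame (for example with a Hamming window) before computing the autocorrelation, and vectorize these loops. Plain Python is used here only for clarity.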

[0030] Next, the unit refers to the standard patterns of model speech feature values stored in the dictionary 3 (step S3). It selects several model speech units whose similarity likelihood to the input speech is high (step S4).

[0031] For the selected model speech units, the unit then refers to the statistical language model of character occurrence order stored in the dictionary 3 (step S5). From it, the unit obtains the probability of occurrence of each character candidate (step S6).

[0032] Finally, the product of the similarity likelihood and the occurrence probability is taken as the overall likelihood of each character candidate (step S7). The candidate with the largest overall likelihood is determined as the recognition result (step S8).
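Steps S7 and S8 reduce to a product followed by taking the maximum. The sketch below assumes that the similarity likelihoods from step S4 and the occurrence probabilities from step S6 are dictionaries keyed by candidate character. `rank_candidates` is a hypothetical name. The function returns the full ranking rather than only the winner, so the next candidate stays available for the margin test in the misrecognized character estimation unit 4.

```python
def rank_candidates(similarity: dict[str, float],
                    occurrence: dict[str, float]) -> list[tuple[str, float]]:
    """Rank character candidates by overall likelihood (steps S7-S8).

    similarity: acoustic similarity likelihood of each candidate (step S4).
    occurrence: language-model occurrence probability of each candidate (step S6).
    Overall likelihood = similarity * occurrence, as stated in the embodiment.
    Returns (candidate, overall likelihood) pairs, best first: element 0 is the
    recognition result and element 1 is the next candidate.
    (Hypothetical helper, not from the patent.)
    """
    overall = {ch: s * occurrence.get(ch, 0.0) for ch, s in similarity.items()}
    return sorted(overall.items(), key=lambda item: item[1], reverse=True)


# Made-up scores for three homophones read "hashi".
ranked = rank_candidates({"橋": 0.8, "箸": 0.6, "端": 0.1},
                         {"橋": 0.2, "箸": 0.5, "端": 0.9})
print([ch for ch, _ in ranked])  # ['箸', '橋', '端']
```

Practical recognizers often add log-probabilities instead of multiplying raw ones, to avoid numerical underflow. The logarithm does not change the ranking, but a likelihood-difference threshold tuned in one domain would not carry over to the other.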

[0033] The recognition result is then converted into text data (step S9) and transmitted to the display processing unit 6 (step S10).

[0034] The overall likelihood data computed in step S7 during recognition is also transmitted to the misrecognized character estimation unit 4 (step S11).

[0035] The unit then determines whether the speech data has ended (step S12). If it has not, processing returns to step S1, and the operations above repeat until no more speech data is input. At that point, speech recognition processing ends.

[0036] Next, the processing in the misrecognized character estimation unit 4 is described with reference to FIG. 3.

[0037] When the misrecognized character estimation unit 4 receives overall likelihood data from the speech recognition processing unit 2 (step S21), it writes the data into the memory 9 (step S22). It then computes the difference between two overall likelihoods: that of the finally recognized character, and that of the character which was the next candidate when this character was determined (step S23).

[0038] The unit then determines whether this difference is smaller than a predetermined threshold (step S24). If it is, the character is estimated to have a high probability of misrecognition, and its position data is transmitted to the display processing unit 6 (step S25).

[0039] Processing in the misrecognized character estimation unit 4 ends in either of two cases: when step S25 is complete, or when step S24 finds the difference to be equal to or greater than the threshold.
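The FIG. 3 flow (steps S21 to S25) can be sketched as a small class under these assumptions:

- The class name, the list used as a stand-in for the memory 9, and the callback used as a stand-in for the link to the display processing unit 6 are all hypothetical.
- Characters with a single candidate are skipped, which the source does not address.
- The comparison is strict ("smaller than"), following step S24 as described here. The aspect in [0013] says "at or below".

```python
from typing import Callable


class MisrecognitionEstimator:
    """Sketch of steps S21-S25 of FIG. 3 (hypothetical names throughout)."""

    def __init__(self, threshold: float, send_position: Callable[[int], None]):
        self.threshold = threshold
        self.send_position = send_position  # stand-in for the link to display processing unit 6
        self.memory: list[tuple[int, list[float]]] = []  # stand-in for memory 9

    def on_likelihoods(self, position: int, overall: list[float]) -> None:
        """Handle the overall likelihoods received for one character (step S21)."""
        self.memory.append((position, overall))  # step S22: store in working memory
        if len(overall) < 2:
            return  # no next candidate to compare against (assumption)
        best, second = sorted(overall, reverse=True)[:2]
        difference = best - second               # step S23
        if difference < self.threshold:          # step S24: strictly smaller
            self.send_position(position)         # step S25: report the position


sent: list[int] = []
estimator = MisrecognitionEstimator(threshold=0.25, send_position=sent.append)
estimator.on_likelihoods(0, [0.9, 0.05])
estimator.on_likelihoods(1, [0.42, 0.40])
print(sent)  # [1]
```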

[0040] Next, the processing in the display processing unit 6 is described with reference to FIG. 4.

[0041] The display processing unit 6 receives two kinds of data. From the speech recognition processing unit 2, it receives the text data resulting from recognition of the input speech (step S31). From the misrecognized character estimation unit 4, it receives the position data and related data for the estimated misrecognized characters (step S32). It writes all of these data into the memory 10 (step S33).

[0042] The unit then changes the display color (or the background color) of the estimated misrecognized characters to a color different from that of the other characters. It prepares the data so that the display unit 8 later displays the text in these colors (step S34). Processing in the display processing unit 6 then ends.
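Step S34 can be sketched for one possible output format. This sketch assumes the text is rendered as HTML and that the position data are character indices into the recognized text. The source specifies neither; the function name and default color are hypothetical.

```python
import html


def render_marked_html(text: str, flagged: set[int], color: str = "red",
                       background: bool = False) -> str:
    """Render recognized text as HTML, coloring the characters at flagged positions.

    background=False changes the character color; background=True changes the
    background color instead. Both options are described for step S34.
    (Hypothetical helper, not from the patent.)
    """
    prop = "background-color" if background else "color"
    parts = []
    for i, ch in enumerate(text):
        escaped = html.escape(ch)  # keep characters such as "<" from breaking the markup
        if i in flagged:
            parts.append(f'<span style="{prop}:{color}">{escaped}</span>')
        else:
            parts.append(escaped)
    return "".join(parts)


print(render_marked_html("abc", {1}))  # a<span style="color:red">b</span>c
```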

[0043] As a result, characters shown in a color different from the other characters appear at, for example, several places in the document on the display unit 8. These are the characters estimated to be misrecognized.

[0044] The user can therefore easily find the characters estimated to be misrecognized by their different display color. The user then judges whether each one is actually an error. If it is, the user corrects it by entering the correct character through the display operation input unit 7.

[0045] Sometimes most of the document on the display unit 8 appears in the different color, or no differently colored characters appear at all. In either case, the threshold may be set inappropriately, and the user resets it through the parameter input unit 5.

[0046] The user can find an appropriate threshold through a few rounds of trial and error. Once it is set, misrecognized characters are estimated with high probability and their locations can be found quickly by eye, so the document can be corrected efficiently.

[0047] In the misrecognized character estimation unit 4 described above, the threshold is applied to the difference between two overall likelihoods: that of the recognized character and that of the next candidate. The invention is not limited to this test. Two alternatives are possible:

- A threshold may be applied to the overall likelihood of the recognized character itself, and characters at or below that threshold estimated to be misrecognized.
- Both tests may be used together for the estimation.
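The options in [0047] can be folded into one decision function. The source does not say how the two tests combine when both are used, so this sketch makes the choice explicit with a hypothetical `require_both` flag. The default flags a character that fails either test (OR), and `require_both=True` flags only a character that fails both (AND). Passing `None` disables a test. All names are assumptions.

```python
from typing import Optional


def is_suspect(best: float,
               second: Optional[float],
               likelihood_threshold: Optional[float] = None,
               margin_threshold: Optional[float] = None,
               require_both: bool = False) -> bool:
    """Decide whether a recognized character is a likely misrecognition.

    best: overall likelihood of the recognized character.
    second: overall likelihood of the next candidate, or None if there was none.
    A test whose threshold is None is skipped. The margin test is also skipped
    when second is None. (Hypothetical helper, not from the patent.)
    """
    results = []
    if likelihood_threshold is not None:
        results.append(best <= likelihood_threshold)          # absolute test
    if margin_threshold is not None and second is not None:
        results.append(best - second <= margin_threshold)     # margin test
    if not results:
        return False  # no test enabled: nothing is flagged
    return all(results) if require_both else any(results)


print(is_suspect(0.9, 0.88, likelihood_threshold=0.5, margin_threshold=0.05))  # True
```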

[0048] In the description above, the user sets the threshold for the misrecognized character estimation unit 4 through the parameter input unit 5. The threshold may instead be preset in the misrecognized character estimation unit 4. That is, an initial threshold value may be provided and used whenever the user has not set one. Furthermore, the speech recognition device itself may find the optimal threshold automatically. It would judge how often characters estimated to be misrecognized occur across the whole document and change the threshold in appropriate steps until it reaches the optimum.
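The patent leaves the automatic adjustment rule open, so the following is only one possible reading. It rests on several assumptions:

- A character is flagged when its likelihood margin is at or below the threshold.
- The "appropriate" share of flagged characters in the document is a target band, here from 2% to 20%.
- The threshold moves by a fixed step until the share falls inside the band.

The band, the step size, the initial value, and the function name are all illustrative assumptions.

```python
def auto_tune_threshold(margins: list[float],
                        initial: float = 0.1,
                        step: float = 0.01,
                        min_ratio: float = 0.02,
                        max_ratio: float = 0.2,
                        max_iter: int = 1000) -> float:
    """Step the threshold until the flagged share of characters lies in [min_ratio, max_ratio].

    margins: likelihood margin (best - next candidate) of every character in the document.
    Returns the last threshold tried if the band is not reached within max_iter steps.
    (Hypothetical helper; all defaults are illustrative assumptions.)
    """
    if not margins:
        return initial  # nothing to judge: keep the preset initial value
    threshold = initial
    for _ in range(max_iter):
        ratio = sum(m <= threshold for m in margins) / len(margins)
        if ratio > max_ratio:
            threshold -= step  # most of the document is flagged: tighten
        elif ratio < min_ratio:
            threshold += step  # almost nothing is flagged: loosen
        else:
            break
    return threshold


margins = [0.01 * i for i in range(100)]  # made-up, evenly spread margins
t = auto_tune_threshold(margins, initial=0.5, step=0.05)
print(sum(m <= t for m in margins) / len(margins))  # a share inside [0.02, 0.2]
```

The two branches mirror the two warning signs in [0045]: almost the whole document flagged, or nothing flagged at all. If the step is large relative to the band, the loop can oscillate; `max_iter` only bounds it.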

[0049] In the description above, the estimated misrecognized characters are changed in color so that they can be quickly distinguished by eye from the other characters. This is of course not the only option. Their form may be changed instead, for example by changing the typeface or character size, or by underlining the relevant portion.
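On a plain-text display without color, underlining can be approximated by a marker line printed beneath the text. This sketch assumes a monospaced terminal in which East Asian wide and fullwidth characters occupy two columns. The caret marker and the function name are hypothetical.

```python
import unicodedata


def underline_marks(text: str, flagged: set[int], mark: str = "^") -> str:
    """Build a marker line to print under `text`, with marks under flagged characters.

    Wide ("W") and fullwidth ("F") characters get two columns so the marks line
    up on a monospaced terminal. Trailing spaces are removed.
    (Hypothetical helper, not from the patent.)
    """
    columns = []
    for i, ch in enumerate(text):
        width = 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        columns.append((mark if i in flagged else " ") * width)
    return "".join(columns).rstrip()


text = "音声認識"
print(text)
print(underline_marks(text, {2}))  # '    ^^' under the third character
```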

[0050] Further, the overall likelihood of the estimated misrecognized characters may be divided into several levels, with the color scheme, shading, and so on varied by level. Among the characters estimated to be misrecognized, the user can then distinguish those with a higher probability of misrecognition from those with a relatively low probability.
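The graded display can be sketched as a lookup from overall likelihood to a color band. This assumes that a lower overall likelihood means a higher probability of misrecognition. The band boundaries, the color names, and the function name are illustrative assumptions.

```python
import bisect


def grade_color(likelihood: float, boundaries: list[float], colors: list[str]) -> str:
    """Map the overall likelihood of an estimated misrecognized character to a display color.

    boundaries must be ascending, and colors must have len(boundaries) + 1 entries,
    ordered from the lowest-likelihood (most suspect) band to the highest.
    A likelihood exactly equal to a boundary falls into the band above it.
    (Hypothetical helper, not from the patent.)
    """
    if len(colors) != len(boundaries) + 1:
        raise ValueError("need exactly one more color than boundaries")
    return colors[bisect.bisect_right(boundaries, likelihood)]


bands = [0.1, 0.3]
palette = ["red", "orange", "yellow"]  # strongest highlight for the most suspect band
print(grade_color(0.05, bands, palette), grade_color(0.2, bands, palette))  # red orange
```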

[0051] The speech recognition device of this embodiment estimates characters with a high probability of misrecognition from the likelihoods produced during speech recognition. It then displays the estimated misrecognized characters on the screen in a manner different from the other characters. When checking the document, the user can therefore locate and check misrecognized characters quickly by eye and correct the document efficiently.

[0052] The present invention is not limited to the embodiment described above. Various modifications and applications are of course possible without departing from the gist of the invention.

[0053] [Appendix] The embodiment of the present invention described in detail above yields the following configurations.

[0054] (1) A speech recognition device comprising:

- a speech recognition processing unit that recognizes input speech data and converts it into text data;
- a misrecognized character estimation unit that uses the likelihood of each character produced during recognition by the speech recognition processing unit to estimate which characters have a high probability of misrecognition;
- a display processing unit that prepares the text data for display so that the estimated misrecognized characters can be quickly distinguished by eye from the other characters; and
- a display unit that displays the result produced by the display processing unit.

[0055] (2) The speech recognition device according to appendix (1), wherein the misrecognized character estimation unit compares the likelihood of each recognized character with a predetermined threshold and estimates characters whose likelihood is at or below the threshold as characters with a high probability of misrecognition.

[0056] (3) The speech recognition device according to appendix (1), wherein the misrecognized character estimation unit computes the difference between the likelihood of each recognized character and the likelihood of the character that was the next candidate when the recognition was determined, and estimates characters for which this difference is at or below a predetermined threshold as characters with a high probability of misrecognition.

[0057] (4) The speech recognition device according to appendix (2) or (3), further comprising a parameter input unit for setting the threshold.

[0058] (5) The speech recognition device according to appendix (1), wherein the display processing unit makes the estimated misrecognized characters quickly distinguishable by eye by giving them a color different from that of the other characters.

[0059] (6) The speech recognition device according to appendix (5), wherein the display processing unit further varies the color according to the probability that each estimated misrecognized character is in fact misrecognized.

[0060] (7) The speech recognition device according to appendix (1), wherein the display processing unit makes the estimated misrecognized characters quickly distinguishable by eye by giving them a form different from that of the other characters.

[0061] With the invention according to appendix (1), misrecognized characters are displayed so that they can be quickly distinguished by eye from the other characters. Misrecognized characters can therefore be located efficiently.

[0062] With the invention according to appendix (2), recognized characters whose likelihood is at or below a predetermined threshold are estimated as characters with a high probability of misrecognition. This achieves the same effect as the invention according to appendix (1).

[0063] With the invention described in Note (3), a character is estimated to have a high misrecognition probability when the difference between its likelihood and the likelihood of the next candidate at the time of the recognition decision is equal to or less than a predetermined threshold. This yields the same effect as the invention described in Note (1).

[0064] The invention described in Note (4) yields the same effect as the invention described in Note (2) or (3). In addition, because a parameter input unit is provided, the user can set the threshold and thus select an optimal value.

[0065] The invention described in Note (5) yields the same effect as the invention described in Note (1). In addition, displaying the estimated misrecognized characters in a color different from that of the other characters makes them easy to identify visually.

[0066] The invention described in Note (6) yields the same effect as the invention described in Note (5). In addition, the color changes according to the probability that the estimated character is actually misrecognized, so the user can gauge how likely each flagged character is to be wrong.

[0067] The invention described in Note (7) yields the same effect as the invention described in Note (1). In addition, displaying the estimated misrecognized characters in a shape different from that of the other characters makes them easy to identify visually.

【００６８】[0068]

[Effects of the Invention]

As described above, the speech recognition device of the present invention according to claim 1 displays misrecognized characters so that they can be quickly and visually distinguished from the other characters. Misrecognized characters can therefore be found efficiently.

[0069] Further, the speech recognition device of the present invention according to claim 2 estimates that characters whose likelihood is equal to or less than a predetermined threshold, among the characters determined by recognition, have a high misrecognition probability. This yields the same effect as the invention according to claim 1.

[0070] Furthermore, the speech recognition device of the present invention according to claim 3 estimates that a character has a high misrecognition probability when the difference between its likelihood and the likelihood of the next candidate at the time of the recognition decision is equal to or less than a predetermined threshold. This yields the same effect as the invention according to claim 1.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a speech recognition device according to an embodiment of the present invention.

FIG. 2 is a flowchart showing the processing in the speech recognition processing unit of the speech recognition device of the embodiment.

FIG. 3 is a flowchart showing the processing in the misrecognized character estimating unit of the speech recognition device of the embodiment.

FIG. 4 is a flowchart showing the processing in the display processing unit of the speech recognition device of the embodiment.

[Explanation of symbols]

1 ... Speech data input unit
2 ... Speech recognition processing unit
3 ... Dictionary
4 ... Misrecognized character estimating unit
5 ... Parameter input unit
6 ... Display processing unit
7 ... Display operation input unit
8 ... Display unit
9, 10 ... Memory
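The numbered parts can be sketched as a small pipeline to show how they connect. Everything here is an assumption made for illustration. A stub that ignores its input and returns fixed ranked candidates stands in for the speech recognition processing unit (2) and the dictionary (3). The margin criterion stands in for the estimating unit (4), and brackets stand in for the visual marking applied by the display processing unit (6). Parts 7, 9, and 10 are not modeled because this section does not describe their roles.

```python
Candidates = list[list[tuple[str, float]]]


def recognize(audio: bytes) -> Candidates:
    """Stub for units 2 and 3: returns fixed ranked (character, likelihood) lists."""
    return [[("音", 0.90), ("温", 0.05)], [("声", 0.48), ("生", 0.45)]]


def estimate(cands: Candidates, threshold: float) -> set[int]:
    """Unit 4 (margin criterion): flag positions whose top-2 gap is <= threshold."""
    return {i for i, c in enumerate(cands)
            if len(c) > 1 and c[0][1] - c[1][1] <= threshold}


def render(cands: Candidates, flagged: set[int]) -> str:
    """Unit 6: bracket flagged characters so they stand out."""
    return "".join(f"[{c[0][0]}]" if i in flagged else c[0][0]
                   for i, c in enumerate(cands))


def run(audio: bytes, threshold: float) -> str:
    """Unit 1 supplies `audio`, unit 5 supplies `threshold`, unit 8 shows the result."""
    cands = recognize(audio)
    return render(cands, estimate(cands, threshold))


print(run(b"", threshold=0.1))  # 音[声]
```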

Claims


1. A speech recognition device comprising: a speech recognition processing unit that performs speech recognition on input speech data and converts the speech data into text data; a misrecognized character estimating unit that estimates characters having a high misrecognition probability, using the likelihood of each character generated in the speech recognition process of the speech recognition processing unit; a display processing unit that processes the text data to be displayed so that the misrecognized characters estimated by the misrecognized character estimating unit can be quickly and visually distinguished from the other characters; and a display unit that displays the processing result of the display processing unit.

2. The speech recognition device according to claim 1, wherein the misrecognized character estimating unit compares the likelihood of each character determined by recognition with a predetermined threshold, and estimates a character whose likelihood is equal to or less than the threshold as a character having a high misrecognition probability.

3. The speech recognition device according to claim 1, wherein the misrecognized character estimating unit obtains the difference between the likelihood of the character determined by recognition and the likelihood of the character that was the next candidate when the recognition was determined, and estimates a character for which this difference is equal to or less than a predetermined threshold as a character having a high misrecognition probability.
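Claims 2 and 3 can flag different characters, and a small comparison makes the difference concrete. This sketch assumes that each decided character carries a ranked list of (candidate, likelihood) pairs on a [0, 1] scale. The `criteria` function, the thresholds, and the values are hypothetical.

```python
def criteria(candidates: list[tuple[str, float]],
             abs_threshold: float, margin_threshold: float) -> tuple[bool, bool]:
    """Return (flagged under claim 2, flagged under claim 3) for one decided character.

    candidates holds (character, likelihood) pairs sorted best first.
    """
    top = candidates[0][1]
    # With no runner-up, treat the margin as infinite so claim 3 never flags it.
    second = candidates[1][1] if len(candidates) > 1 else float("-inf")
    return top <= abs_threshold, top - second <= margin_threshold


print(criteria([("A", 0.40), ("B", 0.05)], 0.5, 0.1))  # (True, False): weak but unrivaled
print(criteria([("C", 0.55), ("D", 0.50)], 0.5, 0.1))  # (False, True): strong but near tie
```

The first character has a low likelihood but no close competitor. The second has a high likelihood but an almost equally likely alternative. Which criterion works better for a given recognizer would need to be checked empirically.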