JP2000259387A

JP2000259387A - Voice recognition device, storage medium recording voice recognition processing program and voice recognition processing method

Info

Publication number: JP2000259387A
Application number: JP11063802A
Authority: JP
Inventors: Takashi Onishi; 孝史大西; Michio Nagai; 通夫永井
Original assignee: Olympus Optical Co Ltd
Current assignee: Olympus Corp
Priority date: 1999-03-10
Filing date: 1999-03-10
Publication date: 2000-09-22

Abstract

PROBLEM TO BE SOLVED: To provide a voice recognition device with which the progress of voice recognition work is easily grasped even when voice recognition is performed on plural files at a time. SOLUTION: This voice recognition device is equipped with a storage medium 3 for recording voice files having voice data, an operation member 11 capable of selecting plural voice files, a voice recognizing part 4A which consecutively converts the voice data of the selected voice files into text data, and a control part 4 which displays the progress of the text data conversion processing by the voice recognizing part 4A for every voice file.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a voice recognition device, a recording medium storing a voice recognition processing program, and a voice recognition processing method.

[0002]

[Description of the Related Art] Voice recognition devices have been proposed that receive voice data recorded by a voice recording/reproducing apparatus and convert it into text, file by file, using voice recognition software. In such a device, when there are several voice data files to be recognized, the conventional approach was to select one file in the voice recognition software, recognize it, then select the next file and recognize it, and so on, one file at a time. Because this approach takes considerable time before all files have been recognized, a method has been considered that automates the work and converts multiple files into text by recognizing them consecutively in a single operation.

[0003]

[Problems to be Solved by the Invention] However, when voice recognition of multiple files is automated, the boundary between the recognition of one file and that of the next is not visible, so it is difficult to grasp the progress of voice recognition, such as which file is currently being processed and how far into it recognition has advanced.

[0004] The present invention has been made in view of this problem, and its object is to provide a voice recognition device, a recording medium storing a voice recognition processing program, and a voice recognition processing method with which the progress of the voice recognition work can be easily grasped even when multiple files are recognized at once.

[0005]

[Means for Solving the Problems] To achieve the above object, a voice recognition device according to a first invention comprises: a recording medium for recording voice files having voice data; selection means capable of selecting a plurality of the voice files; voice recognition means for consecutively converting the voice data of the plurality of voice files selected by the selection means into text data; and display means for displaying, for each voice file, the progress of the text data conversion processing by the voice recognition means.

[0006] A voice recognition device according to a second invention is the device of the first invention, wherein the display means displays the file names of the plurality of voice files selected by the selection means, together with a progress status indicating whether the text data conversion processing for the voice file of each file name is unprocessed, in progress, or completed.

[0007] A voice recognition device according to a third invention is the device of the first invention, wherein the display means displays the file names of the plurality of voice files selected by the selection means and erases the displayed file name of each voice file whose text data conversion processing has finished.

[0008] A voice recognition device according to a fourth invention is the device of any one of the first to third inventions, further comprising calculation means for calculating the total voice data amount of the plurality of voice files selected by the selection means, the voice data amount of the voice files that the voice recognition means has already converted into text data, and the ratio of that voice data amount to the total voice data amount, wherein the display means additionally displays the calculation result of the calculation means.

[0009] A fifth invention is a recording medium storing a processing program for causing a computer to perform voice recognition processing, wherein the processing program causes the computer to read a plurality of voice files from a recording medium on which voice files having voice data are recorded, to recognize the plurality of voice files consecutively and convert them into text data, and to display the progress of the text data conversion processing for each voice file.

[0010] A sixth invention is the recording medium of the fifth invention, wherein the voice recognition program further causes the computer to calculate the total file size or total voice data amount of the plurality of voice files to be converted into text data, the data size or voice data amount of the voice files whose text data conversion has finished, and the ratio of that data size or voice data amount to the total file size or total voice data amount, and to additionally display the calculation result.

[0011] A voice recognition method according to a seventh invention comprises the steps of: selecting a plurality of voice files from a recording medium on which voice files having voice data are recorded; consecutively converting the selected voice files into text data; and displaying the progress of the text data conversion processing for each voice file.

[0012]

[Embodiments of the Invention] Embodiments of the present invention are described below in detail with reference to the drawings. FIG. 1 shows the configuration of the voice recognition device according to the present embodiment. The voice recognition device of this embodiment is constituted by a personal computer (hereinafter, PC) 2. A voice recording/reproducing apparatus 1, which generates the voice data to be recognized and transmits it to the PC 2, is connected to the PC 2 via a cable 12. The voice recording/reproducing apparatus 1 and the PC 2 may instead exchange data wirelessly.

[0013] The PC 2 includes a control unit 4 that controls each part. Connected to the control unit 4 are a hard disk (HD) 3 serving as a recording medium for recording the voice data files to be recognized, a RAM 5, and an operation member 11 such as a mouse serving as the selection means; a display device (display means) 10 is connected via a display control device 9. In addition, a digital-to-analog converter (D/A) 8, an amplifier 7, and a speaker 6 are connected for reproducing voice data. The control unit 4 includes a voice recognition section (voice recognition means) 4A capable of consecutively converting the voice data in the selected voice files into text data.

[0014] FIG. 2 shows the screen displayed on the display device 10 when the voice recognition software stored in the voice recognition section 4A is started. A file list display area 103 shows a plurality of files 104-1 to 104-5. Of these, 104-1 is highlighted, indicating that the operator has selected it. After the operator has selected the files to be recognized, operating (clicking) the voice recognition button 102 starts the voice recognition processing.

[0015] FIG. 3 is a main flowchart showing the operation of the control unit in response to operator input in the first embodiment of the present invention. First, various default values are set in the initialization of step S1. Next, step S2 determines whether a file selection operation has been performed. If YES, the process proceeds to step S5 and the selected file is highlighted. If the file on which the selection operation was performed was already selected, the selection is instead cancelled and the file returns to normal (non-highlighted) display. If the determination in step S2 is NO, or after step S5, the process proceeds to step S3, which determines whether any of the buttons shown in FIG. 2 has been turned on (clicked). If YES, the process proceeds to step S4, executes the command corresponding to the clicked button, and returns. If the determination in step S3 is NO, the process returns immediately to step S2. Step S3 also detects whether the voice recognition button 102 has been clicked; if it has, the voice recognition processing is performed in step S4.
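The selection behavior of steps S2 and S5 amounts to a toggle: selecting an unselected file highlights it, and selecting it again cancels the selection. A minimal Python sketch of that toggle follows; the function name `toggle_selection` and the use of a set of file names are assumptions made for illustration and do not appear in the specification.

```python
def toggle_selection(selected, name):
    """Return a new selection with `name` toggled (hypothetical helper for S2/S5)."""
    updated = set(selected)
    if name in updated:
        # Already selected: cancel the selection (return to normal display).
        updated.remove(name)
    else:
        # Not yet selected: select it (highlighted display).
        updated.add(name)
    return updated
```

Returning a new set rather than mutating the argument keeps the sketch easy to test; an actual implementation could just as well update the display state in place.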

[0016] FIG. 4 is a flowchart explaining the details of the voice recognition processing in the first embodiment. Here it is assumed that multiple files are recognized consecutively. Voice recognition is possible only when the high sound quality mode has been selected. First, step S10 determines whether a voice file to be recognized has been selected. If NO, the process proceeds to step S11 and displays a message in a dialog box, for example, "No file is selected. Select a file and try again."

[0017] If the determination in step S10 is YES, the process proceeds to step S12, which determines whether voice recognition is possible. If NO, the process proceeds to step S36 and displays the message "Of the selected files, *.dss cannot be recognized because it is in low sound quality mode. Perform voice recognition on the other files only?" (where *.dss stands for any file with the extension dss). Step S37 then determines whether to continue processing. If YES, the process proceeds to step S38 and automatically selects only the files that can be recognized. If the determination in step S37 is NO, the process returns.
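The check in steps S12 and S36 to S38 can be sketched as a filter over the selected files. The specification does not say how the recording mode of a file is detected, so the sketch below assumes each file record carries a hypothetical `high_quality` flag; `VoiceFile`, `files_to_process`, and the `user_continues` callback (standing in for the dialog of steps S36 and S37) are likewise illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceFile:
    name: str
    high_quality: bool  # assumption: the recording mode is known per file


def files_to_process(files, user_continues):
    """Return the files to recognize, or an empty list if the user declines."""
    recognizable = [f for f in files if f.high_quality]
    rejected = [f for f in files if not f.high_quality]
    if not rejected:
        return recognizable      # S12 YES: every selected file can be recognized
    if user_continues(rejected):  # S36/S37: ask whether to continue with the rest
        return recognizable      # S38: keep only the recognizable files
    return []                    # S37 NO: return without processing
```

Passing the dialog as a callback keeps the decision logic separate from any particular user interface.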

[0018] If the determination in step S12 is YES, or after step S38, the process proceeds to step S13, where the save destination (a specific folder) for the text files is specified. If the operator does not specify one, a default value is set. The process then proceeds to step S14, which displays the list of files selected for voice recognition together with their processing status.

[0019] FIG. 5 is a display screen showing the status of this voice recognition processing. As shown in the figure, recognition of the selected files has not yet started at this point, so all files (Welcome.dss, Welcome2.dss, Welcome3.dss, Welcome4.dss, Welcome5.dss) are shown as "Waiting". On the right side of the screen are a confirmation (OK) button 110, a Pause button 111, a Delete button 112, and a Cancel button 113.

[0020] After the file list is displayed in step S14, the process proceeds to step S15 and starts the voice recognition processing for the first selected file, Welcome.dss. The process then proceeds to step S16 and changes the display of the file being recognized (Welcome.dss) to "Processing".

[0021] Next, the process proceeds to step S17 in FIG. 6, which determines whether a file selection operation has been performed on the voice recognition processing screen. If YES, the process proceeds to step S18 and highlights the selected file. If that file was already selected, the selection is instead cancelled and the file returns to normal display. If the determination in step S17 is NO, or after step S18, the process proceeds to step S19, which determines whether any of the Pause button 111, Delete button 112, Cancel button 113, or confirmation (OK) button 110 has been turned on. If, for example, the Pause button 111 has been turned on, the process proceeds to step S20 and suspends the voice recognition processing. The process then waits in this state until the Pause button is pressed again; when it is, the determination in step S21 becomes YES, the voice recognition processing resumes in step S23, and the process moves to step S30.

[0022] If the Delete button 112 is pressed, the process proceeds to step S24, deletes the selected file from the list, and moves to step S30. If the file currently being recognized is the one selected, its processing is aborted and it is then deleted from the list.

[0023] If the Cancel button 113 is pressed, the process proceeds to step S25, aborts the processing of the file currently being recognized, deletes all files from the list, and ends the voice recognition processing.

[0024] If the confirmation (OK) button 110 is pressed, the process proceeds to step S26, minimizes the current screen, and then moves to step S30.

[0025] If step S19 determines that none of the Pause button 111, Delete button 112, Cancel button 113, or confirmation (OK) button 110 has been turned on, the process moves immediately to step S30.
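The button handling of steps S19 to S26 can be pictured as a small dispatch over the batch state. The Python sketch below is only an illustration under stated assumptions: the `BatchState` fields, the button labels, and the `on_button` function are hypothetical names, and aborting the recognition of the file currently being processed (part of steps S24 and S25) is not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class BatchState:
    queue: List[str]                          # files still shown in the list
    selected: Set[str] = field(default_factory=set)
    paused: bool = False
    minimized: bool = False
    finished: bool = False


def on_button(state, button):
    """Apply one button press to the batch state (hypothetical dispatch for S19)."""
    if button == "pause":      # S20 to S23: suspend, then resume on the next press
        state.paused = not state.paused
    elif button == "delete":   # S24: drop the selected files from the list
        state.queue = [n for n in state.queue if n not in state.selected]
        state.selected.clear()
    elif button == "cancel":   # S25: drop every file and end the processing
        state.queue.clear()
        state.selected.clear()
        state.finished = True
    elif button == "ok":       # S26: minimize the status screen
        state.minimized = True
    return state
```

Any other value leaves the state unchanged, which corresponds to the case in which no button has been turned on and the process moves straight to step S30.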

[0026] While voice recognition is in progress, the recognized text can also be displayed in a separate window. FIG. 7 shows an example of such a window. Watching this screen, an operator who judges that the recognition rate is poor and the text is unsatisfactory can press the Delete button 112 to stop the text conversion of the file being processed and move on to the voice recognition processing of the next file.

[0027] Step S30 determines whether the voice recognition processing of one file has finished. If NO, the process returns to step S16. If YES, the process proceeds to step S31 and saves the text file, with the extension .vps, in the folder specified in step S13.

[0028] Next, the process proceeds to step S32 and erases the file name of the recognized file from the voice recognition processing screen. FIG. 8 shows the screen at this point: the files whose processing is complete (here, Welcome.dss and Welcome2.dss) have been erased, and only the files whose processing is not complete (here, Welcome3.dss, Welcome4.dss, and Welcome5.dss) are displayed.

[0029] The process then moves to the next selected file (step S33) and determines whether the voice recognition processing of all files has finished (step S34). If NO, it returns to step S15 and performs the voice recognition processing of that next file. When the voice recognition processing of all files has finished, the determination in step S34 becomes YES, and the process proceeds to step S35, displays a completion notice such as that shown in FIG. 9, and returns.
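The per-file loop of the first embodiment (steps S14 to S16 and S30 to S35) can be summarized in a short Python sketch. It records the list that would be shown on screen after each change; the function name `run_batch` and the `recognize` callback are assumptions for illustration, file names are assumed to be unique, and the pause, delete, and cancel handling is left out.

```python
def run_batch(names, recognize):
    """Process files in order and return the successive list displays."""
    pending = {name: "waiting" for name in names}   # S14: all files waiting
    snapshots = [dict(pending)]
    for name in names:                               # S15 / S33: next file
        pending[name] = "processing"                 # S16: mark as processing
        snapshots.append(dict(pending))
        recognize(name)                              # recognition and saving (S30, S31)
        del pending[name]                            # S32: erase the finished file
        snapshots.append(dict(pending))
    return snapshots                                 # last snapshot is empty (S34 YES)
```

Because finished files are removed from the mapping, the number of remaining entries directly reflects how many files are still unprocessed, which is the effect the first embodiment aims at.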

[0030] According to the first embodiment described above, the progress of the voice recognition processing is conveyed by pairing the file names of the voice data files with an indication of whether the conversion of each file is unprocessed, in progress, or completed. The progress of the voice recognition work can therefore be easily grasped even when multiple files are recognized at once. Furthermore, erasing the display of voice data files whose recognition has finished makes it easy to see how many unprocessed voice files remain.

[0031] A second embodiment of the present invention is described below. As in the first embodiment, it is assumed that multiple files are recognized consecutively.

[0032] FIG. 10 is a main flowchart showing the operation of the control unit in response to operator input in the second embodiment of the present invention. Steps S41 to S45 correspond to steps S1 to S5 shown in FIG. 3 and perform the same processing, so their description is omitted here. In the second embodiment, after a file is selected and highlighted, step S46 determines whether that file can be recognized. If YES, the process proceeds to step S48 and makes the voice recognition button selectable. If NO, the process proceeds to step S47 (not selectable) and displays the voice recognition button in, for example, gray tone, informing the operator that pressing the button will have no effect. It is preferable to perform the processing of steps S36 to S38 of the first embodiment in step S48 as well. That processing is the same as described for the first embodiment, so its description is omitted here. After step S47 or step S48, the process proceeds to step S43.

[0033] FIG. 11 is a flowchart explaining the details of the voice recognition processing in the second embodiment. The file list display in step S51 corresponds to step S14 in FIG. 4, and the processing before it is the same as in FIG. 4, so its description is omitted here. After the file list is displayed in step S51, step S52 calculates the total file capacity (size) of the selected files, that is, the total voice data amount.

[0034] The process then proceeds to step S53, which determines whether the calculated total file capacity is larger than a predetermined capacity. If YES, the process proceeds to step S54 and displays a message such as "The total file size is large, so processing will take about 5 minutes. Continue?" In step S55 the process waits for the user's instruction on whether to continue; if processing is to continue, it proceeds to step S56, and otherwise it returns. The purpose of steps S53, S54, and S55 is to let the operator decide whether to continue when the total capacity of the selected files exceeds the predetermined capacity, because voice recognition will then take a long time. At this point it is also possible to display an approximate processing time estimated from the CPU speed, the RAM capacity, the free space on the hard disk (HD), and the total capacity of the selected files.
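The size check of steps S52 and S53 reduces to a sum and a comparison. The Python sketch below illustrates it under assumptions: the function name `needs_confirmation` and the byte units are illustrative, and the threshold is passed in because the specification speaks only of "a predetermined capacity". No processing-time estimate is attempted, since the specification gives no formula for it.

```python
def needs_confirmation(sizes_bytes, threshold_bytes):
    """Return (total size, whether the operator should be asked to confirm)."""
    total = sum(sizes_bytes)              # S52: total capacity of the selected files
    return total, total > threshold_bytes  # S53: compare with the predetermined capacity
```

The returned total can be kept for the ratio calculation performed later in step S58, so the file sizes need to be summed only once.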

[0035] In step S56, the first file to be recognized is displayed as "Processing". The process then proceeds to step S57 and starts the voice recognition processing. After recognition has started, the process proceeds to step S58, where, to inform the operator what percentage of the whole has been processed, it calculates the ratio of the processed file capacity already converted into text data (that is, the processed voice data amount) to the total file capacity (total voice data amount) calculated in step S52. In step S59, the bar graph display is updated successively on the basis of the ratio obtained in step S58.
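The ratio of step S58 and the bar graph of step S59 can be illustrated with a few lines of Python. This is a sketch under assumptions rather than the patented display: `progress_ratio` and `render_bar` are hypothetical names, and a text bar stands in for the color bar 214A.

```python
def progress_ratio(processed_bytes, total_bytes):
    """S58: processed voice data amount divided by total voice data amount."""
    if total_bytes <= 0:
        return 0.0  # nothing selected: report no progress instead of dividing by zero
    return min(processed_bytes / total_bytes, 1.0)


def render_bar(ratio, width=20):
    """S59: draw a text bar whose filled part grows to the right with the ratio."""
    filled = int(ratio * width)
    return "[" + "#" * filled + "-" * (width - filled) + f"] {ratio:.0%}"
```

For example, with 50 of 200 units processed the ratio is 0.25, and `render_bar(0.25)` fills 5 of the 20 cells and labels the bar 25%.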

[0036] FIG. 12 is a display screen showing the status of the voice recognition processing. The confirmation (OK) button 210, Pause button 211, Delete button 212, and Cancel button 213 shown in the figure have the same functions as the confirmation (OK) button 110, Pause button 111, Delete button 112, and Cancel button 113 of the first embodiment. Reference numeral 214 denotes a bar graph display based on percentage (%); the leading edge of the color bar 214A moves to the right as the voice recognition processing advances.

[0037] During the voice recognition processing, the process determines whether the recognition of one file has finished (step S60) and repeats the above steps until it has. When the processing of one file finishes, the determination in step S60 becomes YES, and the process proceeds to step S61 and saves the text file containing the recognition result. Here, the text file obtained by the voice recognition processing is given the extension .vps and saved in the designated folder under the same name as the voice file.
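The naming rule of step S61 (same base name as the voice file, extension .vps, placed in the designated folder) can be expressed with Python's standard `pathlib`. The function name `text_output_path` is an assumption for illustration.

```python
from pathlib import PurePath


def text_output_path(voice_file, folder):
    """Build the save path: designated folder + voice file name with a .vps extension."""
    return PurePath(folder) / PurePath(voice_file).with_suffix(".vps").name
```

Under this sketch, `Welcome.dss` saved to a folder named `out` becomes `Welcome.vps` inside `out`, and any directory part of the input path is dropped in favor of the designated folder.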

[0038] Step S62 then determines whether voice recognition of all files has finished. If NO, the process proceeds to step S64 and switches the display. Here, as shown in FIG. 12, the finished file is not erased; instead its status display is changed to "Done" and the status display of the next file is changed to "Processing". In step S65 the voice recognition processing of the next file is started, and the process returns to step S58 and repeats the above steps. When the voice recognition processing of all files has finished, the determination in step S62 becomes YES, and the process proceeds to step S63, displays a completion notice, and returns.
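The display switch of step S64 differs from the first embodiment in that finished files stay in the list. A minimal Python sketch of that transition is shown below; the function name `advance` and the status strings are assumptions for illustration.

```python
def advance(statuses, finished):
    """S64: mark `finished` as done and the next waiting file as processing."""
    updated = dict(statuses)
    updated[finished] = "done"
    for name, status in updated.items():
        if status == "waiting":
            updated[name] = "processing"  # only the first waiting file advances
            break
    return updated
```

When no file is left in the waiting state, the result contains only "done" entries, which corresponds to the YES branch of step S62.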

[0039] In the embodiments described above, the voice data files recorded on the HD 3 are read out and recognized, but it is also conceivable to perform the voice recognition processing while simultaneously receiving voice data from the voice recording/reproducing apparatus. In that case, if information on the total file capacity is written in the header of the transmitted voice data frames, the total file capacity is known, so the ratio of the processed file capacity to the total file capacity can be calculated.
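The streaming variant can be sketched as follows. The specification does not define the frame layout, so this Python example assumes a purely hypothetical one: each frame starts with a 4-byte big-endian total size, followed by the voice data payload. The names `HEADER`, `make_frame`, and `streaming_ratio` are illustrative, and the amount received so far stands in for the amount processed.

```python
import struct

# Hypothetical frame layout: 4-byte big-endian total size, then the payload.
HEADER = struct.Struct(">I")


def make_frame(total_size, payload):
    """Build one frame carrying the total file capacity in its header."""
    return HEADER.pack(total_size) + payload


def streaming_ratio(frames):
    """Yield the processed-to-total ratio after each received frame."""
    received = 0
    for frame in frames:
        (total,) = HEADER.unpack_from(frame)   # total capacity, assumed > 0
        received += len(frame) - HEADER.size   # payload bytes handled so far
        yield received / total
```

Because the total is read from every frame header, the ratio can be updated as soon as the first frame arrives, without waiting for the whole file to be stored.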

[0040] According to the second embodiment described above, the bar graph display shows the voice recognition processing as it advances, so even when multiple files are recognized at once, the progress of the voice recognition work can be grasped relative to the whole.

[0041]

[Effects of the Invention] According to the present invention, the progress of the voice recognition work can be easily grasped even when multiple files are recognized at once.

[Brief description of the drawings]

FIG. 1 is a diagram showing the configuration of a voice recognition device according to an embodiment of the present invention.

FIG. 2 is a diagram showing the screen displayed on the display device when the voice recognition software stored in the voice recognition section is started.

FIG. 3 is a main flowchart showing the operation of the control unit in response to operator input in the first embodiment of the present invention.

FIG. 4 is a flowchart explaining one part of the details of the voice recognition processing in the first embodiment.

FIG. 5 is a display screen showing the status of the voice recognition processing.

FIG. 6 is a flowchart explaining another part of the details of the voice recognition processing in the first embodiment.

FIG. 7 is a diagram showing a window displaying text produced by the voice recognition processing.

FIG. 8 is a diagram showing the state in which the names of files whose voice recognition has finished have been erased.

FIG. 9 is a diagram showing the display screen when the voice recognition processing has finished.

FIG. 10 is a main flowchart showing the operation of the control unit in response to operator input in the second embodiment of the present invention.

FIG. 11 is a flowchart explaining the details of the voice recognition processing in the second embodiment.

FIG. 12 is a display screen showing the status of the voice recognition processing.

[Explanation of symbols]

1: voice recording/reproducing apparatus, 2: personal computer (PC), 3: recording medium (HD), 4: control unit, 5: RAM, 6: speaker, 7: amplifier, 8: digital-to-analog converter (D/A), 9: display control device, 10: display device, 11: keyboard, 12: cable.

Claims

[Claims]

1. A voice recognition device comprising: a recording medium for recording voice files having voice data; selection means capable of selecting a plurality of voice files; voice recognition means for continuously converting the voice data of the plurality of voice files selected by the selection means into text data; and display means for displaying the progress of the text data conversion processing by the voice recognition means for each voice file.

2. The voice recognition device according to claim 1, wherein the display means displays the file names of the plurality of voice files selected by the selection means, together with a progress status indicating whether the text data conversion processing for the voice file corresponding to each file name is unprocessed, in progress, or completed.

3. The voice recognition device according to claim 1, wherein the display means displays the file names of the plurality of voice files selected by the selection means, and erases the display of the file name of each voice file for which the text data conversion processing has been completed.

4. The voice recognition device according to claim 1, further comprising calculation means for calculating the ratio of the voice data amount of the voice files whose conversion into text data by the voice recognition means has finished to the total voice data amount of the plurality of voice files selected by the selection means, wherein the display means further displays the calculation result of the calculation means.

5. A recording medium on which is recorded a voice recognition processing program for causing a computer to perform voice recognition processing, the program causing the computer to: read a plurality of voice files from a recording medium that stores voice files having voice data; continuously perform voice recognition on the plurality of voice files to convert them into text data; and display the progress of the text data conversion processing for each voice file.

6. The recording medium according to claim 5, wherein the voice recognition processing program further causes the computer to calculate the ratio of the voice data amount of the voice files whose text data conversion has finished to the total voice data amount of the plurality of voice files to be converted into text data, and to further display the calculation result.

7. A voice recognition processing method comprising: a step of selecting a plurality of voice files from a recording medium on which voice files having voice data are recorded; a step of continuously converting the selected plurality of voice files into text data; and a step of displaying the progress of the text data conversion processing for each voice file.