JP2000259181A

JP2000259181A - Device and method for recognizing speech information, and recording medium where program for recognizing speech information is recorded

Info

Publication number: JP2000259181A
Application number: JP11063630A
Authority: JP
Inventors: Hiroshi Koge; 浩高家
Original assignee: Olympus Optical Co Ltd
Current assignee: Olympus Corp
Priority date: 1999-03-10
Filing date: 1999-03-10
Publication date: 2000-09-22

Abstract

PROBLEM TO BE SOLVED: To provide a speech information recognition device, etc., that can subject the speech-recognizable parts of a speech file to speech recognition even when speech data with different compression ratios coexist in that one speech file. SOLUTION: The speech information recognition device is a personal computer 4 comprising three elements. The first is a speech recognition program 9, which converts speech information stored in a speech file in predetermined block units into character information by speech-recognizing each block. The second is a control program 8, which checks the compression ratio of the speech information in each block. It judges a block suitable for speech recognition when its compression ratio is smaller than a predetermined ratio and unsuitable when it is larger, and causes speech recognition and conversion into character information to be executed only for blocks judged suitable. The third is a display 5, which indicates unsuitability and gives a warning when a block's compression ratio is judged unsuitable for speech recognition.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a speech information recognition device, a speech information recognition method, and a recording medium on which a program for recognizing speech information is recorded. More specifically, it relates to such a device, method, and recording medium that recognize speech information and convert it into character information.

[0002]

2. Description of the Related Art

In recent years, digital recorders that convert speech into digital data and record it as an audio file on a flash memory card or similar medium have been commercialized.

[0003] To reduce the amount of data, such a digital recorder encodes speech when recording an audio file. Some models offer two compression settings: an SP (standard play) mode with a relatively low compression ratio, and an LP (long play) mode with a relatively high compression ratio. Providing multiple recording modes lets the user choose as needed. For example, the user can select SP mode when sound quality matters and LP mode when recording time matters.

[0004] An audio file is recorded in units of blocks, each, for example, 512 bytes in size, and the recording mode can be changed block by block.

[0005] Meanwhile, techniques for recognizing speech and converting it into character information have long been studied. In recent years they have been commercialized as software for information devices such as personal computers and have reached the stage of practical use.

[0006] Such speech recognition may be applied to speech data in an audio file recorded by a digital recorder of the kind described above. In that case, the quality of the target speech data must be kept at or above a certain level to avoid lowering the recognition rate. In the recording-mode example above, only speech recorded in SP mode is therefore treated as a target of speech recognition, and speech recorded in LP mode is excluded.

[0007] That is, in conventional speech recognition technology, an audio file recorded entirely in SP mode is a target of speech recognition, and all other audio files are excluded. Whether to perform speech recognition is decided per audio file.

[0008]

Problems to Be Solved by the Invention

As described above, SP-mode recording and LP-mode recording can be selected for each block of an audio file. A user may therefore judge during recording that the remaining capacity of the recording medium is getting low and switch the recording mode from SP to LP partway through.

[0009] In such a case, one audio file contains both blocks recorded in SP mode and blocks recorded in LP mode. The conventional technology described above excludes such an audio file from speech recognition entirely. It therefore cannot serve a user who wants speech recognition performed on even part of the file.

[0010] The present invention has been made in view of the above circumstances. Its object is to provide a speech information recognition device, a speech information recognition method, and a recording medium on which a program for recognizing speech information is recorded. These can subject the speech-recognizable portions of an audio file to speech recognition even when speech data with different compression ratios coexist in that one file.

[0011]

Means for Solving the Problems

To achieve the above object, a speech information recognition device according to a first invention comprises conversion means and control means. The conversion means recognizes speech information and converts it into character information. The control means controls whether speech information encoded at a predetermined compression ratio is converted into character information, according to that compression ratio.

[0012] A speech information recognition device according to a second invention is the device of the first invention, in which the control means acts on the compression ratio as follows. When the compression ratio of the speech information is at a level suitable for speech recognition, the control means causes the conversion means to perform speech recognition and convert the speech information into character information. When the compression ratio is at a level unsuitable for speech recognition, the control means outputs a predetermined warning signal.
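The control behavior of the second invention can be sketched in Python as below. This is a minimal illustration, not the patent's implementation. The function name `control_means`, the `is_suitable` predicate, and the callback-style `convert`/`warn` hooks are hypothetical stand-ins for the suitability judgment, the conversion means, and the warning signal.

```python
from typing import Callable, Optional


def control_means(ratio: float,
                  is_suitable: Callable[[float], bool],
                  convert: Callable[[], str],
                  warn: Callable[[str], None]) -> Optional[str]:
    """Hypothetical control means: convert when the ratio is suitable, else warn.

    ratio       -- compression ratio of the speech information
    is_suitable -- assumed predicate deciding whether the ratio suits recognition
    convert     -- stand-in for the conversion means (speech -> character info)
    warn        -- stand-in for outputting the predetermined warning signal
    """
    if is_suitable(ratio):
        return convert()  # recognize and return character information
    warn(f"compression ratio {ratio} is unsuitable for speech recognition")
    return None  # no character information is produced
```

For example, `control_means(4.0, lambda r: r <= 8.0, lambda: "hello", print)` returns `"hello"`. The threshold 8.0 is arbitrary; the patent does not specify the predetermined level.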

[0013] A speech information recognition device according to a third invention is the device of the first or second invention, in which the decision whether to convert into character information is made for each speech block.

[0014] A speech information recognition method according to a fourth invention controls whether speech information encoded at a predetermined compression ratio is converted into character information, according to that compression ratio. When conversion is selected, the method recognizes the speech information and converts it into character information.

[0015] A recording medium according to a fifth invention records a program for recognizing speech information by a computer. The program causes the computer to control whether speech information encoded at a predetermined compression ratio is converted into character information, according to that compression ratio. When conversion is selected, the program causes the computer to recognize the speech information and convert it into character information.

[0016]

DESCRIPTION OF THE EMBODIMENTS

Embodiments of the present invention are described below with reference to the drawings. FIGS. 1 to 7 show one embodiment of the present invention. FIG. 1 shows the overall configuration of a dictation system.

[0017] As shown in FIG. 1, this dictation system comprises the following components:

- a digital recorder 1, which converts speech into an electrical signal, turns it into speech data, and records it in either a low-compression-ratio SP (standard play) mode or a high-compression-ratio LP (long play) mode;
- a miniature card 2, a recording medium detachably mounted in the digital recorder 1, on which the speech data is recorded;
- a PC card adapter 3, which allows the miniature card 2 to be inserted into and connected to a PC card slot 10 (see FIG. 2) described later; and
- a personal computer 4, the speech information recognition device. It includes a display 5 serving as warning means, a keyboard 6 and a mouse 7 for input, and so on, and it processes the speech data obtained from the miniature card 2 through the PC card slot 10 using a control program 8 and a speech recognition program 9.

[0018] FIG. 2 is a block diagram showing the electrical configuration of the personal computer 4.

[0019] The personal computer 4 plays back audio, displays information, and so on according to the control program 8. It recognizes speech information and converts it into character information according to the speech recognition program 9, and it performs various other processing according to other programs. It comprises:

- a CPU 11 that also serves as the conversion means, control means, and speech recognition means;
- a main memory 12, a recording medium that serves as the work area of the CPU 11;
- an internal recording medium 13, such as a hard disk, on which the control program 8 and the speech recognition program 9 are recorded;
- an external port 14 for connecting various external devices;
- an interface (hereinafter IF) 15 for connecting the display 5;
- an IF 16 for connecting the keyboard 6 and the mouse 7;
- a speaker 18 that emits sound based on speech data;
- an IF 17 for connecting the speaker 18;
- the PC card slot 10, into which the miniature card 2 mounted in the PC card adapter 3 is inserted; and
- an IF 19 for connecting the PC card slot 10.

The CPU 11, main memory 12, internal recording medium 13, external port 14, and IFs 15, 16, 17, and 19 are connected to one another via a bus.

[0020] The speech data may be read directly from the miniature card 2 via the PC card slot 10. It may instead first be recorded on the internal recording medium 13 and then read from there. Alternatively, it may be read directly from the digital recorder 1 via a communication means such as infrared, serial, or USB.

[0021] The control program 8 and the speech recognition program 9 may be recorded on the internal recording medium 13 before shipment. In the case of a general-purpose personal computer, they may instead be installed onto the internal recording medium 13 from a recording medium on which they are recorded, such as a floppy disk, CD-ROM, or DVD-ROM. In that case, the floppy disk, CD-ROM, DVD-ROM, or similar medium constitutes the recording medium on which the program for recognizing speech information is recorded.

[0022] FIG. 3 is a block diagram showing the structure of an audio file in the dictation system described above.

[0023] This audio file has a file structure of, for example, the DSS type.

[0024] That is, as shown in FIG. 3(A), the audio file is made up of blocks that are each, for example, 512 bytes in size. The first few blocks (for example, two) form the file header, and the remaining blocks hold the recorded speech data.
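The block layout above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not a DSS parser. It takes the example values (512-byte blocks, two header blocks) as defaults and assumes the file length is a whole number of blocks. The name `split_audio_file` is hypothetical, and the header fields listed in the next paragraph are not decoded.

```python
from typing import List, Tuple

BLOCK_SIZE = 512    # "for example, 512 bytes" per the description
HEADER_BLOCKS = 2   # "for example, two" leading blocks form the file header


def split_audio_file(data: bytes,
                     block_size: int = BLOCK_SIZE,
                     header_blocks: int = HEADER_BLOCKS) -> Tuple[bytes, List[bytes]]:
    """Split a raw audio file into its file header and its data blocks."""
    if len(data) % block_size != 0:
        raise ValueError("file length is not a whole number of blocks")
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    if len(blocks) < header_blocks:
        raise ValueError("file is shorter than its header")
    header = b"".join(blocks[:header_blocks])  # header blocks joined together
    return header, blocks[header_blocks:]      # remaining blocks hold speech data
```

Because the recording mode can change from one block to the next, returning the data blocks as a list lets later stages examine each block on its own.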

[0025] The file header contains, for example, the number of block headers, a self-recognition flag, a system version number, a system release number, a license ID code, a user ID code, a job number code, a usage-purpose type code, a process management code, a status management code, the recording start date and time, the recording end date and time, the recording time, an erroneous-erasure prevention flag, a priority level, a destination ID code, multiple I-mark addresses, a spare area, a management code area, a file text basic area, and a file text spare area.

[0026] The speech data consists of one or more DSS blocks, which are speech information blocks. As shown in FIG. 3(B), each DSS block begins with a DSS block header. The rest of the block holds speech data recorded frame by frame.

[0027] The DSS block header records information that indicates how the speech data in that DSS block was recorded: either at a low compression ratio in SP (standard play) mode, or at a high compression ratio in LP (long play) mode.

[0028] In other words, this audio file can switch between SP-mode and LP-mode recording block by block. Consequently, there are three types of audio file:

- files consisting only of SP-mode DSS blocks;
- files consisting only of LP-mode DSS blocks; and
- files containing a mixture of SP-mode and LP-mode DSS blocks.
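The three file types can be told apart from the set of per-block modes, as in the Python sketch below. It assumes the mode of each DSS block has already been read from its DSS block header. The header's byte layout is not given in this description, so that decoding step is not shown. `Mode`, `classify_file`, and the returned labels are illustrative names only.

```python
from enum import Enum
from typing import Iterable


class Mode(Enum):
    SP = "SP"  # standard play: relatively low compression ratio
    LP = "LP"  # long play: relatively high compression ratio


def classify_file(block_modes: Iterable[Mode]) -> str:
    """Classify an audio file by the recording modes of its DSS blocks."""
    modes = set(block_modes)
    if not modes:
        raise ValueError("audio file contains no DSS blocks")
    if modes == {Mode.SP}:
        return "SP only"
    if modes == {Mode.LP}:
        return "LP only"
    return "mixed"  # both SP and LP blocks are present
```

This classification is what the file-level checks in steps S5 and S6 of FIG. 4 need.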

[0029] FIG. 4 is a flowchart showing the operation of the personal computer described above when it performs speech recognition.

[0030] When the operation starts, the audio files recorded on the miniature card 2 are imported via the PC card slot 10 (step S1). The data for each file, recorded in the file system's management area and in the file header, is then read (step S2). The file name information contained in the read file data is displayed as a list on the display 5 (step S3).

[0031] At this point, the screen of the display 5 shows, for example, the display in FIG. 6. FIG. 6 shows an example of the screen displayed when the personal computer performs speech recognition.

[0032] FIG. 6 shows a main screen 21, which displays the following:

- a menu bar 22 for selecting file-related and editing-related operations;
- a tool button bar 23 that presents various operations visually as easy-to-understand icons;
- an audio file list box 24 that displays various data about the audio files transferred from the miniature card 2; and
- control buttons 25 for play, stop, fast forward, fast rewind, and similar operations.

The tool button bar 23 includes a speech recognition button 26 for starting speech recognition processing.

[0033] The audio file list box 24 contains the following columns, in order:

- a column 31 listing file names;
- a column 32 showing recording time;
- a column 33 showing recording date and time;
- a column 34 showing the recording mode, so that the user can check whether speech recognition is possible; and
- an end mark column 35.

These columns are displayed in step S3 based on the file data read in step S2.

[0034] Next, the process waits for the user to select a file for speech recognition and to instruct that speech recognition be performed on the selected file (step S4).

[0035] As shown in FIG. 6, an audio file is selected by clicking its entry in the audio file list box 24 with the mouse 7 or a similar device. The entry is then highlighted, so the selection can be confirmed visually.

[0036] With a file selected, clicking the speech recognition button 26 with the mouse 7 or a similar device inputs the instruction to perform speech recognition.

[0037] When step S4 confirms that an audio file has been selected and speech recognition has been instructed, the next step determines whether the audio file contains a mixture of SP-mode and LP-mode recorded portions (step S5).

[0038] If the file is not a mixed file, it is determined whether it is an audio file recorded only in SP mode (step S6). If it is, all speech data in the audio file is subject to speech recognition, and speech recognition processing is performed (step S8). When speech recognition is complete, the process returns to step S4 and waits for the next file to be selected.

[0039] If step S6 determines that the audio file was not recorded only in SP mode, it must have been recorded only in LP mode. It therefore contains no speech data eligible for speech recognition. A message stating that speech recognition cannot be executed is shown on the display 5 (step S7). The process then returns to step S4 and waits for another file to be selected.

[0040] If, on the other hand, step S5 determines that the file mixes SP-mode and LP-mode recorded portions, the display 5 shows a warning that speech recognition cannot be performed on part of the file (step S9). As a precaution, the user is then asked whether to cancel the process (step S10).

[0041] If the user chooses to cancel, the process returns to step S4. If the user chooses to proceed, the process goes to step S8, and speech recognition processing is performed only on the portions of the audio file recorded in SP mode.
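The file-level decisions of FIG. 4 from step S5 onward can be sketched in Python as follows. This is a hypothetical illustration, not the patent's program. Per-block modes are given as the strings `"SP"` and `"LP"`. The callbacks `confirm_proceed`, `recognize_sp_blocks`, and `show` stand in for the user prompt, the recognition of step S8, and the display 5. The returned status strings are invented labels. In the flowchart every branch returns to step S4; here the function simply returns.

```python
from typing import Callable, Sequence


def request_recognition(block_modes: Sequence[str],
                        confirm_proceed: Callable[[], bool],
                        recognize_sp_blocks: Callable[[], None],
                        show: Callable[[str], None]) -> str:
    """File-level flow of FIG. 4 from step S5 onward (hypothetical sketch)."""
    has_sp = "SP" in block_modes
    has_lp = "LP" in block_modes
    if has_sp and has_lp:  # S5: mixed SP/LP file
        show("Speech recognition cannot be performed on part of this file.")  # S9
        if not confirm_proceed():  # S10: user chose to cancel
            return "cancelled"
        recognize_sp_blocks()  # S8: only the SP portions are recognized
        return "partially recognized"
    if has_sp:  # S6: SP-only file
        recognize_sp_blocks()  # S8: the whole file is eligible
        return "recognized"
    show("Speech recognition cannot be executed for this file.")  # S7: LP only
    return "not recognizable"
```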

[0042] FIG. 5 is a flowchart showing the speech recognition processing of step S8 in detail.

[0043] When this operation starts, the DSS block header information of the DSS block to be recognized is read (step S21). The recording mode (compression mode) of that DSS block is then determined (step S22).

[0044] If the compression mode is determined to be SP mode, speech recognition is executed on that DSS block (step S23). The recognition result is output and displayed as characters on the display 5, for example as shown in FIG. 7 (step S24).

[0045] FIG. 7 shows an example display screen of software that can display text, with speech recognition results displayed. In FIG. 7, the portions indicated by reference numerals 42 and 44 on screen 41 display the recognition results.

[0046] When speech recognition of the DSS block is complete, it is determined whether the end of the audio file has been reached (step S26). If not, the process returns to step S21 and processes the next DSS block.

[0047] If step S22 determines that the compression mode is LP mode, speech recognition is not executed for that DSS block. A warning signal to that effect is therefore output and displayed on the display 5, as indicated by reference numeral 43 in FIG. 7 (step S24).

[0048] In the example of FIG. 7, the display states in text that speech recognition could not be performed on blocks totaling 18 seconds of recording time. The display is not limited to this. For example, symbols such as "?" or "*" may be inserted instead, and the number of inserted symbols may be made proportional to the duration for which recognition was not possible.
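The symbol-based alternative can be sketched in Python as below. This is an illustration under assumptions. The scale factor `seconds_per_symbol` is a hypothetical parameter that the patent does not specify. The count is rounded up, so the number of symbols is only approximately proportional, but even a short gap produces at least one symbol.

```python
import math


def placeholder(unrecognized_seconds: float,
                seconds_per_symbol: float = 2.0,
                symbol: str = "?") -> str:
    """Return a run of symbols whose length grows with the unrecognized time.

    seconds_per_symbol is an assumed scale factor, not a value from the patent.
    The count is rounded up so that even a short gap shows at least one symbol.
    """
    if seconds_per_symbol <= 0:
        raise ValueError("seconds_per_symbol must be positive")
    if unrecognized_seconds <= 0:
        return ""
    return symbol * math.ceil(unrecognized_seconds / seconds_per_symbol)
```

With the default scale, the 18-second gap of FIG. 7 would be shown as nine "?" symbols, and `placeholder(3.0, 2.0, "*")` gives `"**"`.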

[0049] When step S26 determines that the end of the audio file has been reached, the speech recognition processing ends.
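The per-block loop of FIG. 5 (steps S21 to S26) might look like the following Python sketch. It is hypothetical. Each block is modeled as a `(mode, payload)` pair whose mode is assumed to have been read from the DSS block header already. `recognize` stands in for the speech recognition engine, and the warning text is an invented placeholder for the display marked 43 in FIG. 7.

```python
from typing import Callable, Iterable, List, Tuple


def recognize_blocks(blocks: Iterable[Tuple[str, bytes]],
                     recognize: Callable[[bytes], str]) -> List[str]:
    """Per-block loop of FIG. 5 (hypothetical sketch).

    Each item is (mode, payload). The mode is assumed to have been read from
    the DSS block header already (S21); payload is the block's speech data.
    """
    output: List[str] = []
    for mode, payload in blocks:  # S26: the loop ends at the end of the file
        if mode == "SP":  # S22: SP block, suitable for recognition
            output.append(recognize(payload))  # S23, then display the text (S24)
        else:  # LP block: not recognized, so emit a warning instead
            output.append("[speech recognition not performed: LP block]")
    return output
```

Keeping one output entry per block preserves the order of recognized and unrecognized portions, which matches the interleaved display of FIG. 7.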

[0050] The above description used, as an example of the speech information recognition device, a personal computer that executes the control program and the speech recognition program. The invention is of course not limited to this. Another general-purpose computer may be used, or the device may be built as a dedicated speech information recognition device.

[0051] In this embodiment, the suitability of the compression ratio for speech recognition is checked block by block (that is, per speech block). Therefore, even when blocks suitable and unsuitable for speech recognition coexist in one audio file, the recognizable portions can be speech-recognized and converted into character information.

[0052] In addition, when speech recognition is not performed, this is indicated on the display, so the user can easily notice it. If the display is made proportional to the length of time for which speech recognition was not performed, that amount becomes easier to grasp intuitively.

[0053] The present invention is not limited to the embodiment described above. Various modifications and applications are of course possible without departing from the gist of the invention.

[0054] [Supplementary Notes] The embodiment of the present invention described in detail above provides the following configurations.

[0055] (1) A speech information recognition device comprising: speech recognition means for speech-recognizing speech information stored in an audio file in predetermined block units and converting it into character information block by block; and control means for checking, block by block, whether the speech information is suitable for speech recognition, and for causing the speech recognition means to operate only when the speech information is judged suitable for speech recognition.

[0056] (2) The speech information recognition device according to supplementary note (1), wherein the control means judges the speech information of a block suitable for speech recognition when its compression ratio is equal to or less than a predetermined compression ratio, and unsuitable for speech recognition when its compression ratio is greater than the predetermined compression ratio.
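The threshold test of note (2) can be made concrete in Python. The patent does not define how the compression ratio is computed. This sketch assumes it is the uncompressed size divided by the encoded size, so a larger value means heavier compression, consistent with SP being the lower-ratio mode. The predetermined ratio is left as the free parameter `max_ratio`. Both function names are illustrative.

```python
def compression_ratio(uncompressed_bytes: int, compressed_bytes: int) -> float:
    """Assumed definition: original size / encoded size (larger = more compressed)."""
    if compressed_bytes <= 0:
        raise ValueError("compressed size must be positive")
    return uncompressed_bytes / compressed_bytes


def suitable_for_recognition(ratio: float, max_ratio: float) -> bool:
    """Supplementary note (2): suitable when ratio <= the predetermined ratio."""
    return ratio <= max_ratio
```

For instance, with illustrative sizes of 1000 and 100 bytes the ratio is 10.0. That block counts as suitable for a threshold of 10.0, because the boundary is inclusive ("equal to or less than"), but a block with ratio 12.5 does not.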

[0057] (3) The speech information recognition device according to supplementary note (1), further comprising warning means for giving a warning when the control means judges that the speech information is not suitable for speech recognition.

[0058] (4) A speech recognition method comprising: a step of checking, block by block, whether speech information stored in an audio file in predetermined block units is suitable for speech recognition; and a step of speech-recognizing the speech information block by block and converting it into character information only when it is judged suitable for speech recognition.

[0059] (5) A recording medium on which is recorded a program for recognizing speech information by a computer. The program causes the computer to check, block by block, whether speech information stored in an audio file in predetermined block units is suitable for speech recognition. Only when the speech information is judged suitable, the program causes the computer to speech-recognize it block by block and convert it into character information.

[0060] (6) A speech information recognition device comprising: a recording medium on which speech information is recorded; conversion means for converting the speech information recorded on the recording medium into character information; and control means for controlling the processing of the speech information by the conversion means according to the compression ratio of the speech information.

[0061] (7) The speech information recognition device according to supplementary note (6), further comprising warning means. When the compression ratio is within the range in which speech recognition is possible, the control means causes the conversion means to perform speech recognition and output character information. When the compression ratio is outside that range, the control means causes the warning means to give a warning to that effect.

[0062] Accordingly, with the invention described in appendix (1), whether the speech information is suitable for speech recognition is checked block by block. Therefore, even when a single speech file contains a mixture of speech information blocks that are suitable for speech recognition and blocks that are not, the recognizable portions can be speech-recognized and converted into character information.

[0063] The invention described in appendix (2) provides the same effect as the invention described in appendix (1). In addition, it can determine from the compression ratio whether the speech information is suitable for speech recognition.
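The following worked illustration shows how suitability might be judged from the compression ratio. It assumes the ratio is defined as the bit rate of uncompressed PCM divided by the encoded bit rate, so a larger value means stronger compression. This is consistent with the abstract's rule that a ratio smaller than the predetermined one is suitable. The definition, the function names, and the threshold of 10 are assumptions for illustration only. For example, 8,000 Hz × 16 bits × 1 channel gives 128,000 bit/s. Encoding this at 16,000 bit/s gives a ratio of 8, and encoding it at 4,000 bit/s gives a ratio of 32.

```python
def compression_ratio(
    sample_rate_hz: int,
    bits_per_sample: int,
    channels: int,
    encoded_bitrate_bps: float,
) -> float:
    """Ratio of the uncompressed PCM bit rate to the encoded bit rate (assumed definition)."""
    raw_bitrate_bps = sample_rate_hz * bits_per_sample * channels
    return raw_bitrate_bps / encoded_bitrate_bps


def is_suitable_for_recognition(ratio: float, threshold: float) -> bool:
    """Suitable only when the ratio is smaller than the predetermined threshold."""
    return ratio < threshold


if __name__ == "__main__":
    # 8 kHz, 16-bit mono PCM is 128,000 bit/s before compression.
    light = compression_ratio(8000, 16, 1, 16000)  # 8.0
    heavy = compression_ratio(8000, 16, 1, 4000)   # 32.0
    threshold = 10.0  # hypothetical predetermined ratio
    print(light, is_suitable_for_recognition(light, threshold))  # 8.0 True
    print(heavy, is_suitable_for_recognition(heavy, threshold))  # 32.0 False
```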

[0064] Further, the invention described in appendix (3) provides the same effect as the invention described in appendix (1). In addition, the warning means issues a warning when speech recognition is not performed, so the user can recognize that this is the case.

[0065] With the invention described in appendix (4), whether the speech information is suitable for speech recognition is checked block by block. Therefore, even when a single speech file contains a mixture of suitable and unsuitable speech information blocks, the recognizable portions can be speech-recognized and converted into character information.

[0066] With the invention described in appendix (5), the program controls the computer so that whether the speech information is suitable for speech recognition is checked block by block. Therefore, even when a single speech file contains a mixture of suitable and unsuitable speech information blocks, the recognizable portions can be speech-recognized and converted into character information.

[0067] With the invention described in appendix (6), the conversion from speech information to character information is controlled according to the compression ratio. Therefore, even when a single speech file contains speech data with different compression ratios, the recognizable portions can be made the target of speech recognition.

[0068] The invention described in appendix (7) provides the same effect as the invention described in appendix (6). In addition, the warning means issues a warning when speech recognition is not performed, so the user can recognize that this is the case.

[0069]

[Effects of the Invention] As described above, the speech information recognition device of the present invention according to claim 1 controls the conversion from speech information to character information according to the compression ratio. Therefore, even when a single speech file contains speech data with different compression ratios, the recognizable portions can be made the target of speech recognition.

[0070] The speech information recognition device of the present invention according to claim 2 provides the same effect as the invention according to claim 1. In addition, it outputs a predetermined warning signal when speech recognition is not performed, so the user can recognize that this is the case.

[0071] Further, the speech information recognition device of the present invention according to claim 3 provides the same effects as the invention according to claim 1 or claim 2.

[0072] The speech information recognition method of the present invention according to claim 4 controls the conversion from speech information to character information according to the compression ratio. Therefore, even when a single speech file contains speech data with different compression ratios, the recognizable portions can be made the target of speech recognition.

[0073] Claim 5 provides a recording medium on which a program for recognizing speech information is recorded. With this recording medium, the program controls the computer so that the conversion from speech information to character information is controlled according to the compression ratio. Therefore, even when a single speech file contains speech data with different compression ratios, the recognizable portions can be made the target of speech recognition.

[Brief description of the drawings]

FIG. 1 is a diagram showing the overall configuration of a dictation system according to an embodiment of the present invention.

FIG. 2 is a block diagram showing the electrical configuration of the personal computer in the dictation system of the embodiment.

FIG. 3 is a block diagram showing the structure of a speech file in the dictation system of the embodiment.

FIG. 4 is a flowchart showing the operation of the personal computer of the embodiment when performing speech recognition.

FIG. 5 is a flowchart showing details of the speech recognition process in FIG. 4.

FIG. 6 is a diagram showing an example of the screen displayed when the personal computer of the embodiment performs speech recognition.

FIG. 7 is a diagram showing an example of a display screen showing the result of speech recognition on the personal computer of the embodiment.

[Explanation of symbols]

1 ... Digital recorder
2 ... Miniature card
4 ... Personal computer (speech information recognition device)
5 ... Display (warning means)
6 ... Keyboard
7 ... Mouse
8 ... Control program
9 ... Speech recognition program
11 ... CPU (conversion means, control means, speech recognition means)
12 ... Main memory
13 ... Internal recording medium (recording medium on which a program for recognizing speech information is recorded)

Claims

1. A speech information recognition device comprising: conversion means for recognizing speech information and converting it into character information; and control means for controlling, in accordance with the compression ratio, whether speech information encoded at a predetermined compression ratio is converted into character information.

2. The speech information recognition device according to claim 1, wherein the control means causes the conversion means to perform speech recognition and convert the speech information into character information when the compression ratio of the speech information is at a level suitable for speech recognition, and outputs a predetermined warning signal when the compression ratio is at a level not suitable for speech recognition.

3. The speech information recognition device according to claim 1, wherein whether to convert the speech information into character information is controlled for each speech block.

4. A speech information recognition method comprising: controlling, in accordance with the compression ratio, whether speech information encoded at a predetermined compression ratio is converted into character information; and, when control is performed to convert the speech information into character information, recognizing the speech information and converting it into character information.

5. A recording medium on which is recorded a program for causing a computer to recognize speech information. The program causes the computer to control, in accordance with the compression ratio, whether speech information encoded at a predetermined compression ratio is converted into character information. When control is performed to convert the speech information into character information, the program causes the computer to recognize the speech information and convert it into character information.