JPH02126299A

JPH02126299A - Speech recognition device

Info

Publication number: JPH02126299A
Application number: JP63279510A
Authority: JP
Inventors: Katsumi Tokuyama; 勝己徳山; Toshimichi Arima; 俊道有馬; Shinichi Sato; 慎一佐藤
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1988-11-07
Filing date: 1988-11-07
Publication date: 1990-05-15

Abstract

PURPOSE: To improve the recognition rate by reproducing the registered speech itself for confirmation at the time of registration and recognition, using a speech recording and reproducing device. CONSTITUTION: A speech recognizer 10 takes in the user's registration speech signal through a band-pass filter (BPF) 3 and, on detecting the start of a word, causes a speech recording and reproducing device 20 to start recording. The recognizer 10 then extracts features from the input speech data and stores them in a dictionary while watching for the end of the word, so that words not exceeding a specified length are recorded. For recognition, the user utters the word to be recognized into a microphone 1; the recognizer 10 takes in the speech signal through the BPF 3, detects the start of the word, and extracts the features of a signal that does not exceed the specified word length. The features are matched against the word features in the dictionary stored in a speech data RAM 16, the corresponding address of a speech data RAM 25 is specified, and a reproduction instruction is sent to the device 20, which outputs the voice through a speaker 7.

Description

DETAILED DESCRIPTION OF THE INVENTION [Industrial Field of Application] The present invention relates to a speech recognition device, and is particularly suitable for devices of the type in which a specific speaker's word speech is registered in advance.

[Prior Art] It is convenient for people to be able to input information into a target device by voice, and various devices equipped with a speech recognition device as an information input device have recently been proposed. For example, a device has been proposed in which, instead of placing a call (closing the telephone line) directly with push-button keys, the caller specifies the other party by voice; the device recognizes this speech, retrieves the other party's telephone number from a storage unit, and places the call.

One type of speech recognition device used as the information input device of such equipment is the speaker-dependent type, in which the speech to be recognized (hereinafter called word speech, including speech of more than one word) is registered in advance and input word speech is recognized by comparing it with the registered word speech.

In such a speech recognition device, the user is made to confirm the registered word speech at the time of registration, and to confirm the recognition result at the time of recognition. These confirmations have been performed either visually, using a display device, a liquid crystal display, light-emitting diodes or the like, or audibly, using synthesized speech, a buzzer or the like (JP-A-63-8799, JP-A-63-14200, JP-A-63-15294, JP-A-62-279399, etc.).

FIG. 2 shows a conventional example with a confirmation arrangement based on synthesized speech. In FIG. 2, a microphone 1 picks up the word speech uttered by the user during registration or recognition and converts it into an electrical signal. The captured speech signal is amplified by an amplifier circuit 2, has unnecessary components removed by a band-pass filter circuit 3, and is then supplied to a speech recognizer 4.

The speech recognizer 4 comprises an analog/digital conversion circuit 41, a microprocessor 42, an instruction ROM 43, a working RAM 44, a data ROM 45, a data RAM 46 and so on, and executes speech registration processing, speech recognition processing and the associated confirmation processing. When speech output becomes necessary during this processing, the speech recognizer 4 instructs a synthesized speech reproducer 5 to produce the output via an address bus A and a control line B.

The synthesized speech reproducer 5 reads a message or phoneme segments from the area at the designated address, converts them into an analog signal, and supplies the signal to an amplifier circuit 6. The amplifier circuit 6 amplifies the incoming synthesized speech signal and supplies it to a speaker 7, which emits the sound.

The speech recognizer 4 and the synthesized speech reproducer 5 are controlled by a controller 8. Based on commands from an external device (not shown) received via a serial bus D such as a so-called RS-232C bus, the controller 8 controls the speech recognizer 4 via a control bus C so that registration or recognition processing is executed, and controls the synthesized speech reproducer 5 via a control line E.

Next, the registration operation of this conventional example is described. When the user instructs registration processing from the operation panel or the like of the external device, the external device supplies a registration address and a registration command to the controller 8. The controller 8 passes the registration address and registration command to the speech recognizer 4 as they are or, more precisely, after changing the data format. After instructing registration processing, the user utters the word to be registered into the microphone 1.

When the speech recognizer 4 receives the registration address and registration command, it takes in the speech signal from the band-pass filter circuit 3, converts it into digital data, extracts its features, creates a dictionary entry for the registered word, and stores it in the data RAM 46.

When this storage operation is completed, the speech recognizer 4 reports to the external device, via the control bus C, the controller 8 and the serial bus D, that registration was performed normally. It also causes the synthesized speech reproducer 5, via the address bus A and the control line B, to output the data from the area in which a registration-complete message is stored and convert it into an analog signal, so that the registration-complete message is emitted from the speaker 7.

Next, speech recognition processing using the speech information registered in this way is described.

When the controller 8 receives a recognition command from the external device via the serial bus D, it converts the communication format and supplies the command to the speech recognizer 4 via the control bus C. After instructing recognition processing, the user utters the speech to be recognized into the microphone 1.

When the recognition command is given, the speech recognizer 4 takes in the speech signal picked up by the microphone 1 via the band-pass filter circuit 3 and extracts its features. It computes the match between the extracted speech features and the speech features of the multiple entries in the dictionary stored in the data RAM 46, and recognizes the input as the speech corresponding to the best-matching dictionary entry. The speech recognizer 4 then successively designates the addresses in the synthesized speech reproducer 5 at which the phoneme segments of the word corresponding to the recognition result are stored, and activates reproduction by the synthesized speech reproducer 5 so that the recognized word is emitted from the speaker 7.

At almost the same time, the recognition result is transmitted to the external device via the control bus C, the controller 8 and the serial bus D.

[Problems to be Solved by the Invention] In the method described above, confirmation by synthesized speech is provided immediately after registration and immediately after recognition. In practice, however, there are further cases, such as the following, in which the user wants to confirm registered speech, and for these the confirmation timing described above, or confirmation by synthesized speech, can be insufficient.

That is, as the number of registered words grows, the user may no longer remember all of them and may want to check what has been registered. In this case the user may wish to perform the check separately from the registration and recognition operations. Conventionally, many users avoided this inconvenience by keeping a written list of the registered words, and preparing such a list was itself a nuisance.

In a speech recognition device, the pronunciation at registration and the pronunciation at recognition must be similar enough for recognition to succeed. With a word that is used only occasionally, however, the user may have forgotten by recognition time how it was pronounced at registration, and may want to check the registration-time pronunciation. Conventionally, confirmation was possible only during the registration and recognition operations, so this need could not be met. Moreover, even if the timing were changed so that a registered word could be checked at such an intermediate point, the confirmation speech is synthesized speech whose pronunciation characteristics, such as intonation and speed, are fixed; the content of the word can be confirmed, but the way it was pronounced cannot.

Differences between the environmental noise at registration and that at recognition also greatly affect the recognition result, but such environmental conditions cannot be checked with synthesized sound. Consequently, when recognition failed, it was impossible to determine whether environmental noise was the cause. The user may also want to check the environmental noise at times other than the recognition and registration operations.

A dictionary can be prepared for each of several speakers, and the dictionaries exchanged so that the speakers share one speech recognizer. In that case, however, the only conventional way to find out which speaker's dictionary is currently loaded in the speech recognizer was to actually perform a recognition operation.

This need, too, could be met in terms of timing by providing confirmation processing at a time other than the recognition and registration operations, but because the confirmation output is synthesized speech, the speaker cannot be identified.

In addition, synthesized speech can be hard to hear, depending on the device, and similar-sounding words may be confirmed mistakenly.

The present invention was made in view of the above points. Its first object is to provide a speech recognition device in which the confirmation speech for a registered word is pronounced identically to the speech input at registration, environmental noise included. Its second object is to provide a speech recognition device in which the user can freely choose when to confirm a registered word. Its third object is to provide a speech recognition device in which more information can be confirmed even when confirmation is performed at an arbitrary time.

[Means for Solving the Problems] The first to third inventions all relate to a speech recognition device in which word-content output means supplies predetermined word content, from among the word content that has been input and registered in a dictionary, to sound-producing means at least immediately after word registration or immediately after word recognition, so that the registered or recognized word is sounded for confirmation.

The first invention is characterized in that the word-content output means is constituted by a speech recording and reproducing device that records the word speech input for word registration as it is and, in response to a reproduction command for confirmation, outputs the recorded word speech to the sound-producing means.

The second invention is characterized in that arbitrary-time confirmation command means is provided, which gives the word-content output means a command to output a predetermined word in response to an input operation.

The third invention is characterized in that arbitrary-time confirmation command means is provided, which gives an output command to the word-content output means in response to an input operation, and in that the word-content output means is constituted by a speech recording and reproducing device that records the word speech input for word registration as it is and, in response to a reproduction command for confirmation, outputs the recorded word speech to the sound-producing means.

[Operation] In the first invention, the word speech input for registration is recorded as it is by the speech recording and reproducing device at the time of word registration. Immediately after word registration or word recognition, the registered or recognized word speech is supplied from the speech recording and reproducing device to the sound-producing means and sounded for confirmation. The word speech that is sounded therefore carries not only the content of the word but also information such as intonation and the environmental noise at registration.

In the second invention, arbitrary-time confirmation command means is provided. When this means detects an input operation requesting confirmation playback, the designated registered word is output from the word-content output means to the sound-producing means and sounded. The user can therefore have registered word speech sounded not only immediately after word registration or word recognition but at any time.

In the third invention, a speech recording and reproducing device is used so that the word speech sounded at an arbitrary time is the registered word speech itself.

[Embodiment] An embodiment of the present invention is described in detail below with reference to the drawings.

FIG. 1 is a block diagram showing the configuration of this embodiment, and FIGS. 3 to 5 are flowcharts showing the operating procedures of its registration processing, recognition processing, and arbitrary-time confirmation processing, respectively.

[Configuration of the Embodiment] In FIG. 1, where parts corresponding to those in FIG. 2 carry the same reference numerals, the speech signal from the microphone 1 is supplied to a speech recognizer 10 via the amplifier circuit 2 and the band-pass filter circuit 3. The speech recognizer 10 comprises an analog/digital conversion circuit 11, a microprocessor 12, an instruction ROM 13, a working RAM 14, a data ROM 15, a data RAM 16 and so on, and executes speech registration processing, speech recognition processing and the associated confirmation processing. The speech recognizer 10 is connected to a speech recording and reproducing device 20 via an address bus A, a control line B, a word-start detection signal line F and a word-end detection signal line G.

The speech recording and reproducing device 20 comprises an analog/digital conversion circuit 21, a microprocessor 22, an instruction ROM 23, a working RAM 24, a speech data RAM 25, a digital/analog conversion circuit 26 and so on.

Under the control of the microprocessor 22, the speech recording and reproducing device 20 converts the speech signal supplied from the band-pass filter circuit 3 into digital data (speech data) with the analog/digital conversion circuit 21 and records it in the speech data RAM 25. The word-start detection signal supplied on the word-start detection signal line F serves as the recording start command, and the word-end detection signal supplied on the word-end detection signal line G serves as the recording end command.
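The gating of the recording by the two detection signals can be illustrated with a small Python sketch. This is only a toy model under stated assumptions: the class `SpeechRecorder`, its method names, and the dictionary standing in for RAM 25 are hypothetical and do not come from the patent, which describes hardware signal lines F and G rather than method calls.

```python
class SpeechRecorder:
    """Toy model of device 20 (hypothetical names, not from the patent)."""

    def __init__(self):
        self.ram = {}              # stands in for RAM 25: start address -> samples
        self.start_address = None
        self.recording = False

    def set_start_address(self, address):
        """Recording start address, designated before the word is spoken."""
        self.start_address = address
        self.ram[address] = []

    def word_start(self):
        """Stands in for the word-start detection signal on line F."""
        self.recording = True

    def word_end(self):
        """Stands in for the word-end detection signal on line G."""
        self.recording = False

    def feed(self, sample):
        """Called for every digitized sample; stored only while recording."""
        if self.recording:
            self.ram[self.start_address].append(sample)

    def play(self, address):
        """Return the samples recorded at the given start address."""
        return list(self.ram[address])


rec = SpeechRecorder()
rec.set_start_address(0x100)
for i, sample in enumerate([0, 0, 7, 9, 8, 0, 0]):
    if i == 2:
        rec.word_start()   # recognizer reports the start of the word
    if i == 5:
        rec.word_end()     # recognizer reports the end of the word
    rec.feed(sample)
print(rec.play(0x100))     # prints [7, 9, 8]
```

Only the samples that arrive between the two signals are kept, which is why the stored word carries the speaker's own pronunciation and background noise rather than a synthesized rendering.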

Reproduction from the speech recording and reproducing device 20 is triggered by a reproduction command and reproduction address from the speech recognizer 10, or by a reproduction command and reproduction address supplied from a controller 30 via the control line E. The speech recording and reproducing device 20 reads the speech data from the area of the speech data RAM 25 indicated by the reproduction address, converts it into an analog signal with the digital/analog conversion circuit 26, and outputs it to the amplifier circuit 6 that drives the speaker.

In addition to controlling the exchange of data between the external device and the speech recognizer 10, the controller 30 of this embodiment supplies a reproduction command and reproduction address to the speech recording and reproducing device 20 when an arbitrary-time confirmation command is received from the external device. The clock signal used by the speech recording and reproducing device 20 during recording and reproduction is supplied from the controller 30 via the control line E.

[Registration Processing of the Embodiment] Next, the registration processing for a speech signal in the embodiment is described with reference to FIG. 3.

On receiving a registration address and a registration command from the external device, the controller 30 designates the recording start address in the speech data RAM 25 to the speech recording and reproducing device 20, and sends the dictionary address and a registration start command to the speech recognizer 10 (steps 100, 101).

After performing the operation that causes the external device to output these commands, the user utters the speech to be registered into the microphone 1.

On receiving the registration command and dictionary address, the speech recognizer 10 begins taking in the speech signal uttered by the user via the band-pass filter circuit 3 (steps 110, 111). Once intake has begun, it checks whether a voiced portion has started, that is, whether the start of a word has been detected, and waits until it is (step 112). When the start of a word is detected, the speech recognizer 10 sends a detection signal to the speech recording and reproducing device 20 on the word-start detection signal line F to start the recording operation (step 114).

At this point the speech recording and reproducing device 20 begins storing, at the location in the speech data RAM 25 designated by the controller 30, the speech data supplied via the band-pass filter circuit 3 and converted into digital form by the internal analog/digital conversion circuit 21.

The speech recognizer 10 extracts features from the input speech data and stores them in the dictionary (data RAM 16), while checking whether the speech data has become silent, that is, whether the end of the word has been reached; it repeats the intake and feature extraction until the end of the word is detected (steps 114, 115). When the end of the word is detected, it determines whether the word length is shorter than a predetermined length and, if the word is longer, executes error processing (steps 116, 117).
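The start detection, end detection, and length check in this part of the flow can be sketched in Python as follows. The patent does not say how voiced and silent portions are distinguished, so the per-frame energy threshold used here is an assumption, as are the function name `detect_word`, its parameters, and the choice to accept a word whose length equals the limit.

```python
def detect_word(energies, threshold, max_frames):
    """Return (start, end) frame indices of the first word, end exclusive.

    Assumed rule (not from the patent): a frame is voiced when its energy
    is at or above `threshold`. Returns None if no complete word is found;
    raises ValueError when the word is longer than `max_frames`, which
    corresponds to the error branch of the flow.
    """
    start = None
    for i, energy in enumerate(energies):
        if start is None:
            if energy >= threshold:
                start = i                  # word start detected
        elif energy < threshold:           # word end detected
            if i - start > max_frames:
                raise ValueError("word exceeds the permitted length")
            return start, i
    return None


print(detect_word([0, 1, 6, 8, 7, 1, 0], threshold=3, max_frames=5))  # prints (2, 5)
```

In the device described here, the start index would correspond to the moment the recorder is started and the end index to the moment it is stopped, with the stop signal sent only when the length check passes.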

If, on the other hand, the word length is shorter than the predetermined length, the speech recognizer 10 instructs the speech recording and reproducing device 20, via the word-end detection signal line G, to end the recording operation (step 118).

The speech recording and reproducing device 20 thereupon ends the recording of the speech signal from the band-pass filter circuit 3.

After instructing the speech recording and reproducing device 20 to end the recording operation, the speech recognizer 10 outputs a registration-complete signal to the controller 30 and ends the registration operation (step 119).

After instructing the speech recognizer 10 to perform the registration operation, the controller 30 waits for the registration-complete signal from the speech recognizer 10. On receiving it, the controller 30 designates the area that has just been recorded to the speech recording and reproducing device 20 by means of a reproduction address, gives a reproduction command to start reproduction, and then reports completion of registration to the external device (steps 102 to 104).

In response to this command, the speech recording and reproducing device 20 outputs the speech data that has just been registered, and it is sounded from the speaker 7.

[Recognition Processing of the Embodiment] Next, the recognition processing is described with reference to FIG. 4.

To have the speech recognizer 10 recognize input speech, the user instructs recognition processing through the external device and then utters the word to be recognized into the microphone 1.

The external device then outputs a recognition command to the controller 30, and the controller 30 converts the communication format of the recognition command and outputs it to the speech recognizer 10.

On receiving this command, the speech recognizer 10 begins taking in the speech signal picked up by the microphone 1 via the band-pass filter circuit 3 and waits to detect the start of a word (steps 130 to 132). Once the start of a word is detected, it stores the input speech signal in the working RAM 16 and performs feature extraction until the end of the word is detected; when the end of the word is detected, it determines whether the length of the input word is shorter than a predetermined length (steps 133 to 135). If the word is judged to be longer, error processing is executed (step 136).

When a word no longer than the predetermined length has been input and its features extracted in this way, the features are matched against the features of each word in the dictionary stored in the speech data RAM 16, and the input is recognized as one of those words. The speech recognizer 10 then designates the address in the speech data RAM 25 of the speech recording and reproducing device 20 that corresponds to the recognized word, and instructs the speech recording and reproducing device 20, via the control line A, to reproduce it (steps 137, 138).
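The matching step, followed by the lookup of the playback address for the recognized word, might look like the Python sketch below. The patent does not specify the feature representation or the matching measure; the fixed-length feature vectors, the squared Euclidean distance, and the names `recognize`, `features` and `address` are all assumptions made for illustration.

```python
def recognize(features, dictionary):
    """Return (word, playback_address) of the closest dictionary entry.

    `dictionary` maps a word label to its stored feature template and to
    the address of its recording in RAM 25 (hypothetical layout).
    The distance is squared Euclidean, an assumed stand-in for whatever
    matching measure the actual device uses.
    """
    def distance(word):
        template = dictionary[word]["features"]
        return sum((a - b) ** 2 for a, b in zip(features, template))

    best = min(dictionary, key=distance)
    return best, dictionary[best]["address"]


dictionary = {
    "home":   {"features": [1.0, 0.0, 2.0], "address": 0x100},
    "office": {"features": [0.0, 3.0, 1.0], "address": 0x200},
}
word, address = recognize([0.9, 0.2, 1.8], dictionary)
print(word, hex(address))   # prints home 0x100
```

Returning an address rather than text reflects the point of the embodiment: the confirmation output is the user's own recording of the recognized word, located by its address, not a synthesized rendering of a label.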

The speech recording and reproducing device 20 thereupon performs reproduction, supplying the designated speech signal to the speaker 7 via the amplifier circuit 6 so that the speech of the recognized word is sounded.

After instructing the speech recording and reproducing device 20 to reproduce the recognized speech, the speech recognizer 10 reports the recognition result to the external device via the controller 30 and ends the recognition processing (step 139).

In this way, the user can confirm the recognition result by hearing speech with the same pronunciation as at registration.

[Confirmation Processing of the Embodiment] Next, the confirmation processing, which lets the user check registered speech content at an arbitrary time independently of the registration and recognition operations, is described with reference to FIG. 5.

In this case the user instructs the external device to perform arbitrary-time confirmation processing and inputs, for example, the word number of the word to be checked. The external device then outputs to the controller 30 an arbitrary-time confirmation command and the address of the area of the speech data RAM 25 in the speech recording and reproducing device 20 that corresponds to that word number.

On receiving the arbitrary-time confirmation command, the controller 30 outputs the address of the area to be reproduced to the speech recording and reproducing device 20, then outputs a reproduction command, and ends the processing (steps 150 to 152).

As a result, the speech recording and reproducing device 20 outputs the speech data from the designated area of the speech data RAM 25, converts it into an analog signal via the digital/analog conversion circuit 26, supplies it to the amplifier circuit 6, and causes the speaker 7 to utter the confirmation speech.

This allows the user to confirm the content of the designated word in the same pronunciation as when the word was registered.
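
The arbitrary-time confirmation flow can be sketched as below. This is a simplified model under assumptions, not the disclosed circuitry: the fixed `AREA_SIZE`, the word-number-to-address table, and the class and function names are hypothetical, and lists of integers stand in for the recorded samples. It illustrates only the hand-off: the external device maps a word number to a RAM area address, the controller passes the address and then a playback command to the recording and reproducing device, and the device outputs the samples from that area.

```python
from typing import Dict, List, Optional

AREA_SIZE = 4  # samples per word area; purely illustrative


class SpeechRecorderPlayer:
    """Stand-in for the speech recording and reproducing device."""

    def __init__(self, ram: List[int]) -> None:
        self.ram = ram                      # models the speech data RAM
        self.address: Optional[int] = None  # area selected for playback

    def set_address(self, address: int) -> None:
        self.address = address

    def play(self) -> List[int]:
        if self.address is None:
            raise RuntimeError("no playback area selected")
        # A real device would feed these samples to a D/A converter.
        return self.ram[self.address:self.address + AREA_SIZE]


class Controller:
    """Stand-in for the controller between external device and recorder."""

    def __init__(self, device: SpeechRecorderPlayer) -> None:
        self.device = device

    def on_confirm_command(self, address: int) -> List[int]:
        self.device.set_address(address)  # first output the area address
        return self.device.play()         # then output the playback command


def external_device_confirm(
    word_number: int, address_table: Dict[int, int], controller: Controller
) -> List[int]:
    # The external device translates the word number into a RAM area address.
    return controller.on_confirm_command(address_table[word_number])
```

With a 12-sample RAM and the table `{0: 0, 1: 4, 2: 8}`, requesting word 1 returns the four samples stored from address 4.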

Effects of the embodiment

Therefore, according to the embodiment described above, the registered content can be confirmed at times other than registration and recognition.

In this case, the speech signal is recorded at registration and played back at arbitrary-time confirmation. Unlike confirmation by synthesized speech, this lets the user confirm not only the registered content itself but also who the registered speaker is and what the speech quality was at registration, for example the environmental noise, the intonation, and the speaking rate.

As a result, the user can pronounce the intended word accurately at recognition time, and the recognition rate can be improved.

Moreover, at registration and recognition as well, confirmation is made by playing back the registered utterance itself rather than by synthesized speech, so the operator can confirm reliably. In other words, confirmation accuracy is improved.

In the embodiment described above, the speech recording and reproducing device 20 uses a semiconductor memory as its recording medium, but a device using another recording medium with a fast search operation, for example a floppy disk or an optical disk, may be used instead.

In addition, an input device for a telephone was shown above as an example of using the speech recognition device according to the present invention, but the device may also be used for other applications.

Furthermore, in the above description, clock signals of the same period are supplied to the speech recording and reproducing device 20 during recording and playback so that speech identical to the registered utterance is output. However, the clock period at playback may be made different from that at recording so that the playback speech can be varied arbitrarily.
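
The effect of changing the playback clock can be illustrated with a short calculation. This is a sketch under assumptions: the function name and the sample count and clock periods in the usage note are hypothetical and do not come from this document. When stored samples are simply clocked out at a different period, the playback duration scales with the playback period, and both the speed and the pitch scale by the ratio of the recording period to the playback period.

```python
from typing import Tuple


def playback_properties(
    num_samples: int, record_period_s: float, playback_period_s: float
) -> Tuple[float, float]:
    """Return (playback duration in seconds, speed/pitch ratio vs. the recording)."""
    if record_period_s <= 0 or playback_period_s <= 0:
        raise ValueError("clock periods must be positive")
    # One sample is output per playback clock tick.
    duration_s = num_samples * playback_period_s
    # A ratio above 1 means faster, higher-pitched playback; 1 means identical.
    speed_ratio = record_period_s / playback_period_s
    return duration_s, speed_ratio
```

For example, 8000 samples recorded with a 125 microsecond clock last about 1.0 s when played with the same clock, and about 0.8 s at roughly 1.25 times the speed and pitch when played with a 100 microsecond clock.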

[Effects of the Invention] As described above, according to the first invention, the playback speech used for confirmation at registration and recognition is the registered speech itself, obtained by using a speech recording and reproducing device. Unlike confirmation by synthesized speech, this allows confirmation not only of the registered content itself but also of who the registered speaker is and of the speech quality at registration, for example the environmental noise, the intonation, and the speaking rate. Confirmation accuracy is thereby improved, and because this can be reflected at recognition time, the recognition rate can be improved.

Furthermore, according to the second invention, registered words are uttered in response to a playback command at times other than registration and recognition, so the operator can confirm the registered content as needed at those times. The operator therefore no longer needs to write down each registered item on memo paper or the like.

Furthermore, according to the third invention, confirmation at an arbitrary time is also made by playing back the registered utterance itself rather than by synthesized speech, so the effects of the first and second inventions are obtained and a device that is easier to use can be realized.

[Brief explanation of drawings]

FIG. 1 is a block diagram showing an embodiment of the speech recognition device according to the present invention, FIG. 2 is a block diagram showing a conventional device, and FIGS. 3 to 5 are flowcharts showing the operating procedures of the registration processing, the recognition processing, and the arbitrary-time confirmation processing of the embodiment, respectively.

1: microphone, 7: speaker, 10: speech recognizer, 20: speech recording and reproducing device, 30: controller.

FIG. 2: Block diagram showing the configuration of the conventional device

Claims

[Claims]

(1) A speech recognition device in which, at least immediately after word registration or word recognition, word content output means supplies predetermined word content that has been input and registered in a dictionary to pronunciation means to have it uttered, so that the registered word or recognized word is confirmed, wherein the word content output means comprises a speech recording and reproducing device that records the word speech input for word registration as it is and outputs the recorded word speech to the pronunciation means on the basis of a playback command for confirmation.

(2) A speech recognition device in which, at least immediately after word registration or word recognition, word content output means supplies predetermined word content that has been input and registered in a dictionary to pronunciation means to have it uttered, so that the registered word or recognized word is confirmed, the speech recognition device comprising arbitrary-time confirmation command means for giving the word content output means a command to output a predetermined word in response to an input operation.

(3) The speech recognition device according to claim 2, wherein the word content output means comprises a speech recording and reproducing device that records the word speech input for word registration as it is and, on the basis of a confirmation output command, outputs the designated word speech from among the recorded word speech to the pronunciation means.