JPS61292199A

JPS61292199A - Voice recognition equipment

Info

Publication number: JPS61292199A
Application number: JP60132065A
Authority: JP
Inventors: 雅彦林; 藤井　雄治
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1985-06-19
Filing date: 1985-06-19
Publication date: 1986-12-22

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION (Field of Industrial Application) The present invention relates to a speech recognition device that uses noise information to perform speech section detection and recognition processing.

(Prior Art) Conventionally, a speech recognition device of this type measures, before speech is uttered, the ambient noise around the speaker or the noise caused by input conditions, and uses the result of that measurement as noise information for speech section detection and recognition. FIG. 5 is a block diagram showing the configuration of a conventional speech recognition device. In the figure, 1 is a speech analysis unit that analyzes input data; 2 is a switch that switches the routing of the analysis data from the speech analysis unit 1; 3 is a noise information creation unit that creates noise information from the analysis data input via the switch 2; 4 is a noise information storage unit that stores the noise information created by the noise information creation unit 3; 5 is an analysis data storage unit that stores the analysis data input via the switch 2; 6 is a speech section detection unit that sets a level threshold using the noise information stored in the noise information storage unit 4 and detects the speech section by comparing the analysis data in the analysis data storage unit 5 with the level threshold; 7 is a recognition processing unit that performs recognition processing of the input speech using the analysis data and the speech section information from the speech section detection unit 6; and 8 is a controller that controls the entire device.

Next, the operation will be described. First, noise measurement is performed before speech is uttered. The controller 8 switches the switch 2 to the noise information creation unit 3 side and starts the speech analysis unit 1 and the noise information creation unit 3. The input data is thereby fed to the speech analysis unit 1, where analysis processing that extracts power information and frequency information from the input data is performed at fixed intervals (frame by frame). From this analysis result, the noise information creation unit 3 creates noise information and stores it in the noise information storage unit 4. When the noise measurement is complete, the controller 8 stops the speech analysis unit 1 and the noise information creation unit 3, and then performs speech section detection and recognition. For this, the controller 8 switches the switch 2 to the speech section detection unit 6 side and starts the speech analysis unit 1 and the speech section detection unit 6. The input data is analyzed by the speech analysis unit 1 in the same way as in the noise measurement, and the analysis result is input as speech analysis data to the analysis data storage unit 5 and the speech section detection unit 6. Using the speech analysis result and the noise information obtained beforehand by the noise measurement and stored in the noise information storage unit 4, the speech section detection unit 6 sets a level threshold for detecting the input speech level. It takes the first frame of the section exceeding the level threshold as the start point and the last frame as the end point, and determines the span from the start point to the end point to be the speech section. Next, the controller 8 starts the recognition processing unit 7. Based on the speech analysis data and on the speech section information and noise information output from the speech section detection unit 6, the recognition processing unit 7 performs recognition processing such as generally known pattern matching and outputs the recognition result. FIG. 6 shows the relationship between noise measurement and speech section detection for the input data.
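The conventional threshold-based detection described above can be sketched in a few lines. This is only an illustration under stated assumptions: the text does not say how the level threshold is derived from the noise information, so the rule used here (mean noise power plus a fixed `margin`) and all function and variable names are hypothetical.

```python
from statistics import mean


def noise_threshold(noise_powers, margin=6.0):
    """Assumed rule: level threshold = mean noise power + fixed margin."""
    return mean(noise_powers) + margin


def detect_speech_section(powers, threshold):
    """Return (start, end): the first and last frame indices whose power
    exceeds the threshold, or None if no frame exceeds it."""
    above = [i for i, p in enumerate(powers) if p > threshold]
    if not above:
        return None
    return above[0], above[-1]


# Frame powers measured before the utterance (the noise measurement).
noise_frames = [10.0, 11.0, 9.0, 10.0]
# Frame powers of the input that contains the utterance.
utterance = [10.0, 11.0, 30.0, 35.0, 28.0, 10.0]

threshold = noise_threshold(noise_frames)
section = detect_speech_section(utterance, threshold)
print(threshold, section)
```

The noise information is measured once, before the utterance, and never revisited; that is the property the invention changes.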

(Problems to be Solved by the Invention) However, in a device with the above configuration, when speech is input through a telephone, a weakly directional microphone, or the like, the noise measurement may happen to pick up non-pulse noise arising from changes in the ambient noise around the speaker or from noise that enters during transmission from the input point to the speech recognition device. The noise information then becomes anomalous: the noise information obtained by the measurement differs greatly from the actual noise conditions around the speech, so an accurate speech section cannot be cut out and misrecognition results.
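A small numeric illustration of this failure mode, under the same hypothetical threshold rule (mean noise power plus a fixed margin, which the text does not specify): a single anomalous frame in the noise measurement raises the threshold enough to clip the weaker edges of the utterance.

```python
def level_threshold(noise_powers, margin=6.0):
    # Assumed rule, not taken from the patent: mean noise power + margin.
    return sum(noise_powers) / len(noise_powers) + margin


def speech_section(powers, threshold):
    # First and last frame above the threshold, or None.
    above = [i for i, p in enumerate(powers) if p > threshold]
    return (above[0], above[-1]) if above else None


utterance = [10.0, 20.0, 30.0, 22.0, 10.0]

clean_threshold = level_threshold([10.0, 10.0, 10.0, 10.0])
spiked_threshold = level_threshold([10.0, 10.0, 40.0, 10.0])  # one anomalous frame

section_clean = speech_section(utterance, clean_threshold)
section_spiked = speech_section(utterance, spiked_threshold)
print(section_clean, section_spiked)
```

With the clean measurement the detected section spans three frames; with the contaminated one it shrinks to a single frame.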

An object of the present invention is to eliminate the above problem of erroneous speech section detection caused by anomalous noise information, and to provide a device that cuts out more accurate speech sections and achieves an excellent recognition rate.

(Means for Solving the Problems) To solve the above problem, the present invention is a speech recognition device comprising: analysis means for performing speech analysis on input data; analysis result storage means for storing the analysis results of the analysis means; noise information creation means for creating noise information by measuring the noise in a measurement section between an arbitrary start frame and an arbitrary end frame of the analysis results stored in the analysis result storage means; noise information storage means for storing the noise information from the noise information creation means; detection means for setting a level threshold using the noise information from the noise information storage means and, at the timing when the threshold is set, detecting a speech section by comparing the contents of the analysis result storage means with the level threshold; recognition means for performing speech recognition based on the output signals of the analysis result storage means and the detection means; and control means for, until the detection means detects the start of a speech section, increasing the start frame and the end frame in the noise information creation means by a predetermined amount each time to update the noise information, and causing the speech section to be detected based on the updated noise information.

Preferably, the control means controls each of the above means so that the latest contents of the noise information storage means are used as the noise information for the first noise measurement section in the next speech recognition.

(Function) With the speech recognition device configured as above, the technical means operate as follows. Speech input data, including the non-speech (noise measurement) section, is analyzed by the analysis means and stored in the analysis result storage means. The control means instructs the noise information creation means to create noise information for the first noise measurement section, which runs from the first frame of the analysis data, taken as the start frame, to an arbitrary end frame. Next, the detection means sets a level threshold using the noise information supplied via the noise information storage means and, at this timing (end frame + level threshold setting time), compares the level of the analysis data with the level threshold to detect the speech section from the start to the end of the input speech. When the detection means does not detect the start of the speech section, the control means instructs the noise information creation means to use a noise measurement section whose start frame and end frame are each increased by a predetermined amount, and has the noise information updated; this is repeated until the start is detected. The detection means sets a level threshold using the updated noise information and, at this timing (updated end frame + setting time of the updated level threshold), detects the speech section in the same way. The recognition means recognizes the input speech using the speech section information obtained in this way.

Thus, speech section detection is performed using the latest noise information until the speech section is detected, which solves the problem of the prior art described above.

(Embodiments) FIG. 1 is a block diagram showing a first embodiment of the present invention. In the figure, reference numerals identical to those in FIG. 5 denote the same components. 30 is a noise information creation unit corresponding to the noise information creation unit 3 of FIG. 5, and consists of a noise information calculation unit 31, a start frame setting unit 32, and a stop frame setting unit 33. The noise information calculation unit 31 takes the speech analysis data (including non-speech sections) stored in the analysis data storage unit 5 as input and creates noise information. The start frame setting unit 32 and the stop frame setting unit 33 store, on instruction from the controller 8, the start frame number and the end (stop) frame number of the noise measurement, respectively, and supply them to the noise information calculation unit 31.
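As a rough sketch of this arrangement, the calculation step can be modeled as a function that reads a frame range out of the stored analysis data. The text does not define what the noise information contains, so the `NoiseInfo` fields (mean and peak power) and the names below are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class NoiseInfo:
    # Hypothetical contents; the source does not specify them.
    mean_power: float
    peak_power: float


def compute_noise_info(analysis_data, start_frame, stop_frame):
    """Create noise information from frames start_frame..stop_frame
    (inclusive) of the stored per-frame power values."""
    window = analysis_data[start_frame:stop_frame + 1]
    if not window:
        raise ValueError("empty noise measurement section")
    return NoiseInfo(mean_power=sum(window) / len(window),
                     peak_power=max(window))


stored_powers = [1.0, 2.0, 3.0, 4.0]   # stands in for the stored analysis data
info = compute_noise_info(stored_powers, start_frame=1, stop_frame=2)
print(info)
```

Because the frame range is a pair of plain numbers held outside the calculation, the controller can move the measurement section simply by rewriting them.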

Next, an example of operation will be described. First, before speech is input, noise measurement is performed as a primary noise measurement, as in the conventional device. All speech analysis data analyzed by the speech analysis unit 1 in the conventional manner is first stored in the analysis data storage unit 5, and the controller 8 sets the predetermined first frame number and end frame number of the section for the primary noise measurement in the start frame setting unit 32 and the stop frame setting unit 33, respectively. Next, the noise information creation unit 30 is started, and the noise information calculation unit 31 reads from the analysis data storage unit 5 the speech analysis data of the section corresponding to the frame numbers set in the start frame setting unit 32 and the stop frame setting unit 33, creates noise information in the conventional manner, and stores it in the noise information storage unit 4. This completes the primary noise measurement. Next, the controller 8 starts the speech section detection unit 6 to begin speech section detection. The speech section detection unit 6 performs speech section detection in the conventional manner using the noise information from the primary noise measurement. If the start of the speech section is not detected, a fixed number of frames is added to both the start frame number and the stop frame number already set in the start frame setting unit 32 and the stop frame setting unit 33, and the resulting frame numbers are set again in the start frame setting unit 32 and the stop frame setting unit 33, respectively. Next, the noise information creation unit 30 is started and, as in the primary noise measurement, the speech analysis data of the section corresponding to the newly set frame numbers is read from the analysis data storage unit 5, noise information is created and stored in the noise information storage unit 4, and the speech section detection unit 6 attempts to detect the start of the speech section. By repeating these operations until the start of the speech section is detected, the noise measurement section is shifted and the latest noise information is supplied to the speech section detection unit 6. The speech section detection unit 6 thus detects the speech section in the conventional manner based on the latest noise information stored in the noise information storage unit 4.
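The repeated shift of the noise measurement section can be sketched as a loop. This is a minimal illustration, not the patented implementation: the threshold rule (window mean plus `margin`), the choice to search only the `step` frames that follow the current window, and every name below are assumptions.

```python
def detect_with_sliding_noise(powers, start, stop, step, margin=6.0):
    """Shift the noise window [start, stop] by `step` frames until a frame
    after the window exceeds the threshold derived from that window.
    Returns (speech_start, speech_end, (start, stop)) or None."""
    n = len(powers)
    while stop + 1 < n:
        window = powers[start:stop + 1]
        threshold = sum(window) / len(window) + margin  # assumed rule
        # Search for the speech start in the frames following the window.
        segment_end = min(stop + step, n - 1)
        for i in range(stop + 1, segment_end + 1):
            if powers[i] > threshold:
                # Start found: the end is the last frame above the threshold.
                end = max(j for j in range(i, n) if powers[j] > threshold)
                return i, end, (start, stop)
        # No start yet: advance both frame numbers by the same amount.
        start += step
        stop += step
    return None


# Frame 1 is an anomalous burst inside the primary noise measurement.
powers = [10.0, 40.0, 10.0, 10.0, 10.0, 10.0,
          10.0, 10.0, 20.0, 35.0, 28.0, 10.0]
result = detect_with_sliding_noise(powers, start=0, stop=3, step=2)
print(result)
```

In this example the first window yields a threshold of 23.5 because of the burst, which would miss the weak onset at frame 8. By the time the window has moved to frames 4 to 7 the threshold has dropped to 16.0 and the onset is caught.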

FIG. 2 shows the relationship between noise measurement and speech section detection for the input data. It represents the timing at which speech section detection is performed: the first detection section uses the noise information obtained in section [1], the second uses the noise information obtained in section [2], ..., and the nth uses the noise information obtained in section [n]. The primary noise measurement section corresponds to the first noise measurement section [1], and the figure shows that the start of the speech section was detected using the nth noise measurement section [n].

The operation after speech section detection is the same as in the conventional device, so its description is omitted. The first embodiment has been described for the case where the primary noise measurement section and the speech section detection section are contiguous; even if they are not contiguous, the speech analysis data may be treated as though it were continuous and the noise measurement performed as in the contiguous case.

Next, a second embodiment will be described. FIG. 3 is a block diagram showing the second embodiment of the present invention. In addition to the first embodiment, it provides an average noise storage unit 41 that stores the average value of the speech analysis data of the section in which the noise information used for the previous recognition was measured.

Next, an example of operation will be described. For the first input data, the recognition operation performs the primary noise measurement as in the first embodiment and repeats the noise measurement in sequence until the start of the speech section is determined. Once the speech section is finally detected and recognition processing has been performed, the average value of the speech analysis data of the section in which the noise information used for the speech section detection was measured is calculated and stored in the average noise storage unit 41. For the second and subsequent input data, the primary noise measurement is not performed. Instead, on command from the controller 8, the average value of the speech analysis data stored in the average noise storage unit 41 is written into the primary-noise-measurement analysis data storage area, corresponding to each frame of the speech section detection section, in the analysis data storage unit 5, and the primary noise measurement is treated as complete. Thereafter, as in the first embodiment, noise measurement is repeated for the section given by the controller 8 and specified in the start frame setting unit 32 and the stop frame setting unit 33 until the start of the speech section is detected, after which the same operation as in the first embodiment is performed. The other operations are the same as in the first embodiment; in addition, after recognition processing, the average value of the speech analysis data of the section in which the noise information finally used for speech section detection and recognition processing was measured is stored in the average noise storage unit 41.
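The carry-over of the average between recognitions can be sketched as follows. The class and function names are hypothetical stand-ins, and per-frame scalar power values are an assumption; the text only says that an average of the speech analysis data is stored and later written into the primary noise measurement area.

```python
class AverageNoiseStore:
    """Hypothetical stand-in for the average noise storage unit."""

    def __init__(self):
        self.value = None  # nothing stored before the first recognition

    def save(self, noise_window):
        # Average of the analysis data in the last-used noise section.
        self.value = sum(noise_window) / len(noise_window)


def primary_noise_frames(store, measured_frames, length):
    """First recognition: use the measured primary noise frames.
    Later recognitions: skip the measurement and fill the primary noise
    area with the stored average instead."""
    if store.value is None:
        return list(measured_frames)
    return [store.value] * length


store = AverageNoiseStore()
first = primary_noise_frames(store, [10.0, 12.0, 11.0, 11.0], length=4)
store.save([12.0, 14.0])   # window that finally led to detection
second = primary_noise_frames(store, [], length=4)
print(first, second)
```

Filling the primary area with a constant lets the rest of the pipeline run unchanged on the second and later utterances, since it still sees a primary noise section of the expected length.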

Like FIG. 2 for the first embodiment, FIG. 4 shows the relationship between noise measurement and speech section detection for the input data. It represents the timing at which speech section detection is performed: the first detection section uses the noise information obtained in section [1'], the second uses the noise information obtained in section [2'], ..., and the nth uses the noise information obtained in section [n']. The section holding the stored average value of the speech analysis data corresponds to the first noise measurement section [1'], and the figure shows that the start of the speech section was detected using the nth noise measurement section [n'].

In the second embodiment, the average value of the speech analysis data of the section in which the noise information finally used for speech section detection and recognition processing was measured is stored in the average noise storage unit 41 inside the device and used for the next recognition. Alternatively, the average value of the speech analysis data may be output to outside the device and received from outside at the next recognition.

As described above, according to these embodiments, the measurement of noise information is repeated until the start of the speech section is detected, and speech section detection is performed using average noise information. Consequently, even if the noise measurement picks up non-pulse noise arising from changes in the ambient noise around the speaker or from noise that enters during transmission from the input point to the device, a more accurate speech section can be cut out and a device with an excellent recognition rate can be provided. Furthermore, by omitting the primary noise measurement when recognizing the second and subsequent input data, as in the second embodiment, no time is taken up by noise measurement during continuous recognition, so the speaker can speak naturally.

(Effects of the Invention) As described above, according to the present invention, the speech section can be cut out accurately, so a speech recognition device with an excellent recognition rate can be provided.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing a first embodiment of the speech recognition device of the present invention; FIG. 2 is a diagram explaining the operation of the first embodiment; FIG. 3 is a block diagram showing a second embodiment of the speech recognition device of the present invention; FIG. 4 is a diagram explaining the operation of the second embodiment; FIG. 5 is a block diagram showing a conventional speech recognition device; and FIG. 6 is a diagram explaining the conventional speech section detection operation.

Description of symbols: 1: speech analysis unit; 4: noise information storage unit; 5: analysis data storage unit; 6: speech section detection unit; 7: recognition processing unit; 8: controller; 30: noise information creation unit; 31: noise information calculation unit; 32: start frame setting unit; 33: stop frame setting unit; 41: average noise storage unit.

Claims

[Claims]

(1) A speech recognition device comprising: (a) analysis means for performing speech analysis on input data; (b) analysis result storage means for storing the analysis results of the analysis means; (c) noise information creation means for creating noise information by measuring the noise in a measurement section between an arbitrary start frame and an arbitrary end frame of the analysis results stored in the analysis result storage means; (d) noise information storage means for storing the noise information from the noise information creation means; (e) detection means for setting a level threshold using the noise information from the noise information storage means and, at the timing when the threshold is set, detecting a speech section by comparing the contents of the analysis result storage means with the level threshold; (f) recognition means for performing speech recognition based on the output signals of the analysis result storage means and the detection means; and (g) control means for, until the detection means detects the start of a speech section, increasing the start frame and the end frame in the noise information creation means by a predetermined amount each time to update the noise information, and causing the speech section to be detected based on the updated noise information.

(2) The speech recognition device according to claim 1, wherein the control means controls each of the means so that the latest contents of the noise information storage means are used as the noise information for the first noise measurement section in the next speech recognition.